--- Log opened Sun Dec 25 00:00:19 2011 | ||
-!- blackburn1 [~blackburn@188.168.4.33] has quit [Quit: Leaving.] | 00:49 | |
-!- blackburn [~blackburn@188.168.4.177] has joined #shogun | 11:31 | |
-!- blackburn [~blackburn@188.168.4.177] has quit [Ping timeout: 240 seconds] | 12:51 | |
-!- blackburn [~blackburn@83.234.54.14] has joined #shogun | 12:52 | |
-!- blackburn [~blackburn@83.234.54.14] has quit [Ping timeout: 252 seconds] | 13:21 | |
-!- blackburn [~blackburn@188.168.5.99] has joined #shogun | 13:22 | |
-!- blackburn [~blackburn@188.168.5.99] has quit [Ping timeout: 240 seconds] | 14:00 | |
-!- blackburn [~blackburn@188.168.4.157] has joined #shogun | 15:00 | |
-!- puneetgoyal [~puneetgoy@117.203.127.5] has joined #shogun | 15:43 | |
-!- puneetgoyal [~puneetgoy@117.203.127.5] has quit [Ping timeout: 248 seconds] | 15:50 | |
-!- puneetgoyal [~puneetgoy@117.203.127.5] has joined #shogun | 16:37 | |
-!- puneetgoyal [~puneetgoy@117.203.127.5] has quit [Read error: Connection reset by peer] | 17:05 | |
-!- puneetgoyal [~puneetgoy@117.203.127.5] has joined #shogun | 18:33 | |
-!- blackburn [~blackburn@188.168.4.157] has quit [Quit: Leaving.] | 19:10 | |
-!- blackburn [~blackburn@188.168.4.204] has joined #shogun | 19:41 | |
-!- puneetgoyal [~puneetgoy@117.203.127.5] has quit [Read error: No route to host] | 20:33 | |
-!- puneetgoyal [~puneetgoy@117.203.127.5] has joined #shogun | 20:33 | |
CIA-1 | shogun: Soeren Sonnenburg master * r4688b54 / examples/undocumented/java_modular/check.sh : require 1GB java heap space - http://git.io/Xo9XAA | 21:09 |
blackburn | heh you online | 21:09 |
blackburn | sonney2k: hello!:) | 21:13 |
CIA-1 | shogun: Soeren Sonnenburg master * rfd27f86 / examples/undocumented/python_modular/serialization_matrix_modular.py : add missing import os - http://git.io/unSFKg | 21:51 |
shogun-buildbot | build #256 of java_modular is complete: Success [build successful] Build details are at http://www.shogun-toolbox.org/buildbot/builders/java_modular/builds/256 | 22:35 |
puneetgoyal | blackburn: u there? | 22:40 |
blackburn | puneetgoyal: yes | 22:42 |
puneetgoyal | blackburn: Have you checked the results of the file I sent you ? | 22:42 |
blackburn | puneetgoyal: sry not very carefully | 22:42 |
blackburn | how to use it properly? | 22:42 |
blackburn | I'll check it now | 22:43 |
puneetgoyal | ok, I will mail you some details | 22:43 |
blackburn | you probably should not use absolute paths.. | 22:44 |
blackburn | like indir='/home/puneet/shogun/test' | 22:44 |
blackburn | puneetgoyal: what is stopwords? | 22:44 |
puneetgoyal | blackburn: stopwords are words we don't need to take into account while categorizing the mails | 22:45 |
blackburn | hmm | 22:45 |
puneetgoyal | as spam or ham | 22:45 |
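A stopword filter of the kind described above can be sketched in a few lines of Python. This is a minimal illustration, not the code from puneetgoyal's file; the stopword list is a hypothetical, deliberately tiny sample, and the regex tokenizer is an assumption.

```python
import re

# Hypothetical, deliberately tiny stopword list; real lists are much longer.
STOPWORDS = {"a", "an", "the", "is", "to", "and", "of", "you", "for"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop tokens that carry no signal for spam/ham categorization."""
    return [t for t in tokens if t not in stopwords]

print(remove_stopwords(tokenize("You have WON a free prize, claim it now!")))
# ['have', 'won', 'free', 'prize', 'claim', 'it', 'now']
```

A fixed list like this has to be maintained by hand, which is one reason TF-IDF thresholding is suggested as an alternative in the discussion.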
blackburn | you'd probably better use TF-IDF for this.. | 22:45 |
puneetgoyal | ok | 22:47 |
blackburn | puneetgoyal: but anyway, what exactly am I supposed to check? | 22:50 |
puneetgoyal | I just need to know if I'm on the right path... and what to do next | 22:50 |
blackburn | puneetgoyal: hmm okay it is useful experience anyway | 22:51 |
blackburn | I would suggest you write a simple TF-IDF | 22:51 |
blackburn | and use it as features | 22:51 |
blackburn | for classification | 22:51 |
blackburn | do you know what TF-IDF is? | 22:51 |
puneetgoyal | I haven't read much about it | 22:52 |
blackburn | it is pretty easy | 22:52 |
blackburn | puneetgoyal: thresholding tf-idf can be used just like 'stopwords' concept | 22:52 |
blackburn | i.e. common words will have ~0 tf-idf | 22:53 |
puneetgoyal | 0 means it makes no contribution to the probability of the mail being spam or ham? | 22:54 |
blackburn | puneetgoyal: yes | 22:55 |
puneetgoyal | ok... and the more often that word occurs, the higher its tf-idf value | 22:56 |
puneetgoyal | if it is not a stopword | 22:56 |
blackburn | puneetgoyal: idf(term) = log of (total number of documents) / (number of documents having term) | 22:57 |
blackburn | yes, then threshold it at, say, 0.1 or so | 22:57 |
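For reference, one common textbook form of TF-IDF is written out below (other variants normalize tf differently or smooth the idf). Here N is the number of documents and n_{t,d} is the number of times term t occurs in document d; note that the idf numerator counts documents, not words:

```latex
\mathrm{tf}(t, d) = \frac{n_{t,d}}{\sum_{t'} n_{t',d}}, \qquad
\mathrm{idf}(t) = \log \frac{N}{\lvert \{ d : t \in d \} \rvert}, \qquad
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t)
```

A word that occurs in every document gets idf = log(N/N) = log 1 = 0, so its tf-idf is 0 no matter how often it appears. This is why common words end up with ~0 tf-idf and thresholding can stand in for a stopword list.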
blackburn | you will get most valuable words | 22:57 |
blackburn | and then you can just form feature vectors | 22:57 |
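The steps just described (compute tf-idf, threshold at about 0.1, keep the most valuable words) can be sketched in plain Python. This sketch assumes the length-normalized tf and log(N / document frequency) idf variant. The rule "keep a word if it clears the threshold in at least one document" is an assumption, since the chat does not say how per-document scores are combined. The toy mails are made up for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {word: tf-idf} dict per document (docs = list of token lists).

    tf  = count of the word in the document / document length
    idf = log(number of documents / number of documents containing the word)
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each word once per document
    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        scores.append({
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores

def select_vocabulary(scores, threshold=0.1):
    """Keep words whose tf-idf exceeds threshold in at least one document."""
    return sorted({w for doc in scores for w, s in doc.items() if s > threshold})

# Toy mails, already tokenized; "the" occurs everywhere, so its idf is 0.
docs = [["free", "prize", "the"], ["meeting", "tomorrow", "the"], ["free", "money", "the"]]
scores = tf_idf(docs)
print(select_vocabulary(scores))  # ['free', 'meeting', 'money', 'prize', 'tomorrow']
```

Here "the" scores exactly 0 and is dropped. "free" appears in two of three mails, so it scores only about 0.135 and would disappear at a threshold of 0.2.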
puneetgoyal | I was stuck here | 22:58 |
blackburn | why? | 22:58 |
puneetgoyal | Actually, with the method I was using, I found the words that were valuable... but I couldn't figure out how to proceed further | 22:59 |
puneetgoyal | what should I make the feature vectors of? | 22:59 |
blackburn | puneetgoyal: okay if you have computed tf-idfs | 22:59 |
blackburn | you will get some 'rates' | 23:00 |
blackburn | for words X,Y,Z,... | 23:00 |
puneetgoyal | yup | 23:00 |
blackburn | then for document 1 you will get (X rate for doc 1, Y rate for doc 1, Z rate for doc 1, ...) | 23:00 |
blackburn | same way for other docs | 23:00 |
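Turning per-document rates into vectors, as in "(X rate for doc 1, Y rate for doc 1, Z rate for doc 1, ...)", amounts to fixing a word order and reading off each document's rate, with 0 for absent words. A minimal sketch, with hypothetical precomputed rates:

```python
def to_feature_vectors(scores, vocabulary):
    """Turn per-document {word: rate} dicts into fixed-order vectors.

    Position i of every vector holds the rate of vocabulary[i] in that
    document, or 0.0 if the word does not occur there.
    """
    return [[doc.get(word, 0.0) for word in vocabulary] for doc in scores]

# Hypothetical precomputed rates for words X, Y, Z in two documents.
rates = [{"X": 0.4, "Z": 0.1}, {"Y": 0.3, "Z": 0.2}]
print(to_feature_vectors(rates, ["X", "Y", "Z"]))
# [[0.4, 0.0, 0.1], [0.0, 0.3, 0.2]]
```

Every document maps to a vector of the same length, so all mails live in one common space regardless of their original length.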
blackburn | then you can use really any classifier | 23:01 |
blackburn | because you will get a Euclidean representation of your texts | 23:01 |
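Once texts are vectors in Euclidean space, even the simplest distance-based classifier works on them. The sketch below uses 1-nearest-neighbour in plain Python as a stand-in. It is not a Shogun classifier and not necessarily what was meant. The training vectors and labels are hypothetical.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest_neighbor(train_vectors, train_labels, query):
    """1-nearest-neighbour: return the label of the closest training vector."""
    best = min(range(len(train_vectors)),
               key=lambda i: euclidean(train_vectors[i], query))
    return train_labels[best]

# Hypothetical tf-idf vectors over a vocabulary like ["free", "meeting", "prize"].
train = [[0.4, 0.0, 0.3], [0.0, 0.5, 0.0], [0.3, 0.0, 0.4]]
labels = ["spam", "ham", "spam"]
print(nearest_neighbor(train, labels, [0.35, 0.05, 0.2]))  # spam
```

Any other classifier (an SVM, naive Bayes, ...) could replace `nearest_neighbor` without changing how the feature vectors are built.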
puneetgoyal | hmm..ok | 23:01 |
blackburn | isn't it clear to you yet? we've got to make it really clear :) | 23:02 |
puneetgoyal | actually I'm not very clear on the concept of classification... | 23:02 |
blackburn | puneetgoyal: hmm ok | 23:03 |
puneetgoyal | Even after running some examples... I got the training data, I got all the results you were asking us to get... but I didn't understand how things were being classified | 23:03 |
blackburn | puneetgoyal: you could probably check out some lectures.. | 23:04 |
blackburn | what exactly don't you understand? | 23:04 |
puneetgoyal | hmm...I guess I will need to look for more examples.. | 23:05 |
blackburn | puneetgoyal: we have really bad examples | 23:06 |
blackburn | that's the thing you can help us with | 23:06 |
blackburn | in fact our examples are just tests :) | 23:07 |
puneetgoyal | hmmm | 23:08 |
blackburn | puneetgoyal: okay I'll write you a snippet | 23:10 |
puneetgoyal | blackburn: oh... no need to do that if you're busy... I will just get back to you with a solid example where I can show you what my real problem is | 23:11 |
blackburn | puneetgoyal: not busy now :) | 23:11 |
puneetgoyal | blackburn: gr8!....I got an example though... | 23:16 |
blackburn | puneetgoyal: I sent a little example | 23:38 |
blackburn | puneetgoyal: there are two figures: one for train data - two gaussian blobs | 23:39 |
blackburn | then we add new points and predict their labels | 23:39 |
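The example blackburn sent is not reproduced in this log. The following is a hypothetical, stdlib-only re-creation of the same idea: train on two Gaussian blobs, then predict labels for new points. It uses a nearest-centroid rule rather than whatever Shogun classifier the original used, and it skips the figures. The blob centers, spread, and sample sizes are arbitrary choices.

```python
import random

def make_blob(center, n, sigma, rng):
    """Sample n 2-D points from an isotropic Gaussian around center."""
    cx, cy = center
    return [(rng.gauss(cx, sigma), rng.gauss(cy, sigma)) for _ in range(n)]

def centroid(points):
    """Mean of a list of 2-D points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def predict(centroids, point):
    """Nearest-centroid rule: return the label whose class mean is closest."""
    return min(centroids, key=lambda label: sq_dist(centroids[label], point))

rng = random.Random(0)  # fixed seed so the sketch is reproducible
train = {
    -1: make_blob((-2.0, -2.0), 50, 0.7, rng),  # first Gaussian blob
    +1: make_blob((2.0, 2.0), 50, 0.7, rng),    # second Gaussian blob
}
centroids = {label: centroid(pts) for label, pts in train.items()}

# Add new points and predict their labels.
new_points = [(-1.5, -2.5), (2.2, 1.8), (0.5, 1.0)]
print([predict(centroids, p) for p in new_points])
```

"Training" here is just computing one mean per class. Prediction assigns each new point to the class whose mean is closer, which draws a straight boundary halfway between the two blobs.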
puneetgoyal | ok | 23:39 |
blackburn | you will see a little of how it works | 23:39 |
blackburn | ok sleep time now :) | 23:40 |
blackburn | puneetgoyal: see you | 23:40 |
puneetgoyal | blackburn: ok..thanks a lot.good nite :) | 23:40 |
-!- blackburn [~blackburn@188.168.4.204] has quit [Quit: Leaving.] | 23:41 | |
-!- puneetgoyal [~puneetgoy@117.203.127.5] has quit [Quit: Leaving] | 23:41 | |
--- Log closed Mon Dec 26 00:00:19 2011 |
Generated by irclog2html.py 2.10.0 by Marius Gedminas - find it at mg.pov.lt!