--- Log opened Sun Jan 01 00:00:19 2012 | ||
-!- in3xes [~in3xes@49.249.161.120] has quit [Quit: Leaving] | 05:45 | |
-!- blackburn [~blackburn@109.226.100.113] has joined #shogun | 15:49 | |
-!- blackburn1 [~blackburn@109.226.118.164] has joined #shogun | 17:24 | |
-!- blackburn [~blackburn@109.226.100.113] has quit [Ping timeout: 268 seconds] | 17:27 | |
-!- blackburn1 [~blackburn@109.226.118.164] has quit [Read error: Connection reset by peer] | 17:44 | |
-!- blackburn [~blackburn@109.226.69.156] has joined #shogun | 17:44 | |
-!- ishaanmlhtr [~ishaan@115.242.7.172] has joined #shogun | 19:09 | |
-!- ishaanmlhtr [~ishaan@115.242.7.172] has quit [Quit: Leaving] | 20:01 | |
-!- puneetgoyal [~puneetgoy@117.197.180.221] has joined #shogun | 21:17 | |
puneetgoyal | blackburn : hi | 21:18 |
---|---|---|
blackburn | puneetgoyal: hi | 21:19 |
puneetgoyal | blackburn: I wanted to ask you one more thing....If I create a matrix with the top most words of each document...the matrix will only be having the tf-idf values right? | 21:21 |
blackburn | puneetgoyal: yes, tf-idfs associated with these top words | 21:21 |
puneetgoyal | and we will only be using those values in computing...and not the words themselves? | 21:22 |
blackburn | yes, for classifying we will use this representation, without words explicitly | 21:23 |
puneetgoyal | blackburn: but these values could be for any document...if we are not verifying them with the words | 21:24 |
puneetgoyal | I mean...words will tell us if the email is a spam or a ham..and not the importance of any word to the mail | 21:25 |
blackburn | hmm in this approach this importance matters | 21:26 |
blackburn | i.e. if we chose the words 'porno', 'viagra', etc | 21:26 |
blackburn | then with weights calculated we could determine spam or ham | 21:26 |
puneetgoyal | yes, we need to know that these words are there | 21:27 |
blackburn | well if tf-idf is not zero it is there for sure | 21:27 |
blackburn | we learn a classifier, we are not doing heuristics | 21:28 |
puneetgoyal | what I mean is...if we are looking at a spam mail that has some word 'porno' with the highest tf-idf value...say 0.2 ......and we have another mail with some word 'computer' as the most important word...with the same value 0.2, and we have given them their respective labels | 21:31 |
blackburn | no, 'top' words are chosen dataset-wide | 21:32 |
puneetgoyal | ohk..so it means we will be using the same words in the test mail as well? | 21:33 |
blackburn | yes | 21:33 |
puneetgoyal | ok | 21:33 |
blackburn | puneetgoyal: e.g. porno, viagra, the, hello, sincerely | 21:33 |
blackburn | :D | 21:33 |
puneetgoyal | ok...got it now...thanks :) | 21:34 |
blackburn | but 'the' is a bad word | 21:34 |
blackburn | it will have ~0 idf | 21:34 |
blackburn | because it is commonly used | 21:34 |
puneetgoyal | yes...it's one of the stop words | 21:35 |
blackburn | with tf-idf you would not have to choose some stop words or so | 21:35 |
blackburn | that's what we do in machine learning - it is better to do things automagically | 21:36 |
puneetgoyal | yeah...they get out automatically...with the calculation | 21:36 |
@sonney2k | I would suggest to even just use n-grams | 22:45 |
blackburn | sonney2k: wow mysterious guy here! | 22:48 |
CIA-1 | shogun: Soeren Sonnenburg master * r9ab7e21 / (9 files in 2 dirs): | 22:49 |
CIA-1 | shogun: Merge pull request #346 from karlnapf/master | 22:49 |
CIA-1 | shogun: load_file_parameter (+11 more commits...) - http://git.io/719kug | 22:49 |
blackburn | uncatchable sonney2k | 22:49 |
@sonney2k | blackburn, just normal overload | 22:49 |
blackburn | sonney2k: didn't you have holidays? | 22:50 |
@sonney2k | exactly - more work than with no holidays :) | 22:50 |
blackburn | crazy | 22:50 |
@sonney2k | seems like heiko has been doing a good amount of work :) | 22:51 |
@sonney2k | let's hope he manages to complete things before vanishing in studies again... | 22:51 |
blackburn | sonney2k: I have absolutely no idea what you guys are doing | 22:52 |
@sonney2k | it is still all about this serialization business and variable name / type changes | 22:52 |
blackburn | sonney2k: will you have a chance to glance over my paper soon? | 22:53 |
@sonney2k | e.g. one could have scalar parameters C1 and C2, and those change to a parameter vector called C | 22:53 |
blackburn | ah | 22:53 |
@sonney2k | and now you serialize one object that has C1 / C2 (old shogun version) | 22:53 |
@sonney2k | and want to load it into the new one | 22:54 |
@sonney2k | so we need a migration C1 / C2 -> C | 22:54 |
@sonney2k | heavy stuff | 22:54 |
blackburn | yeah | 22:54 |
blackburn | unmanageable | 22:54 |
@sonney2k | blackburn, as I told you I read your paper | 22:54 |
blackburn | yes but you wanted to make some fixes | 22:54 |
@sonney2k | and wanted to do some fixes but didn't find the time for that :( | 22:55 |
@sonney2k | no big things though | 22:55 |
blackburn | yes, exactly what I'm asking | 22:55 |
-!- puneetgoyal [~puneetgoy@117.197.180.221] has left #shogun ["Leaving"] | 23:01 | |
shogun-buildbot | build #425 of cmdline_static is complete: Failure [failed test_1] Build details are at http://www.shogun-toolbox.org/buildbot/builders/cmdline_static/builds/425 blamelist: heiko.strathmann@gmail.com | 23:03 |
shogun-buildbot | build #399 of python_static is complete: Failure [failed test_1] Build details are at http://www.shogun-toolbox.org/buildbot/builders/python_static/builds/399 blamelist: heiko.strathmann@gmail.com | 23:05 |
shogun-buildbot | build #407 of r_static is complete: Failure [failed test_1] Build details are at http://www.shogun-toolbox.org/buildbot/builders/r_static/builds/407 blamelist: heiko.strathmann@gmail.com | 23:06 |
shogun-buildbot | build #426 of cmdline_static is complete: Success [build successful] Build details are at http://www.shogun-toolbox.org/buildbot/builders/cmdline_static/builds/426 | 23:09 |
shogun-buildbot | build #400 of python_static is complete: Success [build successful] Build details are at http://www.shogun-toolbox.org/buildbot/builders/python_static/builds/400 | 23:10 |
shogun-buildbot | build #412 of octave_static is complete: Success [build successful] Build details are at http://www.shogun-toolbox.org/buildbot/builders/octave_static/builds/412 | 23:14 |
shogun-buildbot | build #408 of r_static is complete: Success [build successful] Build details are at http://www.shogun-toolbox.org/buildbot/builders/r_static/builds/408 | 23:16 |
shogun-buildbot | build #259 of r_modular is complete: Failure [failed compile] Build details are at http://www.shogun-toolbox.org/buildbot/builders/r_modular/builds/259 blamelist: sonne@debian.org, heiko.strathmann@gmail.com | 23:53 |
--- Log closed Mon Jan 02 00:00:19 2012 |
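
The tf-idf representation discussed between 21:21 and 21:36 can be sketched roughly as follows. This is only an illustration in plain Python, not Shogun code: the helper names (`fit_idf`, `choose_vocabulary`, `vectorize`) and the toy mails are hypothetical, and the rule for picking the dataset-wide "top" words (summed tf-idf over the training set) is an assumption, since the log does not say how they are ranked.

```python
import math
from collections import Counter


def tokenize(text):
    # Deliberately naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def fit_idf(train_docs):
    """Return {word: idf} with idf = log(N / document_frequency)."""
    n_docs = len(train_docs)
    doc_freq = Counter()
    for doc in train_docs:
        doc_freq.update(set(tokenize(doc)))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def tfidf(doc, idf):
    """Return {word: tf-idf} for one document; words unseen in training are skipped."""
    tokens = tokenize(doc)
    counts = Counter(tokens)
    return {w: (c / len(tokens)) * idf[w] for w, c in counts.items() if w in idf}


def choose_vocabulary(train_docs, idf, top_k):
    """Assumption: rank words by their tf-idf summed over the training set."""
    totals = Counter()
    for doc in train_docs:
        totals.update(tfidf(doc, idf))
    return [word for word, _ in totals.most_common(top_k)]


def vectorize(doc, vocabulary, idf):
    """Map a document to tf-idf weights over the fixed, dataset-wide vocabulary."""
    weights = tfidf(doc, idf)
    return [weights.get(word, 0.0) for word in vocabulary]


# Hypothetical toy training set (two spam-like and two ham-like mails).
train = [
    "buy viagra now the best viagra",
    "hello the meeting is moved sincerely bob",
    "the cheap viagra offer",
    "hello the report is attached sincerely alice",
]
idf = fit_idf(train)
vocab = choose_vocabulary(train, idf, top_k=3)

# A test mail is mapped onto the same vocabulary with the same idf values.
test_vec = vectorize("the viagra sale", vocab, idf)
```

Because 'the' occurs in every training mail, its idf is log(4/4) = 0, so it never makes it into the vocabulary; that is the "automagic" stop-word handling mentioned at 21:35. The classifier only ever sees the weight vectors, and the position in the vector is what ties each weight to a particular word.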
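
The parameter migration described at 22:53 to 22:54 (an object serialized with scalars C1 and C2 being loaded by a newer version that expects a vector C) could look roughly like the sketch below. It does not reflect Shogun's actual serialization code: the dictionary layout, the `version` field, and the names `migrate_c1_c2_to_c`, `MIGRATIONS` and `load` are all hypothetical and only illustrate the idea of versioned migration steps.

```python
def migrate_c1_c2_to_c(old_params):
    """Hypothetical step: fold the scalars C1 and C2 into one vector C."""
    params = dict(old_params)  # copy so the caller's data is left untouched
    if "C1" in params and "C2" in params:
        params["C"] = [params.pop("C1"), params.pop("C2")]
    return params


# Hypothetical registry: version N -> step that upgrades parameters to N + 1.
MIGRATIONS = {1: migrate_c1_c2_to_c}


def load(serialized, current_version=2):
    """Apply migration steps until the parameters match the current version."""
    version = serialized["version"]
    params = serialized["params"]
    while version < current_version:
        params = MIGRATIONS[version](params)
        version += 1
    return {"version": version, "params": params}


# An object saved by the "old" version, loaded into the "new" one.
old_object = {"version": 1, "params": {"C1": 1.0, "C2": 2.0, "epsilon": 0.001}}
new_object = load(old_object)
```

Chaining one small step per version keeps each rename or type change isolated, which is one way to keep this kind of "heavy stuff" manageable as more parameters change over time.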