IRC logs of #shogun for Sunday, 2012-01-01

--- Log opened Sun Jan 01 00:00:19 2012
05:45 -!- in3xes [~in3xes@49.249.161.120] has quit [Quit: Leaving]
15:49 -!- blackburn [~blackburn@109.226.100.113] has joined #shogun
17:24 -!- blackburn1 [~blackburn@109.226.118.164] has joined #shogun
17:27 -!- blackburn [~blackburn@109.226.100.113] has quit [Ping timeout: 268 seconds]
17:44 -!- blackburn1 [~blackburn@109.226.118.164] has quit [Read error: Connection reset by peer]
17:44 -!- blackburn [~blackburn@109.226.69.156] has joined #shogun
19:09 -!- ishaanmlhtr [~ishaan@115.242.7.172] has joined #shogun
20:01 -!- ishaanmlhtr [~ishaan@115.242.7.172] has quit [Quit: Leaving]
21:17 -!- puneetgoyal [~puneetgoy@117.197.180.221] has joined #shogun
21:18 <puneetgoyal> blackburn: hi
21:19 <blackburn> puneetgoyal: hi
21:21 <puneetgoyal> blackburn: I wanted to ask you one more thing... If I create a matrix with the top words of each document, the matrix will only hold the tf-idf values, right?
21:21 <blackburn> puneetgoyal: yes, the tf-idfs associated with these top words
21:22 <puneetgoyal> and we will only be using those values in computing, and not the words themselves?
21:23 <blackburn> yes, for classifying we will use this representation, without the words explicitly
21:24 <puneetgoyal> blackburn: but these values could be for any document... if we are not verifying them against the words
21:25 <puneetgoyal> I mean... the words will tell us if the email is spam or ham, and not the importance of any word to the mail
21:26 <blackburn> hmm, in this approach this importance matters
21:26 <blackburn> i.e. if we chose the words 'porno', 'viagra', etc.
21:26 <blackburn> then with the weights calculated we could determine spam or ham
21:27 <puneetgoyal> yes, we need to know that these words are there
21:27 <blackburn> well, if the tf-idf is not zero, it is there for sure
21:28 <blackburn> we learn a classifier, we are not doing heuristics
21:31 <puneetgoyal> what I mean is... if we are looking at a spam mail that has some word 'porno' with the highest tf-idf value, say 0.2, and we have another mail with some word 'computer' as the most important word, with the same value 0.2, and we have given them their respective labels
21:32 <blackburn> no, the 'top' words are chosen dataset-wide
21:33 <puneetgoyal> ohk... so it means we will be using the same words in the test mail as well?
21:33 <blackburn> yes
21:33 <puneetgoyal> ok
21:33 <blackburn> puneetgoyal: e.g. porno, viagra, the, hello, sincerely
21:33 <blackburn> :D
21:34 <puneetgoyal> ok... got it now... thanks :)
21:34 <blackburn> but 'the' is a bad word
21:34 <blackburn> it will have ~0 idf
21:34 <blackburn> because it is commonly used
21:35 <puneetgoyal> yes... it's one of the stop words
21:35 <blackburn> with tf-idf you would not have to choose some stop words or so
21:36 <blackburn> that's what we do in machine learning - it is better to do things automagically
21:36 <puneetgoyal> yeah... they drop out automatically with the calculation
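
[Editor's note] The representation discussed above can be sketched roughly as follows. This is a minimal illustration in plain Python, not Shogun code: the function names (build_tfidf, transform), the whitespace tokenizer, the unsmoothed idf = log(N / df), and the choice to rank words by their summed tf-idf over the whole dataset are all assumptions made for the example. It shows the two points from the chat: the top words are picked dataset-wide and reused for test mails, and a word that occurs in every document (such as 'the') gets an idf of 0, so it drops out without a stop-word list.

```python
import math
from collections import Counter


def build_tfidf(docs, top_k):
    """Pick top_k words dataset-wide and return (vocab, idf, matrix)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each word occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    # Unsmoothed idf (an assumption): a word in every document gets log(1) = 0.
    idf = {word: math.log(n_docs / count) for word, count in df.items()}

    # Term frequency per document, normalised by document length.
    tfs = []
    for tokens in tokenized:
        counts = Counter(tokens)
        tfs.append({word: c / len(tokens) for word, c in counts.items()})

    # Rank words by summed tf-idf over the whole dataset (hypothetical criterion).
    score = Counter()
    for tf in tfs:
        for word, value in tf.items():
            score[word] += value * idf[word]
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = [word for word, _ in ranked[:top_k]]

    # One row per document, one column per chosen word; only numbers are stored.
    matrix = [[tf.get(word, 0.0) * idf[word] for word in vocab] for tf in tfs]
    return vocab, idf, matrix


def transform(doc, vocab, idf):
    """Map a new (test) mail onto the same dataset-wide vocabulary."""
    tokens = doc.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[word] / total * idf[word] for word in vocab]


train_docs = [
    "buy the viagra now",
    "the meeting is at noon",
    "hello the team sincerely yours",
    "the viagra offer",
]
vocab, idf, matrix = build_tfidf(train_docs, top_k=5)
test_row = transform("the viagra", vocab, idf)
```

Because each column always refers to the same word, a nonzero entry in a column tells the classifier that this word is present in the mail, which is the point blackburn makes above.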
22:45 <@sonney2k> I would suggest to even just use n-grams
22:48 <blackburn> sonney2k: wow, mysterious guy here!
22:49 <CIA-1> shogun: Soeren Sonnenburg master * r9ab7e21 / (9 files in 2 dirs):
22:49 <CIA-1> shogun: Merge pull request #346 from karlnapf/master
22:49 <CIA-1> shogun: load_file_parameter (+11 more commits...) - http://git.io/719kug
22:49 <blackburn> uncatchable sonney2k
22:49 <@sonney2k> blackburn, just normal overload
22:50 <blackburn> sonney2k: didn't you have holidays?
22:50 <@sonney2k> exactly - more work then no holidays :)
22:50 <blackburn> crazy
22:51 <@sonney2k> seems like heiko has been doing a good amount of work :)
22:51 <@sonney2k> let's hope he manages to complete things before vanishing into his studies again...
22:52 <blackburn> sonney2k: I have absolutely no idea what you guys are doing
22:52 <@sonney2k> it is still all about this serialization business and variable name / type changes
22:53 <blackburn> sonney2k: will you have a chance to glance over my paper soon?
22:53 <@sonney2k> e.g. one could have scalar parameters C1, C2, and those change to a parameter vector called C
22:53 <blackburn> ah
22:53 <@sonney2k> and now you serialize one object that has C1 / C2 (old shogun version)
22:54 <@sonney2k> and want to load it into the new one
22:54 <@sonney2k> so we need a migration C1 / C2 -> C
22:54 <@sonney2k> heavy stuff
22:54 <blackburn> yeah
22:54 <blackburn> unmanageable
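
[Editor's note] The C1 / C2 -> C migration described above can be pictured with a small sketch. This is not Shogun's serialization code or API: the dict layout with "version" and "params" keys, the MIGRATIONS table, and both function names are hypothetical, chosen only to illustrate the idea of upgrading an object saved by an old version step by step until it matches the current parameter layout.

```python
def migrate_c1_c2_to_c(old_params):
    """Merge the scalar parameters C1 and C2 into one vector parameter C."""
    new_params = {k: v for k, v in old_params.items() if k not in ("C1", "C2")}
    new_params["C"] = [old_params["C1"], old_params["C2"]]
    return new_params


# Hypothetical registry: migration to apply when leaving a given version.
MIGRATIONS = {1: migrate_c1_c2_to_c}


def load(serialized, current_version=2):
    """Upgrade a serialized object one version at a time, then return it."""
    version = serialized["version"]
    params = dict(serialized["params"])
    while version < current_version:
        params = MIGRATIONS[version](params)
        version += 1
    return {"version": version, "params": params}


old_object = {"version": 1, "params": {"C1": 1.0, "C2": 2.0, "kernel": "gaussian"}}
new_object = load(old_object)
```

Chaining one small migration per version keeps each step simple, but every rename or type change still needs its own entry, which is presumably part of why sonney2k calls it heavy stuff.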
22:54 <@sonney2k> blackburn, as I told you, I read your paper
22:54 <blackburn> yes, but you wanted to make some fixes
22:55 <@sonney2k> and wanted to do some fixes but didn't find the time for that :(
22:55 <@sonney2k> no big things though
22:55 <blackburn> yes, exactly what I'm asking
23:01 -!- puneetgoyal [~puneetgoy@117.197.180.221] has left #shogun ["Leaving"]
23:03 <shogun-buildbot> build #425 of cmdline_static is complete: Failure [failed test_1]  Build details are at http://www.shogun-toolbox.org/buildbot/builders/cmdline_static/builds/425  blamelist: heiko.strathmann@gmail.com
23:05 <shogun-buildbot> build #399 of python_static is complete: Failure [failed test_1]  Build details are at http://www.shogun-toolbox.org/buildbot/builders/python_static/builds/399  blamelist: heiko.strathmann@gmail.com
23:06 <shogun-buildbot> build #407 of r_static is complete: Failure [failed test_1]  Build details are at http://www.shogun-toolbox.org/buildbot/builders/r_static/builds/407  blamelist: heiko.strathmann@gmail.com
23:09 <shogun-buildbot> build #426 of cmdline_static is complete: Success [build successful]  Build details are at http://www.shogun-toolbox.org/buildbot/builders/cmdline_static/builds/426
23:10 <shogun-buildbot> build #400 of python_static is complete: Success [build successful]  Build details are at http://www.shogun-toolbox.org/buildbot/builders/python_static/builds/400
23:14 <shogun-buildbot> build #412 of octave_static is complete: Success [build successful]  Build details are at http://www.shogun-toolbox.org/buildbot/builders/octave_static/builds/412
23:16 <shogun-buildbot> build #408 of r_static is complete: Success [build successful]  Build details are at http://www.shogun-toolbox.org/buildbot/builders/r_static/builds/408
23:53 <shogun-buildbot> build #259 of r_modular is complete: Failure [failed compile]  Build details are at http://www.shogun-toolbox.org/buildbot/builders/r_modular/builds/259  blamelist: sonne@debian.org, heiko.strathmann@gmail.com
--- Log closed Mon Jan 02 00:00:19 2012
