IRC logs of #shogun for Monday, 2012-03-05

--- Log opened Mon Mar 05 00:00:19 2012
n4nd0in high school people that go into letters may study Latin and/or Greek00:00
blackburnI knew about 200-300 words I guess00:01
blackburnit impacts sometimes when you understand a word you have never seen because it is similar to some latin00:02
n4nd0yeah00:02
n4nd0I had a teacher in Swedish who had some knowledge in Spanish and Italian00:03
n4nd0not enough for a fluent conversation but he could read quite a bit00:03
n4nd0he claimed that he never studied those, just Latin00:03
n4nd0really curious!00:04
blackburnlike meta-language00:04
blackburn:)00:04
blackburnuh 3 am00:05
blackburnI guess I have to sleep a little :)00:05
n4nd0oh that's late00:05
n4nd0"just" 12 here00:05
n4nd0good night then00:06
blackburnI wish it was 12 here00:06
blackburn:)00:06
blackburngood night00:06
-!- blackburn [~qdrgsm@31.28.32.139] has quit [Quit: Leaving.]00:06
-!- n4nd0 [~nando@s83-179-44-135.cust.tele2.se] has quit [Ping timeout: 276 seconds]01:08
-!- axitkhurana [~akshit@14.98.227.233] has joined #shogun01:47
-!- axitkhurana [~akshit@14.98.227.233] has left #shogun []01:47
-!- vikram360 [~vikram360@117.192.171.117] has quit [Read error: Connection reset by peer]02:12
-!- n4nd0 [~nando@s83-179-44-135.cust.tele2.se] has joined #shogun08:29
CIA-64shogun: Soeren Sonnenburg master * rd3f6438 / (4 files in 2 dirs):09:07
CIA-64shogun: Mahalanobis distance fixes09:07
CIA-64shogun: - use mean of all examples09:07
CIA-64shogun: - improve documentation09:07
CIA-64shogun: - serialization support - http://git.io/0kJS3w09:07
-!- sonne|work [~sonnenbu@194.78.35.195] has joined #shogun09:10
sonne|workn4nd0: please have a look at my mahalanobis commit09:11
sonne|workthis is what I meant - but I didn't have time to check it thoroughly, would be great if you could do it09:11
sonne|workthanks!09:11
n4nd0sonne|work: sure I will check it, give me some minutes09:20
sonne|workn4nd0: you basically did it like I had in mind but missed computing the mean over both lhs/rhs, plus some minor issues (serialization / documentation)09:21
n4nd0sonne|work: I will take a look at it so I can do it better next time09:22
-!- wiking [~wiking@huwico/staff/wiking] has joined #shogun09:23
sonne|workn4nd0: keep in mind that not everything I do is correct so have a critical eye on it - I am open for discussion :)09:23
-!- wiking [~wiking@huwico/staff/wiking] has quit [Quit: wiking]09:30
-!- wiking [~wiking@huwico/staff/wiking] has joined #shogun09:30
-!- wiking [~wiking@huwico/staff/wiking] has quit [Ping timeout: 260 seconds]09:35
n4nd0sonne|work: oops! is the current build working well? I just pulled and compiled but the linker is complaining at a lot of points10:26
n4nd0multiple definitions of a lot of methods in shogun::MulticlassMachine10:27
sonne|workn4nd0: yes - do a git clean -dfx  to erase all files not in the repository (warning...)10:28
n4nd0I deleted .o files in multiclass and it worked10:29
n4nd0thank you :)10:29
-!- blackburn [5bdfb203@gateway/web/freenode/ip.91.223.178.3] has joined #shogun10:34
n4nd0sonne|work: so one thing in your commit is the use of (l == r)10:35
sonne|workn4nd0: yeah that's sufficient10:36
n4nd0sonne|work: that means they are considered different even if they have the same values but MahalanobisDistance is instantiated with different CSimpleFeatures objects10:36
n4nd0sonne|work: ah ok, no problem then10:36
blackburnsonne|work: I answered ;)10:38
blackburnsonne|work: about ocas - it is working10:39
sonne|workyou fixed it?10:39
sonne|workblackburn: ^10:39
blackburnnope, but for simple examples it was ok10:40
blackburnI have to check my code10:40
blackburnsonne|work: well, the test we have says it is ok10:40
blackburnfrom tester_all I mean10:40
sonne|workI had another report from sb else who also complained that it didn't work10:40
sonne|workblackburn: our oversight then10:41
blackburnprobably, I'll check later10:41
blackburnsonne|work: about mc-liblinear - yes it works10:43
blackburnI even got better results on my data than with simple OvR liblinear10:43
sonne|workso 97 again now?10:43
blackburnsonne|work: 96.8 but I didn't do model selection very well :)10:46
blackburnpretty good anyway10:46
blackburnthe exact homogeneous map works well and I like it a lot :) it is much better to work in linear spaces10:47
n4nd0sonne|work: I tested the results, they are right10:52
n4nd0sonne|work: it actually makes sense to use the whole data when l != r and take the mean over both distributions, sorry I didn't get you :S10:54
sonne|workblackburn: yeah it is really fast later on!10:54
sonne|workn4nd0: yeah - I thought it is the same as for cov / one should use lhs and rhs, if available, for the mean too10:55
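
For reference, a minimal numpy sketch of the point sonne|work makes above: when lhs and rhs differ, the mean (and covariance) are estimated from both feature sets pooled, and the Mahalanobis distance of a vector is then taken to that distribution. This only illustrates the math behind the commit; it is not shogun's MahalanobisDistance code, and the function name is made up.

    import numpy as np

    def mahalanobis_to_distribution(x, lhs, rhs):
        # lhs, rhs: (n_features, n_vectors) matrices of column feature vectors
        pooled = lhs if lhs is rhs else np.hstack((lhs, rhs))  # use mean of all examples
        mean = pooled.mean(axis=1)
        icov = np.linalg.pinv(np.cov(pooled))                  # inverse covariance of pooled data
        d = x - mean
        return np.sqrt(d.dot(icov).dot(d))
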
blackburnsonne|work: btw, I've added rejection strategies class10:55
blackburnI can't think of any non-threshold-based rejection strategy but it is ok to keep it modular I think10:56
sonne|workadd an example to show how it works...10:56
blackburnyeah I'll do it gradually, I'm just in a bit of a rush10:56
blackburnsonne|work: rejects are particularly important for me (e.g. actual accuracy can be measured w/o rejects and it should be ~1.0)10:58
blackburnI have seen some SVMs that train with a reject option, but it would take time to implement it..10:58
sonne|workit is unclear though if you can gain a lot using this. I would suspect your simple thresholding works well enough for most cases :)11:06
blackburnsonne|work: maybe under the assumption that the train set contains rejected vectors that should not turn the hyperplane around ;)11:09
sonne|workyeah but you can control that already by giving different Cs to examples11:10
blackburntrue11:11
sonne|workof course you would need to know which examples could be problematic11:11
sonne|workprobably the ones misclassified in a previous run :D11:12
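
A tiny sketch of the simple thresholding discussed above, just to make the idea concrete: keep the argmax label only if the winning score clears a threshold, otherwise emit a reject label. The threshold value and the reject label of -1 are arbitrary choices here, not shogun's RejectionStrategy interface.

    import numpy as np

    def predict_with_reject(scores, threshold, reject_label=-1):
        # scores: (n_samples, n_classes) outputs of a multiclass machine
        labels = scores.argmax(axis=1)
        labels[scores.max(axis=1) < threshold] = reject_label  # below threshold -> reject
        return labels
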
blackburnsonne|work: I had some idea (unrelated to classification) - can you imagine some python object that delegates some ops to lambdas?11:12
blackburnsome example:11:12
blackburnPythonFeatures with get_feature_vector implemented in python11:13
blackburnI could not come up with *any* idea how to get it done..11:13
sonne|work?11:13
blackburnsonne|work: imagine a Features instance with get_feature_vector/get_dim_feature_space/etc set to lambdas11:14
blackburnI think it is impossible..11:14
blackburnI mean it could be custom then11:14
sonne|workto lambda?11:14
blackburnyeah to functions11:15
sonne|workI don't understand what you want to say?11:15
blackburne.g. get_feature_vector = lambda x: some-sql-select11:15
sonne|workautogenerated features?11:15
blackburnno, custom11:15
sonne|workformulas11:15
blackburnwhere you can set operations11:15
sonne|workcustom!?!11:15
sonne|worklike you provide some python script?11:15
blackburnyes-yes11:16
sonne|workthat's easy11:16
blackburnhow?11:16
sonne|workjust overload the get_feature_vector functions etc11:16
sonne|work(from python)11:16
blackburnreally?11:16
blackburnwill it work??11:16
sonne|workfor this to work you have to enable directors for swig though11:16
blackburndo you find it useful? I do..11:16
sonne|workwell I accidentally did that in the first swig based releases11:16
sonne|workthings become very slow then11:17
blackburnthat's bad11:17
sonne|workso I would rather want a separate class just for that11:17
sonne|workthen only this class gets director functionality11:17
sonne|workand get/set * can be overridden from $LANG11:17
blackburndamn I thought it is impossible11:17
sonne|workwelcome to  swig11:18
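
What sonne|work describes would look roughly like the sketch below: a features class wrapped with SWIG's %feature("director") can have its virtual methods overridden from Python, so get_feature_vector() may run arbitrary Python code such as blackburn's SQL-select lambda. The DirectorFeatures base class here is only a stand-in for the separate director-enabled class he proposes (it did not exist in shogun at the time), so this is a sketch of the pattern, not working shogun code.

    # stand-in for a director-enabled wrapper class, i.e. one exported with
    # %feature("director") in the SWIG interface file
    class DirectorFeatures(object):
        pass

    class PythonBackedFeatures(DirectorFeatures):
        def __init__(self, fetch, dim):
            DirectorFeatures.__init__(self)
            self.fetch = fetch      # e.g. lambda idx: some SQL select
            self.dim = dim

        def get_dim_feature_space(self):    # would override the C++ virtual
            return self.dim

        def get_feature_vector(self, idx):  # would override the C++ virtual
            return self.fetch(idx)
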
blackburnsonne|work: another issue (have you 2 mins more?)11:18
sonne|workyou can overload a C++ method from $LANG11:18
sonne|workno11:18
blackburnbad, ok then later11:18
blackburnhmm never mind, useless suggestion (I thought of integrating lapack into shogun code)11:19
n4nd0blackburn: hey there! hope you are not too angry after the results in the elections11:30
n4nd0blackburn: I wanted to ask you one thing about QDA11:33
n4nd0blackburn: LDA in shogun is implemented with regularization so I suppose that we are interested in regularized QDA, right?11:33
blackburnn4nd0: not angry at all - let these people live with this guy ;)11:35
blackburnn4nd0: is the regularization there something like X + delta*I?11:35
n4nd0blackburn: do you mean in QDA or LDA?11:36
blackburnboth? :)11:37
blackburnI just don't know what is the regularization there11:37
blackburnas for your question - I just meant that it would possibly be pretty easy to make it regularized11:37
blackburnor not?11:37
n4nd0I am not really sure right now11:38
n4nd0I am still reading documentation about it11:38
n4nd0but it seems to me that the method changes more than just a little when regularization is used11:39
blackburnreally?11:41
blackburnn4nd0: I think the easiest way is to implement it just as it is in scikits ;)11:42
n4nd0blackburn: haha ok11:42
n4nd0I took a look there11:43
n4nd0but I didn't find documentation about how they do it11:44
n4nd0there is an example showing a couple of plots, and the code of course11:44
blackburnlooks pretty straightforward..11:49
blackburnwhat makes you unhappy? ;)11:49
n4nd0nothing :P11:53
-!- sonne|work [~sonnenbu@194.78.35.195] has quit [Ping timeout: 276 seconds]11:53
blackburnoh we lost colonel sonnenburg11:54
blackburn:D11:54
blackburnn4nd0: http://s1-05.twitpicproxy.com/photos/large/531249569.png?key=89021311:57
blackburnit is for real ;)11:59
n4nd0oh12:00
n4nd0I saw some percentages but they were not that high12:01
n4nd0I saw something like 60-something for Putin out of a total turnout of 70-something12:01
blackburnah it is in chechnya12:01
blackburnlocal region12:01
n4nd0haha it is a big local region12:02
n4nd0it could almost be the capital of Sweden in terms of population12:02
blackburnsmall republic12:02
n4nd0I am guessing those numbers in black are # voters12:03
blackburnyes12:03
n4nd0ah fuck I didn't recognize the name at first sight12:03
n4nd0I recognize it as "Chechenia"12:03
n4nd0it is how we pronounce it in Spanish12:03
blackburnthere was a war as you may probably know :)12:04
n4nd0yeah, that's why I remember the name12:05
n4nd0it appeared a lot in the news12:05
-!- sonne|work [~sonnenbu@194.78.35.195] has joined #shogun12:09
-!- vikram360 [~vikram360@117.192.190.106] has joined #shogun12:16
vikram360blackburn : and putin wins12:16
blackburnsurprise?12:17
blackburn:)12:17
-!- n4nd0 [~nando@s83-179-44-135.cust.tele2.se] has quit [Ping timeout: 252 seconds]12:31
vikram360nope.. but the media seems to be having a field day. 3000 official complaints about the voting.12:35
sonne|workblackburn: isn't QDA the same as LDA but just on quadratic features?13:17
blackburnsonne|work: what are quadratic features?13:19
sonne|workall monomials of degree 213:19
blackburnyou probably know better? ;)13:20
sonne|workx_1*x_2 x_1^2 x_2^213:20
sonne|workfor 2d input vectors13:20
blackburnsonne|work: well we have no such features?13:20
sonne|workpolynomialdotfeatures?13:20
sonne|workor sth?13:20
sonne|workPolyFeatures13:21
blackburnah13:21
blackburnsonne|work: well I don't know then, do you think QDA is useless?13:21
sonne|workanyway it makes sense to make things explicit, i.e., if it is the same, use LDA on simplefeatures?13:22
sonne|workerr implement QDA on simplefeatures by using PolyFeatures internally13:23
blackburnyeah i got it13:23
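
To make the "quadratic features" above concrete, here is a small numpy sketch of the explicit degree-2 monomial expansion: a 2-d vector (x1, x2) becomes (x1, x2, x1^2, x1*x2, x2^2). Shogun's PolyFeatures computes such monomials implicitly; this explicit version is only for illustration.

    import numpy as np
    from itertools import combinations_with_replacement

    def degree2_monomials(X):
        # X: (n_samples, n_features) -> all monomials of degree <= 2
        quad = [X[:, i] * X[:, j]
                for i, j in combinations_with_replacement(range(X.shape[1]), 2)]
        return np.hstack((X, np.column_stack(quad)))
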
-!- vikram360 [~vikram360@117.192.190.106] has quit [Read error: Connection reset by peer]13:24
-!- sonne|work [~sonnenbu@194.78.35.195] has quit [Ping timeout: 276 seconds]13:30
-!- wiking [~wiking@huwico/staff/wiking] has joined #shogun13:38
-!- sonne|work [~sonnenbu@194.78.35.195] has joined #shogun13:45
blackburnsonne|work: it took one year for me to finally understand how features are working here in shogun hahahah13:47
sonne|workblackburn: anyway better check that LDA on squared features is QDA - could be that there is sth else to it :)14:04
blackburnsonne|work: n4ndo will do probably ;)14:04
blackburnsonne|work: I have seen interesting thing in your talk14:04
blackburnoptimizing svm with auprc14:04
blackburndid you try to train svm this way?14:06
sonne|workblackburn: doesn't help - look at T. Joachims' paper (best paper award at ICML) - it gives you like 0.00000001% :)14:06
sonne|workblackburn: which talk?14:06
blackburnsonne|work: http://sonnenburgs.de/soeren/talks/2006-05-02-perf-measures.pdf14:06
sonne|workohh that crap14:07
sonne|workprobably all wrong14:07
blackburnah I see14:07
blackburn:D14:07
sonne|workI guess best is to look at this one page in my thesis - there are all the perf measures I know of (and in shogun) in there14:07
blackburnI was interested in svm on last page14:08
blackburnsonne|work: I can't understand why in mc svms there are (<w_n,x>+b_n) - (<w_m,x>+b_m) < 1 - \xi_m14:08
sonne|workyeah it is some paper by Thorsten Joachims doing that fast but it doesn't help14:08
blackburnwhy 1? :)14:08
sonne|workmargin fixed to 114:08
sonne|worklike in svm14:08
blackburnI am thinking about ECOC training of svm14:09
sonne|worklike in mc-svm, like in structured output learning14:09
blackburnand don't know how to formulate this boundary14:09
blackburnsomething makes me think there won't be 1 :)14:09
sonne|workin words: f(good_x) - f(other_x) > 1- sth14:10
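
In LaTeX, the standard multiclass SVM margin constraint that sonne|work paraphrases ("f(good_x) - f(other_x) > 1 - sth") reads as below; the 1 is just the usual normalization of the margin, since rescaling w and b absorbs any other constant, exactly as in the binary SVM - which is the answer to blackburn's "why 1?".

    \bigl(\langle w_{y_i}, x_i \rangle + b_{y_i}\bigr)
      - \bigl(\langle w_m, x_i \rangle + b_m\bigr) \ge 1 - \xi_i,
    \qquad \forall\, m \ne y_i, \quad \xi_i \ge 0
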
sonne|workanyway back to work14:10
blackburnit looks like you work in an iron mine14:10
blackburn:)14:11
-!- faktotum [~cronor@fb.ml.tu-berlin.de] has joined #shogun14:45
faktotumHello!14:46
blackburnhi14:46
faktotumis there the possibility to set a custom sparse kernel?14:46
faktotumi know there are sparse kernels and that you can set custom kernels. but how do you set custom sparse kernels?14:46
faktotumi'm using python module if that is of interest14:46
blackburnI see.. I guess it is not yet implemented14:46
blackburnbut I think it is pretty straightforward to implement14:47
faktotummy current workaround is to do a Cholesky decomposition K = LL* and then use L as sparse feature vectors, but that is not tractable with bigger matrices14:47
blackburnI am not sure I understood why you do the cholesky14:48
sonne|workfaktotum: sounds like an easy task to add - patches welcome :)14:48
faktotumi will try it tonight14:48
faktotumblackburn: if i have K = LL* i can set my sparse features to L and then use a linear kernel. then i would end up with K as a custom sparse kernel14:50
blackburnwhoa I see14:50
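
faktotum's workaround, as a small numpy check: factor a (positive definite) precomputed kernel matrix K = L L^T with Cholesky and use the rows of L as feature vectors; a linear kernel on those features then reproduces K exactly. The random matrix and the jitter term below are only there to make the example self-contained.

    import numpy as np

    rng = np.random.RandomState(0)
    A = rng.randn(50, 10)
    K = A.dot(A.T) + 1e-6 * np.eye(50)   # some positive definite "custom" kernel
    L = np.linalg.cholesky(K)            # K = L L^T
    # the linear kernel (dot products) of the rows of L gives K back
    assert np.allclose(L.dot(L.T), K)
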
sonne|workfaktotum: depending on how sparse things are you could just use SGSparseMatrix14:51
sonne|workbut it is not fast enough I guess..14:51
faktotumoh that was my idea, why is it not fast enough?14:53
blackburnfaktotum: probably it would be slow in terms of checking whether k_i,j is zero15:02
faktotumok, but doesn't the kernel created from sparse real features have the same problem?15:04
blackburnit has as well..15:05
blackburnI guess some hash map should be used here15:05
blackburnfaktotum: anyway cholesky of some say 4000x4000 matrix is pretty slow ;)15:17
faktotumha! don't try 15k x 15k!15:23
blackburn15k x 15k?!15:23
blackburnthat probably takes a lot of memory :)15:24
faktotumchompack has a sparse cholesky decomposition implemented15:24
blackburnah15:24
blackburnI guess dense 15K would never finish15:25
-!- vikram360 [~vikram360@117.192.190.106] has joined #shogun16:35
-!- blackburn [5bdfb203@gateway/web/freenode/ip.91.223.178.3] has quit [Quit: Page closed]16:38
sonne|workfaktotum: maybe it is good enough: basically finding the kernel row is fast but not finding the column16:43
sonne|workif it is really sparse some kind of hashmap of tuples or whatever could be faster...16:44
sonne|workbut a lot of overhead then16:44
sonne|workfaktotum: so please go ahead with the sparse matrix idea - should do the job16:49
-!- in3xes [~in3xes@180.149.49.227] has joined #shogun16:50
-!- cronor [~cronor@141.23.80.206] has joined #shogun16:52
-!- faktotum [~cronor@fb.ml.tu-berlin.de] has quit [Ping timeout: 260 seconds]16:56
-!- cronor [~cronor@141.23.80.206] has quit [Remote host closed the connection]17:00
-!- cronor [~cronor@fb.ml.tu-berlin.de] has joined #shogun17:00
-!- cronor [~cronor@fb.ml.tu-berlin.de] has quit [Quit: cronor]17:07
-!- cronor [~cronor@fb.ml.tu-berlin.de] has joined #shogun17:17
-!- in3xes [~in3xes@180.149.49.227] has quit [Quit: Leaving]17:19
-!- cronor_ [~cronor@141.23.80.206] has joined #shogun17:21
-!- cronor [~cronor@fb.ml.tu-berlin.de] has quit [Ping timeout: 260 seconds]17:23
-!- cronor_ is now known as cronor17:23
-!- n4nd0 [~nando@s83-179-44-135.cust.tele2.se] has joined #shogun17:28
-!- wiking [~wiking@huwico/staff/wiking] has quit [Remote host closed the connection]17:50
-!- wiking [~wiking@huwico/staff/wiking] has joined #shogun17:50
vikram360I know this is probably a n00b question but there seems to be very little information about it: In what way is the C5.0 algorithm better than the C4.5?17:53
-!- wiking_ [~wiking@huwico/staff/wiking] has joined #shogun18:01
-!- wiking [~wiking@huwico/staff/wiking] has quit [Ping timeout: 244 seconds]18:02
-!- wiking_ is now known as wiking18:02
-!- cronor [~cronor@141.23.80.206] has quit [Quit: cronor]18:52
-!- axitkhurana [~akshit@14.98.55.250] has joined #shogun19:29
-!- axitkhurana [~akshit@14.98.55.250] has left #shogun []19:29
-!- blackburn [~qdrgsm@188.168.4.3] has joined #shogun19:47
-!- blackburn [~qdrgsm@188.168.4.3] has quit [Quit: Leaving.]20:12
@sonney2kvikram360, it is not clear to me either - all I know is that there were papers showing that it is better...22:46
@sonney2kthere just was no open source impl. of c5.0 around22:46
@sonney2kand for c4.5 only some free-for-academic-use thingy22:47
@sonney2kso people tried c4.5 if they could but that's it22:48
@sonney2kahh btw weka has a java version of c4.5 (iirc called j45) that probably has much cleaner code22:48
n4nd0sonney2k: hey! I read before you talked with blackburn about QDA23:05
n4nd0sonney2k: I have been reading into it so I could implement it in shogun23:06
n4nd0sonney2k: but I am not really sure how to relate what I have read about it with what you said before23:06
n4nd0sonney2k: so it seems that QDA and LDA are similar in that they assume that the feature vectors follow a normal distribution, but LDA assumes that the distributions for all the classes have the same covariances while QDA doesn't make that assumption23:08
n4nd0sonney2k: is that right this far?23:08
@sonney2kI guess so - at least for LDA, when the cov matrices are considered the same the problem becomes linear23:17
n4nd0sonney2k: ok, so I understand that23:21
n4nd0sonney2k: but is it then equivalent to use LDA using polynomial features?23:21
n4nd0I mean, can we just make polynomial features from the original ones (e.g. if we have at the beginning x1 and x2, we expand the feature vectors so they also contain x1^2, x2^2 and x1*x2)23:23
n4nd0would solving that with LDA be equivalent to QDA?23:23
@sonney2kn4nd0, it must be very close but I am not sure if it is exactly the same23:26
@sonney2kbest description about LDA/QDA I found is https://onlinecourses.science.psu.edu/stat857/book/export/html/1723:27
n4nd0sonney2k: cool, thank you very much, I was using this reference http://www.slac.stanford.edu/cgi-wrap/getdoc/slac-pub-4389.pdf23:31
n4nd0I have some trouble when it gets into the regularization part23:32
@sonney2khmmh, seems like QDA / LDA results on quad features differ but it is always mentioned that one can use it to get a quadratic classifier ...23:34
n4nd0so do you think it would be interesting to add QDA in shogun? and if so how?23:39
n4nd0something similar to LDA that is already implemented using regularization?23:39
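
For the regularization part n4nd0 mentions, a minimal numpy sketch of QDA with the per-class covariances shrunk toward the pooled covariance (the Friedman-style scheme from the references above); parameter names and the lambda value are illustrative, and this is not meant as the eventual shogun implementation.

    import numpy as np

    def qda_train(X, y, lmbda=0.5):
        # X: (n_samples, n_features), y: integer class labels
        pooled = np.cov(X.T)
        model = {}
        for k in np.unique(y):
            Xk = X[y == k]
            mu = Xk.mean(axis=0)
            # regularization: shrink the class covariance toward the pooled one
            cov = (1.0 - lmbda) * np.cov(Xk.T) + lmbda * pooled
            logdet = np.linalg.slogdet(cov)[1]
            prior = np.log(float(len(Xk)) / len(X))
            model[k] = (mu, np.linalg.inv(cov), prior - 0.5 * logdet)
        return model

    def qda_predict(model, x):
        # pick the class with the largest Gaussian discriminant value
        def score(k):
            mu, icov, const = model[k]
            d = x - mu
            return const - 0.5 * d.dot(icov).dot(d)
        return max(model, key=score)
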
-!- wiking [~wiking@huwico/staff/wiking] has quit [Quit: wiking]23:42
--- Log closed Tue Mar 06 00:00:19 2012

Generated by irclog2html.py 2.10.0 by Marius Gedminas - find it at mg.pov.lt!