IRC logs of #shogun for Monday, 2012-03-05

--- Log opened Mon Mar 05 00:00:19 2012
n4nd0in high school people that go into letters may study Latin and/or Greek00:00
blackburnI knew about 200-300 words I guess00:01
blackburnit impacts sometimes when you understand a word you have never seen because it is similar to some latin00:02
n4nd0yeah00:02
n4nd0I had a teacher in Swedish who had some knowledge in Spanish and Italian00:03
n4nd0not enough for a fluent conversation but he could read quite a bit00:03
n4nd0he claimed that he never studied those, just Latin00:03
n4nd0really curious!00:04
blackburnlike meta-language00:04
blackburn:)00:04
blackburnuh 3 am00:05
blackburnI guess I have to sleep a little :)00:05
n4nd0oh that's late00:05
n4nd0"just" 12 here00:05
n4nd0good night then00:06
blackburnI wish it was 12 here00:06
blackburn:)00:06
blackburngood night00:06
-!- blackburn [~qdrgsm@31.28.32.139] has quit [Quit: Leaving.]00:06
-!- n4nd0 [~nando@s83-179-44-135.cust.tele2.se] has quit [Ping timeout: 276 seconds]01:08
-!- axitkhurana [~akshit@14.98.227.233] has joined #shogun01:47
-!- axitkhurana [~akshit@14.98.227.233] has left #shogun []01:47
-!- vikram360 [~vikram360@117.192.171.117] has quit [Read error: Connection reset by peer]02:12
-!- n4nd0 [~nando@s83-179-44-135.cust.tele2.se] has joined #shogun08:29
CIA-64shogun: Soeren Sonnenburg master * rd3f6438 / (4 files in 2 dirs):09:07
CIA-64shogun: Mahalanobis distance fixes09:07
CIA-64shogun: - use mean of all examples09:07
CIA-64shogun: - improve documentation09:07
CIA-64shogun: - serialization support - http://git.io/0kJS3w09:07
-!- sonne|work [~sonnenbu@194.78.35.195] has joined #shogun09:10
sonne|workn4nd0: please have a look at my mahalanobis commit09:11
sonne|workthis is what I meant - but I didn't have time to check it thoroughly, would be great if you could do it09:11
sonne|workthanks!09:11
n4nd0sonne|work: sure I will check it, give me some minutes09:20
sonne|workn4nd0: you basically did it like I had in mind but missed computing the mean over both lhs/rhs, plus some minor issues (serialization / documentation)09:21
n4nd0sonne|work: I will take a look at it so I can do it better next time09:22
-!- wiking [~wiking@huwico/staff/wiking] has joined #shogun09:23
sonne|workn4nd0: keep in mind that not everything I do is correct so have a critical eye on it - I am open for discussion :)09:23
-!- wiking [~wiking@huwico/staff/wiking] has quit [Quit: wiking]09:30
-!- wiking [~wiking@huwico/staff/wiking] has joined #shogun09:30
-!- wiking [~wiking@huwico/staff/wiking] has quit [Ping timeout: 260 seconds]09:35
n4nd0sonne|work: oops! is the current build working well? I just pulled and compiled but the linker is complaining at a lot of points10:26
n4nd0multiple definitions of a lot of methods in shogun::MulticlassMachine10:27
sonne|workn4nd0: yes - do a git clean -dfx  to erase all files not in the repository (warning...)10:28
n4nd0I deleted .o files in multiclass and it worked10:29
n4nd0thank you :)10:29
-!- blackburn [5bdfb203@gateway/web/freenode/ip.91.223.178.3] has joined #shogun10:34
n4nd0sonne|work: so one thing in your commit is the use of (l == r)10:35
sonne|workn4nd0: yeah that's sufficient10:36
n4nd0sonne|work: that means they are considered different even if they have the same values but MahalanobisDistance is instantiated with different CSimpleFeatures objects10:36
n4nd0sonne|work: ah ok, no problem then10:36
blackburnsonne|work: I answered ;)10:38
blackburnsonne|work: about ocas - it is working10:39
sonne|workyou fixed it?10:39
sonne|workblackburn: ^10:39
blackburnnope, but for simple examples it was ok10:40
blackburnI have to check my code10:40
blackburnsonne|work: well, the test we have says it is ok10:40
blackburnfrom tester_all I mean10:40
sonne|workI had another report from sb else who also complained that it didn't work10:40
sonne|workblackburn: our oversight then10:41
blackburnprobably, I'll check later10:41
blackburnsonne|work: about mc-liblinear - yes it works10:43
blackburnI even got better results on my data than with simple OvR liblinear10:43
sonne|workso 97 again now?10:43
blackburnsonne|work: 96.8 but I didn't do model selection very well :)10:46
blackburnpretty good anyway10:46
blackburnthe exact homogeneous map works well and I like it a lot :) it is much better to work in linear spaces10:47
n4nd0sonne|work: I tested the results, they are right10:52
n4nd0sonne|work: it actually makes sense to use the whole data when l != r and take the mean over both distributions, sorry I didn't get you :S10:54
sonne|workblackburn: yeah it is really fast later on!10:54
sonne|workn4nd0: yeah - I thought it is the same as for cov / one should use lhs and rhs, if available, for the mean too10:55
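
For reference, a minimal numpy sketch of the point sonne|work makes above: when lhs and rhs differ, the mean (and covariance) are estimated from both feature sets pooled, and the Mahalanobis distance of a vector is then taken to that distribution. This only illustrates the math behind the commit; it is not shogun's MahalanobisDistance code, and the function name is made up.

    import numpy as np

    def mahalanobis_to_distribution(x, lhs, rhs):
        # lhs, rhs: (n_features, n_vectors) matrices of column feature vectors
        pooled = lhs if lhs is rhs else np.hstack((lhs, rhs))  # use mean of all examples
        mean = pooled.mean(axis=1)
        icov = np.linalg.pinv(np.cov(pooled))                  # inverse covariance of pooled data
        d = x - mean
        return np.sqrt(d.dot(icov).dot(d))
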
blackburnsonne|work: btw, I've added rejection strategies class10:55
blackburnI can't think of any non-threshold-based rejection strategy but it is ok to keep it modular I think10:56
sonne|workadd an example to show how it works...10:56
blackburnyeah I'll do it gradually, I'm just in a bit of a rush10:56
blackburnsonne|work: rejects are particularly important for me (e.g. actual accuracy can be measured w/o rejects and it should be ~1.0)10:58
blackburnI have seen some SVMs that train with a reject option, but it would take time to implement it..10:58
sonne|workit is unclear though if you can gain a lot using this. I would suspect your simple thresholding works well enough for most cases :)11:06
blackburnsonne|work: maybe under the assumption that the train set contains rejected vectors that should not turn the hyperplane around ;)11:09
sonne|workyeah but you can control that already by giving different Cs to examples11:10
blackburntrue11:11
sonne|workof course you would need to know which examples could be problematic11:11
sonne|workprobably the ones misclassified in a previous run :D11:12
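
A tiny sketch of the simple thresholding discussed above, just to make the idea concrete: keep the argmax label only if the winning score clears a threshold, otherwise emit a reject label. The threshold value and the reject label of -1 are arbitrary choices here, not shogun's RejectionStrategy interface.

    import numpy as np

    def predict_with_reject(scores, threshold, reject_label=-1):
        # scores: (n_samples, n_classes) outputs of a multiclass machine
        labels = scores.argmax(axis=1)
        labels[scores.max(axis=1) < threshold] = reject_label  # below threshold -> reject
        return labels
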
blackburnsonne|work: I had some idea (unrelated to classification) - can you imagine some python object that delegates some ops to lambdas?11:12
blackburnsome example:11:12
blackburnPythonFeatures with get_feature_vector implemented in python11:13
blackburnI could not come up with *any* idea how to get it done..11:13
sonne|work?11:13
blackburnsonne|work: imagine a Features instance with get_feature_vector/get_dim_feature_space/etc set to lambdas11:14
blackburnI think it is impossible..11:14
blackburnI mean it could be custom then11:14
sonne|workto lambda?11:14
blackburnyeah to functions11:15
sonne|workI don't understand what you want to say?11:15
blackburne.g. get_feature_vector = lambda x: some-sql-select11:15
sonne|workautogenerated features?11:15
blackburnno, custom11:15
sonne|workformulas11:15
blackburnwhere you can set operations11:15
sonne|workcustom!?!11:15
sonne|worklike you provide some python script?11:15
blackburnyes-yes11:16
sonne|workthat's easy11:16
blackburnhow?11:16
sonne|workjust overload the get_feature_vector functions etc11:16
sonne|work(from python)11:16
blackburnreally?11:16
blackburnwill it work??11:16
sonne|workfor this to work you have to enable directors for swig though11:16
blackburndo you find it useful? I do..11:16
sonne|workwell I accidentally did that in the first swig based releases11:16
sonne|workthings become very slow then11:17
blackburnthat's bad11:17
sonne|workso I would rather want a separate class just for that11:17
sonne|workthen only this class gets director functionality11:17
sonne|workand get/set * can be overridden from $LANG11:17
blackburndamn I thought it is impossible11:17
sonne|workwelcome to  swig11:18
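
What sonne|work describes would look roughly like the sketch below: a features class wrapped with SWIG's %feature("director") can have its virtual methods overridden from Python, so get_feature_vector() may run arbitrary Python code such as blackburn's SQL-select lambda. The DirectorFeatures base class here is only a stand-in for the separate director-enabled class he proposes (it did not exist in shogun at the time), so this is a sketch of the pattern, not working shogun code.

    # stand-in for a director-enabled wrapper class, i.e. one exported with
    # %feature("director") in the SWIG interface file
    class DirectorFeatures(object):
        pass

    class PythonBackedFeatures(DirectorFeatures):
        def __init__(self, fetch, dim):
            DirectorFeatures.__init__(self)
            self.fetch = fetch      # e.g. lambda idx: some SQL select
            self.dim = dim

        def get_dim_feature_space(self):    # would override the C++ virtual
            return self.dim

        def get_feature_vector(self, idx):  # would override the C++ virtual
            return self.fetch(idx)
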
blackburnsonne|work: another issue (have you 2 mins more?)11:18
sonne|workyou can overload a C++ method from $LANG11:18
sonne|workno11:18
blackburnbad, ok then later11:18
blackburnhmm never mind, useless suggestion (I thought of integrating lapack into shogun code)11:19
n4nd0blackburn: hey there! hope you are not too angry after the results in the elections11:30
n4nd0blackburn: I wanted to ask you one thing about QDA11:33
n4nd0blackburn: LDA in shogun is implemented with regularization so I suppose that we are interested in regularized QDA, right?11:33
blackburnn4nd0: not angry at all - let these people live with this guy ;)11:35
blackburnn4nd0: is the regularization there something like X + delta*I?11:35
n4nd0blackburn: do you mean in QDA or LDA?11:36
blackburnboth? :)11:37
blackburnI just don't know what is the regularization there11:37
blackburnas for your question - I just meant that it would possibly be pretty easy to make it regularized11:37
blackburnor not?11:37
n4nd0I am not really sure right now11:38
n4nd0I am still reading documentation about it11:38
n4nd0but it seems to me that the method changes more than just a little when regularization is used11:39
blackburnreally?11:41
blackburnn4nd0: I think the easiest way is to implement it just as it is in scikits ;)11:42
n4nd0blackburn: haha ok11:42
n4nd0I took a look there11:43
n4nd0but I didn't find documentation about how they do it11:44
n4nd0there is an example showing a couple of plots, and the code of course11:44
blackburnlooks pretty straightforward..11:49
blackburnwhat makes you unhappy? ;)11:49
n4nd0nothing :P11:53
-!- sonne|work [~sonnenbu@194.78.35.195] has quit [Ping timeout: 276 seconds]11:53
blackburnoh we lost colonel sonnenburg11:54
blackburn:D11:54
blackburnn4nd0: http://s1-05.twitpicproxy.com/photos/large/531249569.png?key=89021311:57
blackburnit is for real ;)11:59
n4nd0oh12:00
n4nd0I saw some percentages but they were not that high12:01
n4nd0I saw something like 60-something for Putin out of a total turnout of 70-something12:01
blackburnah it is in chechnya12:01
blackburnlocal region12:01
n4nd0haha it is a big local region12:02
n4nd0it could almost be the capital of Sweden in terms of population12:02
blackburnsmall republic12:02
n4nd0I am guessing those numbers in black are # voters12:03
blackburnyes12:03
n4nd0ah fuck I didn't recognize the name at first sight12:03
n4nd0I recognize it as "Chechenia"12:03
n4nd0it is how we pronounce it in Spanish12:03
blackburnthere was a war as you may probably know :)12:04
n4nd0yeah, that's why I remember the name12:05
n4nd0it appeared a lot in the news12:05
-!- sonne|work [~sonnenbu@194.78.35.195] has joined #shogun12:09
-!- vikram360 [~vikram360@117.192.190.106] has joined #shogun12:16
vikram360blackburn : and putin wins12:16
blackburnsurprise?12:17
blackburn:)12:17
-!- n4nd0 [~nando@s83-179-44-135.cust.tele2.se] has quit [Ping timeout: 252 seconds]12:31
vikram360nope.. but the media seems to be having a field day. 3000 official complaints about the voting.12:35
sonne|workblackburn: isn't QDA the same as LDA but just on quadratic features?13:17
blackburnsonne|work: what are quadratic features?13:19
sonne|workall monomials of degree 213:19
blackburnyou probably know better? ;)13:20
sonne|workx_1*x_2 x_1^2 x_2^213:20
sonne|workfor 2d input vectors13:20
blackburnsonne|work: well we have no such features?13:20
sonne|workpolynomialdotfeatures?13:20
sonne|workor sth?13:20
sonne|workPolyFeatures13:21
blackburnah13:21
blackburnsonne|work: well I don't know then, do you think QDA is useless?13:21
sonne|workanyway it makes sense to make things explicit, i.e., if it is the same, use LDA on simplefeatures?13:22
sonne|workerr implement QDA on simplefeatures by using PolyFeatures internally13:23
blackburnyeah i got it13:23
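
To make the "quadratic features" above concrete, here is a small numpy sketch of the explicit degree-2 monomial expansion: a 2-d vector (x1, x2) becomes (x1, x2, x1^2, x1*x2, x2^2). Shogun's PolyFeatures computes such monomials implicitly; this explicit version is only for illustration.

    import numpy as np
    from itertools import combinations_with_replacement

    def degree2_monomials(X):
        # X: (n_samples, n_features) -> all monomials of degree <= 2
        quad = [X[:, i] * X[:, j]
                for i, j in combinations_with_replacement(range(X.shape[1]), 2)]
        return np.hstack((X, np.column_stack(quad)))
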
-!- vikram360 [~vikram360@117.192.190.106] has quit [Read error: Connection reset by peer]13:24
-!- sonne|work [~sonnenbu@194.78.35.195] has quit [Ping timeout: 276 seconds]13:30
-!- wiking [~wiking@huwico/staff/wiking] has joined #shogun13:38
-!- sonne|work [~sonnenbu@194.78.35.195] has joined #shogun13:45
blackburnsonne|work: it took one year for me to finally understand how features are working here in shogun hahahah13:47
sonne|workblackburn: anyway better check that LDA on squared features is QDA - could be that there is sth else to it :)14:04
blackburnsonne|work: n4ndo will do probably ;)14:04
blackburnsonne|work: I have seen interesting thing in your talk14:04
blackburnoptimizing svm with auprc14:04
blackburndid you try to train svm this way?14:06
sonne|workblackburn: doesn't help - look at T. Joachims' paper (best paper award at ICML) - it gives you like 0.00000001% :)14:06
sonne|workblackburn: which talk?14:06
blackburnsonne|work: http://sonnenburgs.de/soeren/talks/2006-05-02-perf-measures.pdf14:06
sonne|workohh that crap14:07
sonne|workprobably all wrong14:07
blackburnah I see14:07
blackburn:D14:07
sonne|workI guess best is to look at this one page in my thesis - there are all the perf measures I know of (and in shogun) in there14:07
blackburnI was interested in svm on last page14:08
blackburnsonne|work: I can't understand why in mc svms there are (<w_n,x>+b_n) - (<w_m,x>+b_m) < 1 - \xi_m14:08
sonne|workyeah it is some paper by Thorsten Joachims doing that fast but it doesn't help14:08
blackburnwhy 1? :)14:08
sonne|workmargin fixed to 114:08
sonne|worklike in svm14:08
blackburnI am thinking about ECOC training of svm14:09
sonne|worklike in mc-svm, like in structured output learning14:09
blackburnand don't know how to formulate this boundary14:09
blackburnsomething makes me think there won't be 1 :)14:09
sonne|workin words: f(good_x) - f(other_x) > 1- sth14:10
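
In LaTeX, the standard multiclass SVM margin constraint that sonne|work paraphrases ("f(good_x) - f(other_x) > 1 - sth") reads as below; the 1 is just the usual normalization of the margin, since rescaling w and b absorbs any other constant, exactly as in the binary SVM - which is the answer to blackburn's "why 1?".

    \bigl(\langle w_{y_i}, x_i \rangle + b_{y_i}\bigr)
      - \bigl(\langle w_m, x_i \rangle + b_m\bigr) \ge 1 - \xi_i,
    \qquad \forall\, m \ne y_i, \quad \xi_i \ge 0
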
sonne|workanyway back to work14:10
blackburnit looks like you work in an iron mine14:10
blackburn:)14:11
-!- faktotum [~cronor@fb.ml.tu-berlin.de] has joined #shogun14:45
faktotumHello!14:46
blackburnhi14:46
faktotumis there the possibility to set a custom sparse kernel?14:46
faktotumi know there are sparse kernels and that you can set custom kernels. but how do you set custom sparse kernels?14:46
faktotumi'm using python module if that is of interest14:46
blackburnI see.. I guess it is not yet implemented14:46
blackburnbut I think it is pretty straightforward to implement14:47
faktotummy current workaround is to do a Cholesky decomposition K = LL* and then use L as sparse feature vectors, but that is not tractable with bigger matrices14:47
blackburnI am not sure I understood why you do the cholesky14:48
sonne|workfaktotum: sounds like an easy task to add - patches welcome :)14:48
faktotumi will try it tonight14:48
faktotumblackburn: if i have K = LL* i can set my sparse features to L and then use a linear kernel. then i would end up with K as a custom sparse kernel14:50
blackburnwhoa I see14:50
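
faktotum's workaround, as a small numpy check: factor a (positive definite) precomputed kernel matrix K = L L^T with Cholesky and use the rows of L as feature vectors; a linear kernel on those features then reproduces K exactly. The random matrix and the jitter term below are only there to make the example self-contained.

    import numpy as np

    rng = np.random.RandomState(0)
    A = rng.randn(50, 10)
    K = A.dot(A.T) + 1e-6 * np.eye(50)   # some positive definite "custom" kernel
    L = np.linalg.cholesky(K)            # K = L L^T
    # the linear kernel (dot products) of the rows of L gives K back
    assert np.allclose(L.dot(L.T), K)
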
sonne|workfaktotum: depending on how sparse things are you could just use SGSparseMatrix14:51
sonne|workbut it is not fast enough I guess..14:51
faktotumoh that was my idea, why is it not fast enough?14:53
blackburnfaktotum: probably it would be slow in terms of checking whether k_i,j is zero15:02
faktotumok, but doesn't the kernel created from sparse real features have the same problem?15:04
blackburnit has as well..15:05
blackburnI guess some hash map should be used here15:05
blackburnfaktotum: anyway cholesky of some say 4000x4000 matrix is pretty slow ;)15:17
faktotumha! don't try 15k x 15k!15:23
blackburn15k x 15k?!15:23
blackburnthat probably takes a lot of memory :)15:24
faktotumchompack has a sparse cholesky decomposition implemented15:24
blackburnah15:24
blackburnI guess dense 15K would never finish15:25
-!- vikram360 [~vikram360@117.192.190.106] has joined #shogun16:35
-!- blackburn [5bdfb203@gateway/web/freenode/ip.91.223.178.3] has quit [Quit: Page closed]16:38
sonne|workfaktotum: maybe it is good enough: basically finding the kernel row is fast but not finding the column16:43
sonne|workif it is really sparse some kind of hashmap of tuples or whatever could be faster...16:44
sonne|workbut a lot of overhead then16:44
sonne|workfaktotum: so please go ahead with the sparse matrix idea - should do the job16:49
-!- in3xes [~in3xes@180.149.49.227] has joined #shogun16:50
-!- cronor [~cronor@141.23.80.206] has joined #shogun16:52
-!- faktotum [~cronor@fb.ml.tu-berlin.de] has quit [Ping timeout: 260 seconds]16:56
-!- cronor [~cronor@141.23.80.206] has quit [Remote host closed the connection]17:00
-!- cronor [~cronor@fb.ml.tu-berlin.de] has joined #shogun17:00
-!- cronor [~cronor@fb.ml.tu-berlin.de] has quit [Quit: cronor]17:07
-!- cronor [~cronor@fb.ml.tu-berlin.de] has joined #shogun17:17
-!- in3xes [~in3xes@180.149.49.227] has quit [Quit: Leaving]17:19
-!- cronor_ [~cronor@141.23.80.206] has joined #shogun17:21
-!- cronor [~cronor@fb.ml.tu-berlin.de] has quit [Ping timeout: 260 seconds]17:23
-!- cronor_ is now known as cronor17:23
-!- n4nd0 [~nando@s83-179-44-135.cust.tele2.se] has joined #shogun17:28
-!- wiking [~wiking@huwico/staff/wiking] has quit [Remote host closed the connection]17:50
-!- wiking [~wiking@huwico/staff/wiking] has joined #shogun17:50
vikram360I know this is probably a n00b question but there seems to be very little information about it: In what way is the C5.0 algorithm better than the C4.5?17:53
-!- wiking_ [~wiking@huwico/staff/wiking] has joined #shogun18:01
-!- wiking [~wiking@huwico/staff/wiking] has quit [Ping timeout: 244 seconds]18:02
-!- wiking_ is now known as wiking18:02
-!- cronor [~cronor@141.23.80.206] has quit [Quit: cronor]18:52
-!- axitkhurana [~akshit@14.98.55.250] has joined #shogun19:29
-!- axitkhurana [~akshit@14.98.55.250] has left #shogun []19:29
-!- blackburn [~qdrgsm@188.168.4.3] has joined #shogun19:47
-!- blackburn [~qdrgsm@188.168.4.3] has quit [Quit: Leaving.]20:12
@sonney2kvikram360, it is not clear to me either - all I know is that there were papers showing that it is better...22:46
@sonney2kthere just was no open source impl. of c5.0 around22:46
@sonney2kand for c4.5 only some free-for-academic-use thingy22:47
@sonney2kso people tried c4.5 if they could but that's it22:48
@sonney2kahh btw weka has a java version of c4.5 (iirc called j45) that probably has much cleaner code22:48
n4nd0sonney2k: hey! I read before you talked with blackburn about QDA23:05
n4nd0sonney2k: I have been reading into it so I could implement it in shogun23:06
n4nd0sonney2k: but I am not really sure how to relate what I have read about it with what you said before23:06
n4nd0sonney2k: so it seems that QDA and LDA are similar in that they assume that the feature vectors follow a normal distribution, but LDA assumes that the distributions for all the classes have the same covariances while QDA doesn't make that assumption23:08
n4nd0sonney2k: is that right this far?23:08
@sonney2kI guess so - at least for LDA, when the cov matrices are considered the same the problem becomes linear23:17
n4nd0sonney2k: ok, so I understand that23:21
n4nd0sonney2k: but is it then equivalent to use LDA using polynomial features?23:21
n4nd0I mean, can we just make polynomial features from the original ones (e.g. if we have at the beginning x1 and x2, we expand the feature vectors so they also contain x1^2, x2^2 and x1*x2)23:23
n4nd0would solving that with LDA be equivalent to QDA?23:23
@sonney2kn4nd0, it must be very close but I am not sure if it is exactly the same23:26
@sonney2kbest description about LDA/QDA I found is https://onlinecourses.science.psu.edu/stat857/book/export/html/1723:27
n4nd0sonney2k: cool, thank you very much, I was using this reference http://www.slac.stanford.edu/cgi-wrap/getdoc/slac-pub-4389.pdf23:31
n4nd0I have some trouble when it gets into the regularization part23:32
@sonney2khmmh, seems like QDA / LDA results on quad features differ but it is always mentioned that one can use it to get a quadratic classifier ...23:34
n4nd0so do you think it would be interesting to add QDA in shogun? and if so how?23:39
n4nd0something similar to LDA that is already implemented using regularization?23:39
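
For the regularization part n4nd0 mentions, a minimal numpy sketch of QDA with the per-class covariances shrunk toward the pooled covariance (the Friedman-style scheme from the references above); parameter names and the lambda value are illustrative, and this is not meant as the eventual shogun implementation.

    import numpy as np

    def qda_train(X, y, lmbda=0.5):
        # X: (n_samples, n_features), y: integer class labels
        pooled = np.cov(X.T)
        model = {}
        for k in np.unique(y):
            Xk = X[y == k]
            mu = Xk.mean(axis=0)
            # regularization: shrink the class covariance toward the pooled one
            cov = (1.0 - lmbda) * np.cov(Xk.T) + lmbda * pooled
            logdet = np.linalg.slogdet(cov)[1]
            prior = np.log(float(len(Xk)) / len(X))
            model[k] = (mu, np.linalg.inv(cov), prior - 0.5 * logdet)
        return model

    def qda_predict(model, x):
        # pick the class with the largest Gaussian discriminant value
        def score(k):
            mu, icov, const = model[k]
            d = x - mu
            return const - 0.5 * d.dot(icov).dot(d)
        return max(model, key=score)
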
-!- wiking [~wiking@huwico/staff/wiking] has quit [Quit: wiking]23:42
--- Log closed Tue Mar 06 00:00:19 2012

Generated by irclog2html.py 2.10.0 by Marius Gedminas - find it at mg.pov.lt!