IRC logs of #shogun for Tuesday, 2012-02-14

--- Log opened Tue Feb 14 00:00:19 2012
02:21 -!- blackburn [~qdrgsm@188.168.4.209] has quit [Quit: Leaving.]
05:39 -!- dfrx [~f-x@inet-hqmc07-o.oracle.com] has joined #shogun
07:47 -!- n4nd0 [~n4nd0@s83-179-44-135.cust.tele2.se] has joined #shogun
07:53 -!- n4nd0 [~n4nd0@s83-179-44-135.cust.tele2.se] has quit [Quit: Leaving]
07:53 -!- n4nd0 [~nando@s83-179-44-135.cust.tele2.se] has joined #shogun
09:27 -!- n4nd0 [~nando@s83-179-44-135.cust.tele2.se] has quit [Quit: leaving]
11:43 -!- karlnapf [~heiko@host86-180-120-146.range86-180.btcentralplus.com] has joined #shogun
11:44 <karlnapf> sonney2k, around?
11:48 <sonne|work> karlnapf: yes
11:49 <karlnapf> sonne|work, hi, nice to see that you are still alive :) hope everything is going well. Do you have a minute?
11:49 <sonne|work> we will soon go for lunch, but before that yes
11:50 <sonne|work> (and afterwards too)
11:50 <karlnapf> ok then, quick:
11:50 <karlnapf> it's currently not possible to do model selection with custom kernels
11:50 <sonne|work> that is true
11:50 <karlnapf> since the splitting is done on the features
11:51 <sonne|work> ok
11:51 <karlnapf> and I think the only way to fix this is to make it possible to specify indices for training
11:51 <karlnapf> (and applying)
11:51 <karlnapf> but for this, the data has to stay fixed
11:51 <sonne|work> true
11:52 <karlnapf> with the current apply() methods this does not work
11:52 <karlnapf> since apply() works on all features, and apply(CFeatures) changes the features
11:52 <karlnapf> same thing for train
11:52 <sonne|work> but if you set the subset before, it would work - right?
11:52 <karlnapf> a subset on the features?
11:52 <sonne|work> yes
11:53 <karlnapf> but the custom kernel has no features
11:53 <karlnapf> the values it returns are not based on features
11:53 <sonne|work> it has DummyFeatures
11:54 <karlnapf> I know, but the way it returns the values does not involve these
11:54 <sonne|work> (a NOP just returning #lhs / #rhs)
11:54 <karlnapf> I worked around this in my repo by setting a subset not on the features but on the kernel
11:54 <sonne|work> but one could change this, right?
11:54 <sonne|work> argh
11:54 <sonne|work> the others are leaving just now
11:55 <karlnapf> oh
11:55 <karlnapf> ok :)
11:55 <sonne|work> so I hope you are still around in 1 hour?
11:55 <karlnapf> well, let's continue later then :)
11:55 <karlnapf> probably, but not completely sure
11:55 <karlnapf> see you then
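
The workaround karlnapf mentions - a subset on the kernel rather than on the features - amounts to remapping train/test indices into a fixed precomputed matrix, so cross-validation can "split" without touching the data. A minimal C++ sketch of that idea (hypothetical names, not Shogun's actual CCustomKernel interface):

    #include <cstddef>
    #include <utility>
    #include <vector>

    // A precomputed kernel whose "splitting" is done by remapping indices,
    // so the underlying matrix (and hence the data) stays fixed.
    class PrecomputedKernel
    {
    public:
        explicit PrecomputedKernel(std::vector<std::vector<double>> matrix)
            : m_matrix(std::move(matrix)) {}

        // Install index subsets for the lhs/rhs of the kernel, e.g. the
        // training indices of the current cross-validation fold.
        void set_subsets(std::vector<std::size_t> lhs, std::vector<std::size_t> rhs)
        {
            m_lhs = std::move(lhs);
            m_rhs = std::move(rhs);
        }

        void remove_subsets() { m_lhs.clear(); m_rhs.clear(); }

        // k(i, j) under the active subsets: remap, then look up - no copy.
        double kernel(std::size_t i, std::size_t j) const
        {
            const std::size_t row = m_lhs.empty() ? i : m_lhs[i];
            const std::size_t col = m_rhs.empty() ? j : m_rhs[j];
            return m_matrix[row][col];
        }

    private:
        std::vector<std::vector<double>> m_matrix;
        std::vector<std::size_t> m_lhs, m_rhs;
    };
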
11:59 -!- dfrx [~f-x@inet-hqmc07-o.oracle.com] has left #shogun []
12:01 -!- n4nd0 [~n4nd0@n145-p102.kthopen.kth.se] has joined #shogun
12:32 -!- Netsplit *.net <-> *.split quits: @sonney2k, CIA-18, shogun-buildbot_
12:32 -!- Netsplit over, joins: shogun-buildbot_, CIA-18, @sonney2k
13:04 <sonne|work> karlnapf: Re
13:12 <CIA-18> shogun: Soeren Sonnenburg master * rcda7657 / src/shogun/mathematics/Math.h : minor source code beautification - http://git.io/zLy5Kw
13:12 <CIA-18> shogun: Soeren Sonnenburg master * ra044c79 / src/shogun/lib/DataType.h : introduce clone() function to SGVector - http://git.io/tclRpQ
13:12 <CIA-18> shogun: Soeren Sonnenburg master * r107e9f9 / (src/shogun/features/Labels.cpp src/shogun/features/Labels.h):
13:12 <CIA-18> shogun: Remove m_num_classes from CLabels and change get_num_(unique,)classes()
13:12 <CIA-18> shogun: This adds a more efficient mechanism to determine the number of classes.
13:12 <CIA-18> shogun: Instead of adding labels to a set, perform a sort + unique count. This
13:12 <CIA-18> shogun: fixes an issue when using labels for regression (and crazily slow
13:12 <CIA-18> shogun: behaviour). - http://git.io/xR86Sw
13:12 <CIA-18> shogun: Soeren Sonnenburg master * r7cf941e / (2 files): fix warning in multiclass accuracy - http://git.io/Lqmb0Q
13:15 <karlnapf> sonne|work hi
13:16 <sonne|work> karlnapf: hi :)
13:16 <karlnapf> so, where were we?
13:17 <karlnapf> so your point is to modify the CustomKernel so that it somehow maps the subset indices of the underlying features
13:17 <karlnapf> then model selection would work with it
13:19 <karlnapf> still, it would be nice if you could tell the x-val to reuse a kernel matrix in different runs
13:19 <karlnapf> but the problem is that it only has a CMachine reference
13:19 <karlnapf> so no kernels at all there.
13:26 <karlnapf> sonne|work don't you think this might be handy? Then one wouldn't have to precompute all the matrices during model selection by hand beforehand, but just pass a flag or so
13:27 -!- n4nd0 [~n4nd0@n145-p102.kthopen.kth.se] has quit [Quit: Leaving]
13:27 -!- nando [~nando@n145-p102.kthopen.kth.se] has joined #shogun
13:28 -!- nando is now known as n4nd0
13:37 <sonne|work> karlnapf: sorry, got interrupted
13:37 <karlnapf> np
13:39 <karlnapf> I have to leave in about 20 min; if you are busy till then, let's just discuss by mail/github
13:39 <sonne|work> karlnapf: I guess from the user perspective it is
13:40 <sonne|work> all a user wants is to give the thing data and get a result as fast as possible - do whatever is necessary in between
13:40 <sonne|work> so this means precompute matrix / re-use kernel cache etc.
13:41 <karlnapf> yes
13:42 <sonne|work> problem is that it makes things very complex
13:42 <sonne|work> karlnapf: don't you think that this becomes a bit too tough to control?
13:42 <karlnapf> how do you mean that?
13:43 <karlnapf> I mean, all that has to be done is to replace the current kernel by a custom kernel with a precomputed matrix and restore it afterwards
13:43 <karlnapf> the rest of the framework just uses the kernel as normal and does not even know that it's now a custom kernel
13:44 <karlnapf> I mean, you are right, this adds complexity
13:44 <sonne|work> yeah, but what happens if the training process is interrupted?
13:44 <sonne|work> ctrl + c
13:44 <karlnapf> and then?
13:45 <sonne|work> then the object has a CustomKernel assigned to it
13:45 <karlnapf> oh, yes, but if this locking would be used
13:45 <sonne|work> not the one it had before
13:45 -!- wiking [~wiking@huwico/staff/wiking] has quit [Read error: Connection reset by peer]
13:45 -!- wiking [~wiking@huwico/staff/wiking] has joined #shogun
13:45 <karlnapf> it should not be possible to do any of the normal stuff if the machine is locked
13:46 <karlnapf> and unlocking restores the old configuration
13:46 <sonne|work> I understand, hmmhh. so let's summarize the benefits
13:46 <karlnapf> for example, train is only implemented in CMachine
13:46 <karlnapf> there could be a check
13:47 <sonne|work> hmmhh, and the kernel machine could do the kernel precomputing (if e.g. < 5000 examples / sufficiently many kernel matrices need computation)
13:47 <sonne|work> otherwise attempt kernel matrix cache re-use
13:49 <karlnapf> I don't have too much overview of the kernel cache
13:49 <karlnapf> but yes, the precomputation is done in KernelMachine
13:49 <karlnapf> other machines could do different things (don't have a case in mind yet)
13:50 <karlnapf> this also automatically makes x-val over custom kernels possible
13:50 <karlnapf> because you can simply create a custom kernel from a custom kernel
13:51 <sonne|work> but this requires twice the memory then, right?
13:51 <karlnapf> no, just use the same
13:51 <sonne|work> ahh, you get just the ptr for the custom kernel, right? not a copy
13:51 <sonne|work> ?
13:51 <karlnapf> yes
13:51 <karlnapf> BTW do you know what's the story with the float32 matrices in the custom kernel?
13:52 <karlnapf> because that causes problems with the above procedure
13:52 <sonne|work> I mean get_kernel_matrix returns the copy
13:52 <sonne|work> ohh
13:52 <sonne|work> for efficiency
13:52 <karlnapf> k
13:52 <sonne|work> CustomKernel tries to save memory
13:52 <karlnapf> well, CustomKernel would have to override the get_kernel_matrix method
13:52 <sonne|work> that is why float instead of double
13:52 <karlnapf> ok
13:52 <sonne|work> so we need a copy too then :(
13:52 <karlnapf> no
13:53 <karlnapf> the CustomKernel can be created from a float32
13:53 <sonne|work> yeah, but when you do get kernel matrix?
13:53 <sonne|work> ok, you could get a float32 variant
13:53 <sonne|work> should be ok I guess
13:53 <karlnapf> yes; also, get_kernel_matrix is only called from a CustomKernel that wants to have the same matrix
13:53 <karlnapf> no calling of this method from outside
13:54 <karlnapf> if it is, a copy has to be returned, but that doesn't really make sense
13:54 <karlnapf> I added this method to CustomKernel:
13:55 <karlnapf> SGMatrix<float32_t> get_kernel_matrix()
13:55 <karlnapf> {
13:55 <karlnapf>     return kmatrix;
13:55 <karlnapf> }
13:55 <sonne|work> I see
13:55 <karlnapf> then, when the constructor CCustomKernel(CKernel*) is used, this is called
13:55 <sonne|work> for CustomKernels...
13:55 <karlnapf> the only problem is the free in the destructor, because we don't want the second custom kernel to delete the original matrix
13:56 <karlnapf> yes, this is just to make the x-val work on custom kernels
13:56 <sonne|work> argh, yes
13:57 <sonne|work> I wonder if it was the right decision to not have reference counts for SGVector/matrix objects
13:57 <sonne|work> instead of the do_free flag
13:57 <karlnapf> yes, the counting would be nice here :)
13:57 <sonne|work> there are other places where this is annoying
13:58 <karlnapf> I don't like do_free too much since I never know whether it's used cleanly, so I tend to use the destroy_vector method and make sure that I know what's going on (where the vector comes from etc.)
13:58 <karlnapf> but anyway
13:59 <karlnapf> the matrix could only be deleted if a flag is set or so
13:59 <sonne|work> which means we should replace it by ref counts
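
The do_free/double-delete problem above is exactly what reference counting solves: the buffer is freed once, when the last owner goes away. A toy C++ sketch of the idea (hypothetical type, not the actual SGMatrix):

    #include <cstddef>
    #include <memory>

    // Toy ref-counted matrix: a CustomKernel built from another CustomKernel
    // can alias the same float32 buffer without any do_free flag, because
    // the buffer is deleted only when the last SharedMatrix is destroyed.
    template <typename T>
    struct SharedMatrix
    {
        SharedMatrix(std::size_t rows, std::size_t cols)
            : num_rows(rows), num_cols(cols),
              data(new T[rows * cols](), std::default_delete<T[]>()) {}

        // Column-major element access.
        T& operator()(std::size_t i, std::size_t j)
        {
            return data.get()[j * num_rows + i];
        }

        std::size_t num_rows, num_cols;
        std::shared_ptr<T> data; // freed exactly once, by the last owner
    };

    // Usage: both objects share one buffer; no copy, no double free.
    //   SharedMatrix<float> km(n, n);   // the first custom kernel's matrix
    //   SharedMatrix<float> alias = km; // second kernel, same memory
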
13:59 <karlnapf> why don't you use a matrix class btw?
13:59 <sonne|work> so the current x-val will call CMachine.data_lock?
14:00 <karlnapf> no
14:00 <sonne|work> you mean sth else instead of SGMatrix?
14:00 <karlnapf> yes
14:00 <sonne|work> who has to lock the data?
14:00 <karlnapf> two possibilities:
14:00 <sonne|work> shouldn't x-val do it?
14:00 <karlnapf> no, because sometimes you want to do multiple x-vals on the same kernel (for example for C)
14:01 <karlnapf> so either the grid-search class does it for every parameter change
14:01 <sonne|work> Re SGMatrix - this is only meant as a simple struct to store the data plus some minor helpers, no real matrix class (we use blas etc. for that underneath wherever possible)
14:01 <karlnapf> or the user does it (optionally), if he only wants to do x-val
14:01 <karlnapf> ah ok
14:02 <karlnapf> x-val just checks whether the data is locked and then uses the corresponding train/apply methods
14:02 <sonne|work> but the missing reference count is really a design flaw
14:02 <karlnapf> yes, would be nice to have, but I guess this is a lot of work
14:03 <sonne|work> not too much, but I don't have any time :(
14:03 <sonne|work> so is it best to do it in grid search?
14:03 <karlnapf> yes, I think so
14:03 <karlnapf> one could even only do it if a kernel parameter has changed, but that's too complicated for now I think
14:04 <sonne|work> I mean, it would be nice to do the locking automagically (optionally forced on/off of course (some enum MODSEL_LOCK_AUTO / OFF / ON))
14:04 <karlnapf> yes
14:04 <karlnapf> that would be nice
14:04 <karlnapf> for example, if train_locked is called, it's locked automatically
14:04 <karlnapf> but the problem is
14:04 <karlnapf> that the unlocking has to be done by hand
14:04 <sonne|work> nahh
14:04 <karlnapf> because the machine does not know when something has changed
14:05 <sonne|work> doesn't the grid-search on top know?
14:05 <karlnapf> yes
14:05 <karlnapf> it currently does a lock before evaluation and an unlock after
14:05 <karlnapf> (if the flag was set)
14:05 <karlnapf> otherwise it just does it the old way
14:06 <sonne|work> so it should set the flag itself
14:06 <karlnapf> yes, it does
14:06 <karlnapf> GridSearch gets a boolean whether it should lock, then everything is done internally
14:07 <karlnapf> x-val does not
14:07 <karlnapf> since the user may have locked before
14:07 <karlnapf> if you just want to evaluate a machine, you lock it and then perform x-val
14:07 <karlnapf> if x-val would always lock, then you would have double computations
14:08 <karlnapf> also:
14:08 <karlnapf> imagine you have a kernel which does not change but SVM params that do, and you want to do a search
14:08 <sonne|work> couldn't x-val test if things are locked and only then lock?
14:08 <karlnapf> then you lock the machine beforehand and tell grid-search to not lock
14:08 <karlnapf> the kernel is not recomputed, but the machine is locked all the time, so it's fast
14:09 <karlnapf> yes, it could
14:09 <karlnapf> I mean, all kinds of user-friendly stuff could be added for this
14:09 <karlnapf> if it would automatically lock if not yet locked
14:09 <karlnapf> grid-search would just have to unlock after each iteration
14:09 <sonne|work> ok - I would want to hide all this locking from the user
14:10 <sonne|work> and only add some property to gridsearch/xval that one can manually set
14:10 <sonne|work> to override our decision
14:10 <karlnapf> yes, one flag in the select_model method
14:10 <karlnapf> the old examples still all run without any changes (except for some signatures)
14:11 <karlnapf> if somebody does not know about all this, everything is as it was
14:11 <karlnapf> about hiding the stuff: since the user has to understand when locking is good and when it's not, I like that it has to be done manually
14:11 <sonne|work> well, we should make the best guess
14:12 <karlnapf> for example in the case of a fixed kernel search for SVM-C
14:12 <karlnapf> the locking would be stupid
14:12 <karlnapf> since the kernel does not change
14:12 <sonne|work> so for many parameter-combinations / small kernels
14:12 <karlnapf> I mean locking in every iteration
14:12 <karlnapf> yes
14:12 <sonne|work> yeah, but you know this in grid search
14:13 <karlnapf> yes, so just lock manually once beforehand and then tell grid-search to not lock
14:13 <sonne|work> wait - it makes sense there to precompute the kernel
14:13 <karlnapf> yes, but only once
14:13 <sonne|work> but only once, right?
14:13 <karlnapf> not for every C
14:13 <sonne|work> yeah, but this we need to determine automagically somehow
14:13 <karlnapf> how?
14:14 <sonne|work> locking makes sense only if the features change
14:14 <sonne|work> so do it once for a constant set of features
14:14 <karlnapf> the problem is that the model selection does not distinguish between kernel parameters and machine parameters
14:15 <karlnapf> I thought of extracting the parameter combinations where the kernel params are fixed and then only locking for these
14:15 <karlnapf> but then you would have to add knowledge about possible subclasses there
14:15 <karlnapf> what about other machines, how do they do the locking?
14:15 <karlnapf> for example, we also got the distance matrices
14:16 <karlnapf> that's why I preferred the manual locking there
14:16 <sonne|work> seems like we need another flag - parameter changes data representation
14:16 <sonne|work> manual locking is very tough for the user to get right
14:16 <sonne|work> x-val/grid search is already pretty complicated
14:17 <sonne|work> so we should hide that stuff if possible
14:17 <karlnapf> in the basic case he does not have to do anything
14:17 <karlnapf> I mean for model selection
14:17 <karlnapf> only if he uses x-val manually
14:17 <karlnapf> and if he wants to save more time during model selection
14:17 <karlnapf> the manual x-val case could be done automatically though
14:18 <karlnapf> x-val should always (flag) try to lock if it's not done yet
14:18 <sonne|work> exactly
14:18 <karlnapf> ah, but what about the unlocking
14:18 <karlnapf> this cannot be done automatically for x-val
14:18 <sonne|work> unlock if it has locked it
14:18 <sonne|work> why not?
14:19 <karlnapf> because of the case where a grid-search was performed on locked data
14:19 <karlnapf> then the kernel would always be recomputed
14:19 <sonne|work> yeah, but it didn't lock it itself then
14:19 <karlnapf> even though it does not change
14:19 <karlnapf> oh yes
14:20 <karlnapf> should be ok then
14:20 <sonne|work> x-val just stores that it has to unlock later
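
The policy the exchange converges on is: lock only if not already locked, and unlock only what you locked yourself, so an outer lock set by the user or by grid search survives. A sketch of that policy (hypothetical Machine interface and cross_validate function, not the actual CMachine/CCrossValidation API):

    // Hypothetical interface, for illustrating the locking policy only.
    class Machine
    {
    public:
        bool is_data_locked() const { return m_locked; }
        void data_lock()   { m_locked = true;  /* e.g. precompute kernel matrix */ }
        void data_unlock() { m_locked = false; /* e.g. restore original kernel */ }
    private:
        bool m_locked = false;
    };

    double cross_validate(Machine& machine /*, folds, labels, ... */)
    {
        // Lock only if the caller has not already done so: a user who locked
        // once for a whole grid search should not pay for recomputation here.
        const bool locked_here = !machine.is_data_locked();
        if (locked_here)
            machine.data_lock();

        double result = 0.0;
        // ... run the folds via the train_locked/apply_locked code path ...

        // Unlock only what we locked ourselves, so an outer lock survives.
        if (locked_here)
            machine.data_unlock();

        return result;
    }
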
14:20 <CIA-18> shogun: Soeren Sonnenburg master * r3649f3b / (31 files in 9 dirs):
14:20 <CIA-18> shogun: Merge pull request #367 from karlnapf/master
14:20 <CIA-18> shogun: A draft for training on fixed kernel matrices/data in general - http://git.io/qv89yQ
14:21 <karlnapf> ok, I will change the stuff we talked about soon
14:21 <sonne|work> we don't have to work it all out at once - step by step is ok.
14:21 <karlnapf> yes
14:21 <karlnapf> I have locally added parallelization of apply_locked
14:22 <karlnapf> in KernelMachine
14:22 <sonne|work> I would love to have a much easier nested-list way of specifying how grid search has to be done (from the python side)
14:22 <karlnapf> yes
14:22 <karlnapf> scikit is nice there
14:22 <wiking> sonne|work: just sent you my benchmark results... if you have any better idea how to do the benchmarking, let me know...
14:22 <sonne|work> yes - not as powerful, but nice
14:22 <karlnapf> but the framework itself is not as nice as ours :)
14:22 <karlnapf> yes,
14:23 <karlnapf> but this could be done for us
14:23 <sonne|work> and after all, the most common case is what counts
14:23 <karlnapf> let me raise another thing
14:23 <sonne|work> so if we make it simple for the most common cases
14:23 <sonne|work> things would be good enough
14:23 <karlnapf> I try to use shogun for university projects, and I really run across a lot of bugs / segfaults / missing error messages
14:23 <karlnapf> from python
14:24 <karlnapf> for example, in MultiClassSVM I found code which was wrong but never noticed, since it never was used before
14:24 <karlnapf> so I thought: what about separating the examples and tests
14:24 <karlnapf> and adding some more tests which try to cover all the code
14:24 <karlnapf> at least for new stuff
14:24 <sonne|work> wiking: why don't you just compute a kernel matrix that has, say, 10000 * 10000 elements - single threaded maybe. then you could measure directly.
14:25 <karlnapf> ?
14:25 <sonne|work> karlnapf: I used MultiClassSVM just fine?!? and so did blackburn
14:25 <sonne|work> what happened?
14:25 <karlnapf> use apply(int32_t)
14:25 <sonne|work> karlnapf: we don't have the resources to split up examples / tests
14:25 <sonne|work> too much work
14:26 <wiking> sonne|work: this seemed to be much faster, as this is really just about that function and its implementation w/o any wrapping stuff...
14:26 <sonne|work> we could have more examples and enable the tests
14:26 <karlnapf> two problems: a) it does not set the kernel on the svm before applying, b) it does not initialise the votes vector with zeros, so if one class gets zero votes you return uninitialized memory
14:26 <sonne|work> karlnapf: I never used apply(int32_t) (neither did blackburn :)
14:26 <sonne|work> pretty simple explanation :)
14:27 <karlnapf> that's what I meant with code coverage tests :)
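
For bug (b), the fix is simply to start the one-vs-one vote count from zero before tallying. A sketch of the pattern (hypothetical helper, not the actual CMultiClassSVM code):

    #include <cstdint>
    #include <vector>

    // winners[k] is the class the k-th pairwise (one-vs-one) SVM voted for.
    // Without explicit zero-initialization of the vote buffer (the original
    // code used a raw array), a class that receives no votes would be
    // compared using uninitialized memory.
    int32_t vote(const std::vector<int32_t>& winners, int32_t num_classes)
    {
        std::vector<int32_t> votes(num_classes, 0); // the crucial zeroing

        for (int32_t w : winners)
            votes[w]++;

        int32_t best_class = 0;
        for (int32_t c = 1; c < num_classes; c++)
            if (votes[c] > votes[best_class])
                best_class = c;

        return best_class;
    }
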
14:27 <karlnapf> anyway, I have to go now
14:27 <karlnapf> nice discussion, see you later! :)
14:27 <sonne|work> karlnapf: well, if we had an example for apply(int)
14:27 <sonne|work> it would have shown the bug in the first place
14:27 <sonne|work> but we don't...
14:27 <karlnapf> yes, I will add one :)
14:27 <sonne|work> thx
14:27 <sonne|work> cu
14:27 <karlnapf> there are more bugs in MultiClassSVM; I am just fixing them because I need to use it for uni :)
14:28 <karlnapf> bye bye!
14:28 <sonne|work> karlnapf: that is how it should be
14:29 <sonne|work> I am doing the same in code you and others touched :D
14:30 <karlnapf> ok, that's open source then :D
14:42 -!- n4nd0 [~nando@n145-p102.kthopen.kth.se] has quit [Quit: leaving]
14:52 -!- karlnapf [~heiko@host86-180-120-146.range86-180.btcentralplus.com] has left #shogun []
16:12 -!- n4nd0 [82ede32a@gateway/web/freenode/ip.130.237.227.42] has joined #shogun
16:58 -!- n4nd0 [82ede32a@gateway/web/freenode/ip.130.237.227.42] has quit [Quit: Page closed]
17:38 <CIA-18> shogun: Soeren Sonnenburg master * r3e3db13 / (33 files in 14 dirs):
17:38 <CIA-18> shogun: add linear least squares and ridge regression
17:38 <CIA-18> shogun: This add linear ridge regression and a convenience class
17:38 <CIA-18> shogun: CLeastSquaresRegression calling CLinearRidgeRegression with
17:38 <CIA-18> shogun: regularization parameter tau=0. To not cause confusion KRR is
17:38 <CIA-18> shogun: renamed to KernelRidgeRegression throughout examples/code. - http://git.io/pQJ0OA
17:53 <sonne|work> wiking: https://gist.github.com/1828072
17:53 <wiking> echeckoing
17:53 <wiking> ok, so checking :)
17:53 <sonne|work> wiking: there really is no difference; sometimes one wins, sometimes the other, but really only by a few seconds
17:53 <sonne|work> wiking: btw you need to use gettimeofday to get highres timings
17:55 <wiking> heheheh
17:55 <wiking> so, a tie? :)
17:55 <wiking> none of us gets a drink? :P
17:55 <sonne|work> JS1: 134.6803036
17:55 <sonne|work> JS2: 134.7876117
17:55 <sonne|work> here
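
For timings of this kind, gettimeofday gives microsecond resolution where time() only gives whole seconds. A minimal sketch of the measurement sonne|work suggests (POSIX; the benchmarked workload in the middle is a placeholder):

    #include <cstdio>
    #include <sys/time.h>

    static double now_seconds()
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec * 1e-6;
    }

    int main()
    {
        double t0 = now_seconds();
        // ... benchmarked code, e.g. filling a 10000 x 10000 kernel matrix ...
        double t1 = now_seconds();
        printf("elapsed: %.6f s\n", t1 - t0);
        return 0;
    }
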
17:55 <wiking> anyhow, I was meaning to ask if you guys would still be interested in a latent structural SVM implementation..
17:55 <sonne|work> but when I run it a couple of times it might very well be vice versa
17:57 <wiking> well, I guess that's when the pipeline kicks in
17:57 <sonne|work> wiking: the problem might be that alex has no time to mentor...
17:59 <sonne|work> I need to work on that gsoc ideas list (have already a couple of submissions from mentors) ... structured output learning / multiclass / optimization framework would be what I would want to see this year, but I am not supermentoring this year :)
17:59 <sonne|work> anyway, gtg
17:59 <sonne|work> cu
18:00 <wiking> ca
18:00 <wiking> cya
18:55 -!- Netsplit *.net <-> *.split quits: @sonney2k, shogun-buildbot_, CIA-18
18:56 -!- Netsplit over, joins: CIA-18, @sonney2k
19:09 -!- shogun-t1olbox [~shogun@7nn.de] has quit [Ping timeout: 260 seconds]
--- Log closed Tue Feb 14 19:09:54 2012
--- Log opened Tue Feb 14 19:10:01 2012
19:10 -!- shogun-toolbox [~shogun@7nn.de] has joined #shogun
19:10 -!- Irssi: #shogun: Total of 8 nicks [1 ops, 0 halfops, 0 voices, 7 normal]
19:10 -!- Irssi: Join to #shogun was synced in 7 secs
19:15 -!- Netsplit *.net <-> *.split quits: shogun-t1olbox
19:23 -!- blackburn1 [~qdrgsm@188.168.4.71] has joined #shogun
19:27 -!- Netsplit *.net <-> *.split quits: blackburn
19:44 -!- blackburn1 [~qdrgsm@188.168.4.71] has quit [Ping timeout: 248 seconds]
20:22 -!- n4nd0 [~nando@s83-179-44-135.cust.tele2.se] has joined #shogun
20:37 <n4nd0> hi there
20:37 <n4nd0> so, is gsoc a topic to start talking about?
20:39 <n4nd0> any topics that will be interesting this year? I have checked the ideas on the webpage for gsoc 2011
21:05 <shogun-buildbot_> build #149 of nightly_none is complete: Failure [failed compile]  Build details are at http://www.shogun-toolbox.org/buildbot/builders/nightly_none/builds/149
21:08 -!- n4nd0 [~nando@s83-179-44-135.cust.tele2.se] has quit [Ping timeout: 240 seconds]
21:15 <shogun-buildbot_> build #135 of nightly_default is complete: Failure [failed compile]  Build details are at http://www.shogun-toolbox.org/buildbot/builders/nightly_default/builds/135
21:26 <shogun-buildbot_> build #148 of nightly_all is complete: Failure [failed compile]  Build details are at http://www.shogun-toolbox.org/buildbot/builders/nightly_all/builds/148
21:42 -!- wiking [~wiking@huwico/staff/wiking] has quit [Quit: wiking]
21:51 -!- wiking [~wiking@huwico/staff/wiking] has joined #shogun
22:11 -!- nando [~nando@s83-179-44-135.cust.tele2.se] has joined #shogun
22:11 -!- nando is now known as n4nd0
23:33 -!- wiking [~wiking@huwico/staff/wiking] has quit [Quit: wiking]
23:35 -!- wiking [~wiking@huwico/staff/wiking] has joined #shogun
23:36 -!- blackburn [~qdrgsm@188.168.4.152] has joined #shogun
--- Log closed Wed Feb 15 00:00:19 2012

Generated by irclog2html.py 2.10.0 by Marius Gedminas - find it at mg.pov.lt!