--- Log opened Sat Sep 24 00:00:21 2011 | ||
-!- serialhex [~quassel@99-101-148-183.lightspeed.wepbfl.sbcglobal.net] has quit [Ping timeout: 252 seconds] | 00:07 | |
-!- serialhex [~quassel@99-101-148-183.lightspeed.wepbfl.sbcglobal.net] has joined #shogun | 00:15 | |
-!- blackburn [~blackburn@31.28.44.65] has joined #shogun | 12:13 | |
blackburn | sonney2k: can it be wrong includes or so? | 12:39 |
blackburn | about arpack dsymv | 12:39 |
@sonney2k | blackburn, I think we should first check if his LDA really works | 19:30 |
@sonney2k | and what his configure output is | 19:30 |
blackburn | sonney2k: he tested LDA and all good | 19:32 |
@sonney2k | he didn't say that | 19:32 |
blackburn | with example from python_modular | 19:32 |
blackburn | he told me in private haha | 19:32 |
blackburn | The lda example (shogun/examples/documented/python_modular/classifier_lda_modular.py) seems to work. I had to change the data path to add toy. | 19:32 |
blackburn | I.e. ../data/toy/fm_train_real.dat etc | 19:32 |
blackburn | sonney2k: the only difference I noticed - I was including cblas in arpack.h and arpack.cpp | 19:34 |
blackburn | while it is included in lapack.h | 19:36 |
blackburn | it is not necessary | 19:36 |
CIA-3 | shogun: Sergey Lisitsyn master * r826ff44 / (2 files): Removed unnecessary includes in arpack.{cpp,h} - http://git.io/36Q4Cg | 19:36 |
blackburn | and removed in commit ^ | 19:36 |
@sonney2k | that all doesn't make sense | 19:52 |
@sonney2k | I will ask cheng again | 19:52 |
CIA-3 | shogun: Soeren Sonnenburg master * rbe97a43 / (3 files in 3 dirs): fix mixup of epsilon / tube epsilon in libsvr examples - http://git.io/Ty7s-A | 19:57 |
blackburn | sonney2k: we've got at least one commit per day during Sep 16 - today | 19:59 |
@sonney2k | and we should keep it like this | 20:00 |
blackburn | a pace many libs don't have :) | 20:00 |
blackburn | sonney2k: what is the tube epsilon? | 20:05 |
@sonney2k | the epsilon tube for the epsilon insensitive loss in support vector regression | 20:05 |
blackburn | oookk | 20:06 |
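The epsilon-insensitive loss mentioned above penalizes only errors falling outside a tube of width epsilon around the target. A minimal NumPy sketch (illustrative code, not shogun's API):

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps):
    # zero loss inside the tube |y - f(x)| <= eps, linear outside it
    return np.maximum(np.abs(y_true - y_pred) - eps, 0.0)

y = np.array([1.0, 2.0, 3.0])
pred = np.array([1.05, 2.5, 1.0])
loss = eps_insensitive_loss(y, pred, eps=0.1)  # the 0.05 error costs nothing
```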
blackburn | new algo to go into shogun in 10 minutes! | 20:25 |
blackburn | :D | 20:25 |
CIA-3 | shogun: Sergey Lisitsyn master * r3308301 / (6 files in 3 dirs): Introduced DiffusionMaps dimension reduction preprocessor - http://git.io/5M4CKw | 20:45 |
blackburn | sonney2k: vodka! | 20:46 |
@sonney2k | heh | 20:46 |
@sonney2k | blackburn, btw which dim red algo would you recommend to visualize data :) | 20:46 |
blackburn | sonney2k: depends what is the data | 20:47 |
@sonney2k | some real-valued inputs | 20:47 |
@sonney2k | not much more prior knowledge really - hard to classify | 20:47 |
blackburn | if there could be any underlying manifold - you can try LTSA | 20:48 |
blackburn | it is pretty fast and robust | 20:48 |
blackburn | well but MDS/Isomap would be useful too | 20:48 |
@sonney2k | any thoughts on PCA / kPCA - shouldn't I do these first? | 20:49 |
blackburn | why not :) | 20:49 |
blackburn | and we have kernel PCA, kernel LLE, kernel diffusion maps | 20:50 |
@sonney2k | thx | 20:51 |
@sonney2k | blackburn, btw recently someone here on IRC asked me about what our feature roadmap for shogun is | 20:51 |
@sonney2k | I am running a bit out of ideas what we want to focus on | 20:51 |
blackburn | I'm still focused on dim reduction | 20:51 |
blackburn | no idea what you are focused on :) | 20:51 |
@sonney2k | shogun does a lot of stuff nowadays - so it is really not clear to me how to really improve it | 20:52 |
@sonney2k | some more dim red methods are not really 'the big picture' | 20:52 |
@sonney2k | and I also only have small things, like parallelize more code, cleanups, model selection w/ nice syntax etc | 20:52 |
blackburn | sonney2k: I don't know about my plans in the long run | 20:53 |
blackburn | sonney2k: http://www.kongregate.com/games/banthar/hell-tetris | 20:54 |
@sonney2k | blackburn, for example features like massive parallel or mpi or neural networks or gaussian processes or whatever | 20:56 |
@sonney2k | blackburn, ^game is this even possible | 20:56 |
blackburn | sonney2k: oh I hate neural networks :D | 20:57 |
blackburn | no idea if it is possible | 20:57 |
blackburn | I'm afraid I can't plan such grand new features | 20:57 |
@sonney2k | blackburn, then it is not likely that you impl. them :D | 20:57 |
@sonney2k | for gsoc next year (if we want to participate) we need to | 20:57 |
blackburn | gaussian processes is something chris likes very much :) | 20:58 |
blackburn | sonney2k: I was thinking about high performance computing things but I don't know if it is even possible to do | 20:59 |
blackburn | without serious architecture changes, etc | 21:00 |
@sonney2k | blackburn, the problem with GPs is that it needs some matrix inverse (so the standard alg's are n^3) | 21:00 |
@sonney2k | and you need s.o. really deep into it | 21:00 |
@sonney2k | (I am not) | 21:00 |
blackburn | sonney2k: most of dimreduction algos are n^3 :) | 21:01 |
@sonney2k | hpc stuff is sth you cannot do in general for all of shogun | 21:01 |
@sonney2k | so this is sth special for certain algos | 21:01 |
@sonney2k | blackburn, thinking about it - I guess I would most prefer to develop shogun in two ways | 21:02 |
@sonney2k | 1) large scale / hpc stuff of whatever kind | 21:02 |
@sonney2k | 2) breadth - many ml baseline algorithms (not necessarily fast) | 21:02 |
@sonney2k | so one always has some baseline to play with | 21:03 |
blackburn | 1) is preferable for me but | 21:03 |
@sonney2k | lets call it the 'hammer' | 21:03 |
@sonney2k | and then if one knows what the baseline is can do 1) stuff on top of it | 21:03 |
blackburn | I guess not MPI, but OpenCL, etc | 21:04 |
blackburn | sonney2k: I've lost an idea. what to call 'hammer'? | 21:04 |
@sonney2k | I wouldn't want to get each and every algorithm in there - but maybe only the most successful ones | 21:04 |
@sonney2k | hammer == 2) | 21:04 |
@sonney2k | blackburn, one gsoc project could be some kind of opencv interfacing | 21:05 |
blackburn | opencv? | 21:05 |
@sonney2k | the big computer vision lib | 21:05 |
blackburn | I know | 21:05 |
blackburn | but surprised | 21:05 |
@sonney2k | why? | 21:05 |
blackburn | they have their own impls | 21:05 |
blackburn | of some algos | 21:05 |
@sonney2k | one could get features from opencv and do some training on top of it | 21:05 |
@sonney2k | with some nice example | 21:06 |
blackburn | sonney2k: I guess it would be better to have shogun in opencv | 21:06 |
blackburn | you could ask opencv guys about it | 21:07 |
@sonney2k | blackburn ? | 21:07 |
@sonney2k | I don't understand | 21:07 |
@sonney2k | shogun in opencv? | 21:07 |
@sonney2k | what does that mean | 21:07 |
blackburn | I mean it would be nice to become a machine learning library for opencv | 21:07 |
@sonney2k | I think you can already use it for that purpose | 21:08 |
blackburn | sonney2k: but having a nice interface in opencv to shogun would be nice | 21:09 |
blackburn | in more transparent way or so | 21:09 |
@sonney2k | I have no idea what that could be - I mean opencv produces any kind of feature representation one can think of | 21:10 |
@sonney2k | so using that representation + some shogun algo would work already | 21:10 |
blackburn | sonney2k: it would be nice to treat OpenCV images as shogun features somehow | 21:11 |
blackburn | no idea about specific ways to do it | 21:12 |
@sonney2k | blackburn, well I will meet gary at the gsoc mentors meeting - I will ask him | 21:12 |
blackburn | sonney2k: will you go to mentors meeting? | 21:12 |
@sonney2k | yes | 21:12 |
blackburn | nice | 21:12 |
blackburn | chris too? | 21:12 |
@sonney2k | yes us two | 21:12 |
blackburn | I see | 21:13 |
blackburn | sonney2k: will normalization of this kind: | 21:13 |
blackburn | X = X - min(X(:)); | 21:13 |
blackburn | X = X / max(X(:)); | 21:13 |
blackburn | change the kernel matrix? | 21:13 |
blackburn | *gaussian kernel | 21:13 |
@sonney2k | yes | 21:13 |
blackburn | how significant? | 21:14 |
@sonney2k | wait, the first one does not | 21:14 |
@sonney2k | translation invariant but not scale invariant | 21:14 |
@sonney2k | because you need to rescale kernel width | 21:14 |
blackburn | aha, I see | 21:14 |
blackburn | I guess it would be better to normalize features too | 21:15 |
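The translation/scale point above can be checked numerically: the Gaussian kernel depends only on pairwise distances, so subtracting min(X(:)) leaves the kernel matrix unchanged, while dividing by max(X(:)) changes it unless the kernel width is rescaled by the square of the same factor. A NumPy sketch with an illustrative kernel function (not shogun's implementation):

```python
import numpy as np

def gaussian_kernel(X, width):
    # K_ij = exp(-||x_i - x_j||^2 / width)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / width)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
K = gaussian_kernel(X, width=2.0)

# X - min(X(:)) shifts every entry equally -> pairwise distances unchanged
K_shift = gaussian_kernel(X - X.min(), width=2.0)

# X * s scales squared distances by s^2 -> width must be rescaled by s^2
s = 0.5
K_scaled = gaussian_kernel(s * X, width=2.0)
K_rescaled = gaussian_kernel(s * X, width=2.0 * s**2)
```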
@sonney2k | blackburn, so anyone else I should talk to at the mentors meeting? | 21:17 |
@sonney2k | the orange guys maybe? | 21:18 |
@sonney2k | such that they could somehow reuse what we provide in shogun in their gui? | 21:18 |
blackburn | sonney2k: well sure, ask if they want to collaborate | 21:19 |
@sonney2k | anyone else you could think of? | 21:19 |
blackburn | no idea | 21:19 |
blackburn | sonney2k: and no idea how to collaborate with scikits guys | 21:21 |
@sonney2k | blackburn, well I have one idea - we could provide some interface functions if it helps them to use our methods | 21:22 |
@sonney2k | so only interfacing | 21:22 |
@sonney2k | no more | 21:22 |
blackburn | sonney2k: they have as much as we have | 21:22 |
@sonney2k | they have some other things | 21:23 |
@sonney2k | too | 21:23 |
blackburn | e.g. they have gaussian processes | 21:23 |
@sonney2k | but not really large scale and only python | 21:23 |
@sonney2k | yes | 21:23 |
blackburn | I think it is not the way they would like | 21:23 |
blackburn | it will make things more complex | 21:23 |
blackburn | since the orange core is developed in C++ it would be useful to have some bridge | 21:24 |
@sonney2k | yeah, they have different focus | 21:24 |
blackburn | I will take a look how they do the decomposition | 21:24 |
blackburn | aha I see | 21:25 |
@sonney2k | I think we should implement the array interface for shogun features http://docs.scipy.org/doc/numpy/reference/arrays.interface.html | 21:25 |
blackburn | sonney2k: agree | 21:25 |
blackburn | sonney2k: okay you definitely could ask orange if they want to collaborate | 21:26 |
blackburn | they have svms and some classifiers | 21:26 |
blackburn | but at least they don't have any of the fastest C++ dim reduction preprocessors :D | 21:27 |
@sonney2k | blackburn, btw one nice addition would be boosting algorithms | 21:28 |
@sonney2k | http://mloss.org/software/view/246/ | 21:28 |
blackburn | sonney2k: how can we integrate that? | 21:28 |
blackburn | with code or interfacing? | 21:28 |
@sonney2k | use their code | 21:29 |
@sonney2k | and modify it for our purposes | 21:29 |
blackburn | is it ok? | 21:29 |
@sonney2k | why not? | 21:30 |
@sonney2k | you can always do that | 21:30 |
@sonney2k | it is open source | 21:30 |
@sonney2k | and gpl | 21:30 |
blackburn | sonney2k: I don't know, asking | 21:30 |
@sonney2k | blackburn, anyone can use code from shogun for their purpose and release the software under gpl terms | 21:32 |
blackburn | sonney2k: I know but I don't like this way of development | 21:33 |
blackburn | I know it is not possible to interface to every library we want | 21:33 |
blackburn | but simply don't like :) | 21:33 |
@sonney2k | blackburn, true - but I spend like a month or so discussing with the MB guys and we simply have different ideas of how things should work | 21:34 |
blackburn | I see | 21:34 |
@sonney2k | I even merged multiboost at some stage | 21:34 |
@sonney2k | wrote swig wrappers etc | 21:34 |
@sonney2k | but they do a lot of things very differently and the project is big too | 21:35 |
@sonney2k | so in the end I gave up | 21:35 |
@sonney2k | (due to lack of time to pursue this way too involved endeavor) | 21:36 |
blackburn | bad | 21:36 |
@sonney2k | not bad good | 21:42 |
@sonney2k | ! | 21:42 |
blackburn | :) | 21:42 |
@sonney2k | this way we made some progress instead of wasting time for endless communication :) | 21:43 |
blackburn | sonney2k: do you know what is faster: computing the svd of A, OR forming AA' and computing its eigenvectors? | 21:44 |
@sonney2k | no idea | 21:45 |
@sonney2k | what is the complexity of svd / eig ? | 21:45 |
blackburn | don't know, I'm worried about AA' step | 21:46 |
blackburn | it is n^3 | 21:46 |
blackburn | but SVD for 3000x3000 took 236s | 21:46 |
blackburn | too bad for shogun :) | 21:46 |
@sonney2k | blackburn, if we had numpy array interface compatibility one could do things like | 21:50 |
@sonney2k | x=RealFeatures(sth) | 21:50 |
@sonney2k | x+=3 | 21:50 |
blackburn | sonney2k: fantastic | 21:50 |
blackburn | we definitely should have it | 21:50 |
@sonney2k | and even any normal numpy operations | 21:50 |
@sonney2k | it is very easy to do | 21:50 |
@sonney2k | we only need to provide a dict called __array_interface__ | 21:51 |
@sonney2k | with these fields filled | 21:51 |
@sonney2k | http://docs.scipy.org/doc/numpy/reference/arrays.interface.html#__array_interface__ | 21:51 |
@sonney2k | I guess that is what is needed to directly work with scikits.learn | 21:54 |
blackburn | how? | 21:55 |
blackburn | oops.. | 21:57 |
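What exposing __array_interface__ could look like on the Python side: once the dict is present, numpy treats the object as an array without copying, so `x += 3` and other numpy operations act directly on the wrapped buffer. The FeatureMatrix class here is a hypothetical stand-in, not shogun's RealFeatures:

```python
import numpy as np

class FeatureMatrix:
    """Hypothetical feature container exposing its buffer to numpy."""
    def __init__(self, data):
        self._data = np.ascontiguousarray(data, dtype=np.float64)

    @property
    def __array_interface__(self):
        # the fields numpy looks for: version, shape, typestr, data
        return {
            'version': 3,
            'shape': self._data.shape,
            'typestr': '<f8',  # little-endian float64
            'data': (self._data.ctypes.data, False),  # (address, read-only flag)
        }

feats = FeatureMatrix([[1.0, 2.0], [3.0, 4.0]])
arr = np.asarray(feats)  # zero-copy view through the interface
arr += 3                 # plain numpy ops now modify the wrapped buffer
```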
blackburn | sonney2k: would you mind placing dimreduction techniques in another folder/module? | 21:58 |
@sonney2k | which, and how would it communicate with preprocessors? | 21:59 |
blackburn | I had some idea but I forgot | 22:00 |
blackburn | ah | 22:00 |
blackburn | there could be a Machine for this purpose | 22:00 |
blackburn | and some Preprocessor proxy | 22:00 |
* sonney2k starts to implement the array interface | 22:00 | |
blackburn | I don't know if it is better hmm | 22:00 |
serialhex | blackburn: drive by raspberry!!! :P | 22:45 |
blackburn | serialhex: hi | 22:46 |
blackburn | :) | 22:46 |
serialhex | how are you?? | 22:46 |
blackburn | fine | 22:52 |
blackburn | and you? | 22:52 |
blackburn | shit, still slow | 23:06 |
blackburn | sonney2k: AA' + eigenvectors is faster in practice | 23:10 |
@sonney2k | ok | 23:11 |
blackburn | ~260s vs ~60s | 23:11 |
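The two routes blackburn timed can be sketched in NumPy. Both are O(n^3) for an n x n matrix, but forming AA' and handing the symmetric problem to a dedicated eigensolver is often faster in practice than a full SVD, which matches the ~260s vs ~60s observation; the singular values of A are the square roots of the eigenvalues of AA' (small matrix here so the sketch runs quickly):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 200))

# route 1: full SVD of A directly
U, s, Vt = np.linalg.svd(A)

# route 2: form AA' and eigendecompose it with the symmetric solver
M = A @ A.T
w, V = np.linalg.eigh(M)  # eigenvalues in ascending order

# consistency: sqrt of eigenvalues of AA' equals singular values of A
sv_from_eig = np.sqrt(np.clip(np.sort(w), 0.0, None))
```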
CIA-3 | shogun: Sergey Lisitsyn master * r544c920 / src/shogun/preprocessor/DiffusionMaps.cpp : Improved performance and fixed Diffusion Maps - http://git.io/W3e1wg | 23:12 |
blackburn | sonney2k: one unresolved problem still: how to use preprocessors that can be applied both to strings and simplefeatures | 23:16 |
blackburn | in kernel LLE I did it by returning a new feature matrix if the given features are not simple | 23:17 |
@sonney2k | that is not what preprocs were intended for | 23:23 |
blackburn | sonney2k: I know, but it is really useful when embedding strings into euclidean space | 23:24 |
@sonney2k | a workaround would be to introduce obtain_from functions that take type X as argument and return type Y | 23:24 |
blackburn | ehh? | 23:24 |
@sonney2k | btw, the array interface should no longer be used but instead http://docs.python.org/dev/c-api/buffer.html#Py_buffer | 23:25 |
@sonney2k | CSimpleFeatures<float64_t> obtain_from_string(CStringFeatures<char>) | 23:26 |
@sonney2k | or so | 23:26 |
@sonney2k | or instead of CStringFeatures* just CKernel | 23:26 |
blackburn | apply_to_string_features? | 23:26 |
blackburn | no, kernel is worse | 23:26 |
@sonney2k | not necessarily | 23:27 |
blackburn | because every dimreduction preprocessor has its own kernel | 23:27 |
@sonney2k | it encapsulates the feature type | 23:27 |
@sonney2k | for example for kPCA it would be ok | 23:27 |
blackburn | there is already apply_to_string_features for kPCA | 23:27 |
@sonney2k | yeah but there is no apply_to_sparse_features and no apply_to_whatever | 23:29 |
blackburn | hmm yes | 23:29 |
blackburn | obtain_features | 23:29 |
@sonney2k | _from_generic_kernel | 23:29 |
blackburn | I would even add this method to dimreductionpreprocessor interface | 23:29 |
@sonney2k | I don't know but I need to sleep now | 23:30 |
@sonney2k | cu | 23:30 |
blackburn | see you | 23:30 |
--- Log closed Sun Sep 25 00:00:25 2011 |
Generated by irclog2html.py 2.10.0 by Marius Gedminas - find it at mg.pov.lt!