--- Log opened Sat Sep 24 00:00:21 2011 | ||
-!- serialhex [~quassel@99-101-148-183.lightspeed.wepbfl.sbcglobal.net] has quit [Ping timeout: 252 seconds] | 00:07 | |
-!- serialhex [~quassel@99-101-148-183.lightspeed.wepbfl.sbcglobal.net] has joined #shogun | 00:15 | |
-!- blackburn [~blackburn@31.28.44.65] has joined #shogun | 12:13 | |
blackburn | sonney2k: can it be wrong includes or so? | 12:39 |
blackburn | about arpack dsymv | 12:39 |
@sonney2k | blackburn, I think we should first check if his LDA really works | 19:30 |
@sonney2k | and what his configure output is | 19:30 |
blackburn | sonney2k: he tested LDA and all good | 19:32 |
@sonney2k | he didn't say that | 19:32 |
blackburn | with example from python_modular | 19:32 |
blackburn | he told me in private haha | 19:32 |
blackburn | The lda example (shogun/examples/documented/python_modular/classifier_lda_modular.py) seems to work. I had to change the data path to add toy. | 19:32 |
blackburn | I.e. ../data/toy/fm_train_real.dat etc | 19:32 |
blackburn | sonney2k: the only difference I noticed - I was including cblas in arpack.h and arpack.cpp | 19:34 |
blackburn | while it is included in lapack.h | 19:36 |
blackburn | it is not necessary | 19:36 |
CIA-3 | shogun: Sergey Lisitsyn master * r826ff44 / (2 files): Removed unnecessary includes in arpack.{cpp,h} - http://git.io/36Q4Cg | 19:36 |
blackburn | and removed in commit ^ | 19:36 |
@sonney2k | that all doesn't make sense | 19:52 |
@sonney2k | I will ask cheng again | 19:52 |
CIA-3 | shogun: Soeren Sonnenburg master * rbe97a43 / (3 files in 3 dirs): fix mixup of epsilon / tube epsilon in libsvr examples - http://git.io/Ty7s-A | 19:57 |
blackburn | sonney2k: we've got at least one commit per day during Sep 16 - today | 19:59 |
@sonney2k | and we should keep it like this | 20:00 |
blackburn | a pace many libs don't have :) | 20:00 |
blackburn | sonney2k: what is the tube epsilon? | 20:05 |
@sonney2k | the epsilon tube for the epsilon insensitive loss in support vector regression | 20:05 |
blackburn | oookk | 20:06 |
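The epsilon-insensitive loss mentioned above penalizes only errors falling outside a tube of width epsilon around the target. A minimal NumPy sketch (illustrative code, not shogun's API):

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps):
    # zero loss inside the tube |y - f(x)| <= eps, linear outside it
    return np.maximum(np.abs(y_true - y_pred) - eps, 0.0)

y = np.array([1.0, 2.0, 3.0])
pred = np.array([1.05, 2.5, 1.0])
loss = eps_insensitive_loss(y, pred, eps=0.1)  # the 0.05 error costs nothing
```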
blackburn | new algo to go into shogun in 10 minutes! | 20:25 |
blackburn | :D | 20:25 |
CIA-3 | shogun: Sergey Lisitsyn master * r3308301 / (6 files in 3 dirs): Introduced DiffusionMaps dimension reduction preprocessor - http://git.io/5M4CKw | 20:45 |
blackburn | sonney2k: vodka! | 20:46 |
@sonney2k | heh | 20:46 |
@sonney2k | blackburn, btw which dim red algo would you recommend to visualize data :) | 20:46 |
blackburn | sonney2k: depends what is the data | 20:47 |
@sonney2k | some real-valued inputs | 20:47 |
@sonney2k | not much more prior knowledge really - hard to classify | 20:47 |
blackburn | if there could be any underlying manifold - you can try LTSA | 20:48 |
blackburn | it is pretty fast and robust | 20:48 |
blackburn | well but MDS/Isomap would be useful too | 20:48 |
@sonney2k | any thoughts on PCA / kPCA - shouldn't I do these first? | 20:49 |
blackburn | why not :) | 20:49 |
blackburn | and we have kernel PCA, kernel LLE, kernel diffusion maps | 20:50 |
@sonney2k | thx | 20:51 |
@sonney2k | blackburn, btw recently someone here on IRC asked me about what our feature roadmap for shogun is | 20:51 |
@sonney2k | I am running a bit out of ideas what we want to focus on | 20:51 |
blackburn | I'm still focused on dim reduction | 20:51 |
blackburn | no idea what you are focused on :) | 20:51 |
@sonney2k | shogun does a lot of stuff nowadays - so it is really not clear to me how to really improve it | 20:52 |
@sonney2k | some more dim red methods are not really 'the big picture' | 20:52 |
@sonney2k | and I also only have small things, like parallelize more code, cleanups, model selection w/ nice syntax etc | 20:52 |
blackburn | sonney2k: I don't know about my plans in the long run | 20:53 |
blackburn | sonney2k: http://www.kongregate.com/games/banthar/hell-tetris | 20:54 |
@sonney2k | blackburn, for example features like massive parallel or mpi or neural networks or gaussian processes or whatever | 20:56 |
@sonney2k | blackburn, ^game is this even possible | 20:56 |
blackburn | sonney2k: oh I hate neural networks :D | 20:57 |
blackburn | no idea if it is possible | 20:57 |
blackburn | I'm afraid I can't plan such grand new features | 20:57 |
@sonney2k | blackburn, then it is not likely that you impl. them :D | 20:57 |
@sonney2k | for gsoc next year (if we want to participate) we need to | 20:57 |
blackburn | gaussian processes is something chris likes very much :) | 20:58 |
blackburn | sonney2k: I was thinking about high performance computing things but I don't know if it is even possible to do | 20:59 |
blackburn | without serious architecture changes, etc | 21:00 |
@sonney2k | blackburn, the problem with GPs is that it needs some matrix inverse (so the standard alg's are n^3) | 21:00 |
@sonney2k | and you need s.o. really deep into it | 21:00 |
@sonney2k | (I am not) | 21:00 |
blackburn | sonney2k: most of dimreduction algos are n^3 :) | 21:01 |
@sonney2k | hpc stuff is sth you cannot do in general for all of shogun | 21:01 |
@sonney2k | so this is sth special for certain algos | 21:01 |
@sonney2k | blackburn, thinking about it - I guess I would most prefer to develop shogun in two ways | 21:02 |
@sonney2k | 1) large scale / hpc stuff of whatever kind | 21:02 |
@sonney2k | 2) breadth - many ml baseline algorithms (not necessarily fast) | 21:02 |
@sonney2k | so one always has some baseline to play with | 21:03 |
blackburn | 1) is preferable for me but | 21:03 |
@sonney2k | lets call it the 'hammer' | 21:03 |
@sonney2k | and then if one knows what the baseline is can do 1) stuff on top of it | 21:03 |
blackburn | I guess not MPI, but OpenCL, etc | 21:04 |
blackburn | sonney2k: I've lost an idea. what to call 'hammer'? | 21:04 |
@sonney2k | I wouldn't want to get each and every algorithm in there - but maybe only the most successful ones | 21:04 |
@sonney2k | hammer == 2) | 21:04 |
@sonney2k | blackburn, one gsoc project could be some kind of opencv interfacing | 21:05 |
blackburn | opencv? | 21:05 |
@sonney2k | the big computer vision lib | 21:05 |
blackburn | I know | 21:05 |
blackburn | but surprised | 21:05 |
@sonney2k | why? | 21:05 |
blackburn | they have their own impls | 21:05 |
blackburn | of some algos | 21:05 |
@sonney2k | one could get features from opencv and do some training on top of it | 21:05 |
@sonney2k | with some nice example | 21:06 |
blackburn | sonney2k: I guess it would be better to have shogun in opencv | 21:06 |
blackburn | you could ask opencv guys about it | 21:07 |
@sonney2k | blackburn ? | 21:07 |
@sonney2k | I don't understand | 21:07 |
@sonney2k | shogun in opencv? | 21:07 |
@sonney2k | what does that mean | 21:07 |
blackburn | I mean it would be nice to become a machine learning library for opencv | 21:07 |
@sonney2k | I think you can already use it for that purpose | 21:08 |
blackburn | sonney2k: but having a nice interface in opencv to shogun would be nice | 21:09 |
blackburn | in more transparent way or so | 21:09 |
@sonney2k | I have no idea what that could be - I mean opencv produces any kind of feature representation one can think of | 21:10 |
@sonney2k | so using that representation + some shogun algo would work already | 21:10 |
blackburn | sonney2k: it would be nice to treat OpenCV images as shogun features somehow | 21:11 |
blackburn | no idea about specific ways to do it | 21:12 |
@sonney2k | blackburn, well I will meet gary at the gsoc mentors meeting - I will ask him | 21:12 |
blackburn | sonney2k: will you go to mentors meeting? | 21:12 |
@sonney2k | yes | 21:12 |
blackburn | nice | 21:12 |
blackburn | chris too? | 21:12 |
@sonney2k | yes us two | 21:12 |
blackburn | I see | 21:13 |
blackburn | sonney2k: will normalization of this kind: | 21:13 |
blackburn | X = X - min(X(:)); | 21:13 |
blackburn | X = X / max(X(:)); | 21:13 |
blackburn | change the kernel matrix? | 21:13 |
blackburn | *gaussian kernel | 21:13 |
@sonney2k | yes | 21:13 |
blackburn | how significant? | 21:14 |
@sonney2k | wait, the first one does not | 21:14 |
@sonney2k | translation invariant but not scale invariant | 21:14 |
@sonney2k | because you need to rescale kernel width | 21:14 |
blackburn | aha, I see | 21:14 |
blackburn | I guess it would be better to normalize features too | 21:15 |
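The translation/scale point above can be checked numerically: the Gaussian kernel depends only on pairwise distances, so subtracting min(X(:)) leaves the kernel matrix unchanged, while dividing by max(X(:)) changes it unless the kernel width is rescaled by the square of the same factor. A NumPy sketch with an illustrative kernel function (not shogun's implementation):

```python
import numpy as np

def gaussian_kernel(X, width):
    # K_ij = exp(-||x_i - x_j||^2 / width)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / width)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
K = gaussian_kernel(X, width=2.0)

# X - min(X(:)) shifts every entry equally -> pairwise distances unchanged
K_shift = gaussian_kernel(X - X.min(), width=2.0)

# X * s scales squared distances by s^2 -> width must be rescaled by s^2
s = 0.5
K_scaled = gaussian_kernel(s * X, width=2.0)
K_rescaled = gaussian_kernel(s * X, width=2.0 * s**2)
```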
@sonney2k | blackburn, so anyone else I should talk to at the mentors meeting? | 21:17 |
@sonney2k | the orange guys maybe? | 21:18 |
@sonney2k | such that they could somehow reuse what we provide in shogun in their gui? | 21:18 |
blackburn | sonney2k: well sure, ask if they want to collaborate | 21:19 |
@sonney2k | anyone else you could think of? | 21:19 |
blackburn | no idea | 21:19 |
blackburn | sonney2k: and no idea how to collaborate with scikits guys | 21:21 |
@sonney2k | blackburn, well I have one idea - we could provide some interface functions if it helps them to use our methods | 21:22 |
@sonney2k | so only interfacing | 21:22 |
@sonney2k | no more | 21:22 |
blackburn | sonney2k: they have as much as we have | 21:22 |
@sonney2k | they have some other things | 21:23 |
@sonney2k | too | 21:23 |
blackburn | e.g. they have gaussian processes | 21:23 |
@sonney2k | but not really large scale and only python | 21:23 |
@sonney2k | yes | 21:23 |
blackburn | I think it is not the way they would like | 21:23 |
blackburn | it will make things more complex | 21:23 |
blackburn | since the orange core is developed in C++ it would be useful to have some bridge | 21:24 |
@sonney2k | yeah, they have different focus | 21:24 |
blackburn | I will take a look how they do the decomposition | 21:24 |
blackburn | aha I see | 21:25 |
@sonney2k | I think we should implement the array interface for shogun features http://docs.scipy.org/doc/numpy/reference/arrays.interface.html | 21:25 |
blackburn | sonney2k: agree | 21:25 |
blackburn | sonney2k: okay you definitely could ask orange if they want to collaborate | 21:26 |
blackburn | they have svms and some classifiers | 21:26 |
blackburn | but at least they don't have any of the fastest C++ dim reduction preprocessors :D | 21:27 |
@sonney2k | blackburn, btw one nice addition would be boosting algorithms | 21:28 |
@sonney2k | http://mloss.org/software/view/246/ | 21:28 |
blackburn | sonney2k: how can we integrate that? | 21:28 |
blackburn | with code or interfacing? | 21:28 |
@sonney2k | use their code | 21:29 |
@sonney2k | and modify it for our purposes | 21:29 |
blackburn | is it ok? | 21:29 |
@sonney2k | why not? | 21:30 |
@sonney2k | you can always do that | 21:30 |
@sonney2k | it is open source | 21:30 |
@sonney2k | and gpl | 21:30 |
blackburn | sonney2k: I don't know, asking | 21:30 |
@sonney2k | blackburn, anyone can use code from shogun for their purpose and release the software under gpl terms | 21:32 |
blackburn | sonney2k: I know but I don't like this way of development | 21:33 |
blackburn | I know it is not possible to interface to every library we want | 21:33 |
blackburn | but simply don't like :) | 21:33 |
@sonney2k | blackburn, true - but I spend like a month or so discussing with the MB guys and we simply have different ideas of how things should work | 21:34 |
blackburn | I see | 21:34 |
@sonney2k | I even merged multiboost at some stage | 21:34 |
@sonney2k | wrote swig wrappers etc | 21:34 |
@sonney2k | but they do a lot of things very differently and the project is big too | 21:35 |
@sonney2k | so in the end I gave up | 21:35 |
@sonney2k | (due to lack of time to pursue this way too involved endeavor) | 21:36 |
blackburn | bad | 21:36 |
@sonney2k | not bad good | 21:42 |
@sonney2k | ! | 21:42 |
blackburn | :) | 21:42 |
@sonney2k | this way we made some progress instead of wasting time for endless communication :) | 21:43 |
blackburn | sonney2k: do you know what is faster: computing the svd of A, OR forming AA' and computing its eigenvectors? | 21:44 |
@sonney2k | no idea | 21:45 |
@sonney2k | what is the complexity of svd / eig ? | 21:45 |
blackburn | don't know, I'm worried about AA' step | 21:46 |
blackburn | it is n^3 | 21:46 |
blackburn | but SVD for 3000x3000 took 236s | 21:46 |
blackburn | too bad for shogun :) | 21:46 |
@sonney2k | blackburn, if we had numpy array interface compatibility one could do things like | 21:50 |
@sonney2k | x=RealFeatures(sth) | 21:50 |
@sonney2k | x+=3 | 21:50 |
blackburn | sonney2k: fantastic | 21:50 |
blackburn | we definitely should have it | 21:50 |
@sonney2k | and even any normal numpy operations | 21:50 |
@sonney2k | it is very easy to do | 21:50 |
@sonney2k | we only need to provide a dict called __array_interface__ | 21:51 |
@sonney2k | with these fields filled | 21:51 |
@sonney2k | http://docs.scipy.org/doc/numpy/reference/arrays.interface.html#__array_interface__ | 21:51 |
@sonney2k | I guess that is what is needed to directly work with scikits.learn | 21:54 |
blackburn | how? | 21:55 |
blackburn | oops.. | 21:57 |
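What exposing __array_interface__ could look like on the Python side: once the dict is present, numpy treats the object as an array without copying, so `x += 3` and other numpy operations act directly on the wrapped buffer. The FeatureMatrix class here is a hypothetical stand-in, not shogun's RealFeatures:

```python
import numpy as np

class FeatureMatrix:
    """Hypothetical feature container exposing its buffer to numpy."""
    def __init__(self, data):
        self._data = np.ascontiguousarray(data, dtype=np.float64)

    @property
    def __array_interface__(self):
        # the fields numpy looks for: version, shape, typestr, data
        return {
            'version': 3,
            'shape': self._data.shape,
            'typestr': '<f8',  # little-endian float64
            'data': (self._data.ctypes.data, False),  # (address, read-only flag)
        }

feats = FeatureMatrix([[1.0, 2.0], [3.0, 4.0]])
arr = np.asarray(feats)  # zero-copy view through the interface
arr += 3                 # plain numpy ops now modify the wrapped buffer
```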
blackburn | sonney2k: would you mind placing dimreduction techniques in another folder/module? | 21:58 |
@sonney2k | which, and how would it communicate with preprocessors? | 21:59 |
blackburn | I had some idea but I forgot | 22:00 |
blackburn | ah | 22:00 |
blackburn | there could be a Machine for this purpose | 22:00 |
blackburn | and some Preprocessor proxy | 22:00 |
* sonney2k starts to implement the array interface | 22:00 | |
blackburn | I don't know if it is better hmm | 22:00 |
serialhex | blackburn: drive by raspberry!!! :P | 22:45 |
blackburn | serialhex: hi | 22:46 |
blackburn | :) | 22:46 |
serialhex | how are you?? | 22:46 |
blackburn | fine | 22:52 |
blackburn | and you? | 22:52 |
blackburn | shit, still slow | 23:06 |
blackburn | sonney2k: AA' + eigenvectors is faster in practice | 23:10 |
@sonney2k | ok | 23:11 |
blackburn | ~260s vs ~60s | 23:11 |
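The two routes blackburn timed can be sketched in NumPy. Both are O(n^3) for an n x n matrix, but forming AA' and handing the symmetric problem to a dedicated eigensolver is often faster in practice than a full SVD, which matches the ~260s vs ~60s observation; the singular values of A are the square roots of the eigenvalues of AA' (small matrix here so the sketch runs quickly):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 200))

# route 1: full SVD of A directly
U, s, Vt = np.linalg.svd(A)

# route 2: form AA' and eigendecompose it with the symmetric solver
M = A @ A.T
w, V = np.linalg.eigh(M)  # eigenvalues in ascending order

# consistency: sqrt of eigenvalues of AA' equals singular values of A
sv_from_eig = np.sqrt(np.clip(np.sort(w), 0.0, None))
```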
CIA-3 | shogun: Sergey Lisitsyn master * r544c920 / src/shogun/preprocessor/DiffusionMaps.cpp : Improved performance and fixed Diffusion Maps - http://git.io/W3e1wg | 23:12 |
blackburn | sonney2k: one unresolved problem still: how to use preprocessors that can be applied both to strings and simplefeatures | 23:16 |
blackburn | in kernel LLE I did it by returning a new feature matrix if the given features are not simple | 23:17 |
@sonney2k | that is not what preprocs were intended for | 23:23 |
blackburn | sonney2k: I know, but it is really useful when embedding strings into euclidean space | 23:24 |
@sonney2k | a workaround would be to introduce obtain_from functions that take type X as argument and return type Y | 23:24 |
blackburn | ehh? | 23:24 |
@sonney2k | btw, the array interface should no longer be used but instead http://docs.python.org/dev/c-api/buffer.html#Py_buffer | 23:25 |
@sonney2k | CSimpleFeatures<float64_t> obtain_from_string(CStringFeatures<char>) | 23:26 |
@sonney2k | or so | 23:26 |
@sonney2k | or instead of CStringFeatures* just CKernel | 23:26 |
blackburn | apply_to_string_features? | 23:26 |
blackburn | no, kernel is worse | 23:26 |
@sonney2k | not necessarily | 23:27 |
blackburn | because every dimreduction preprocessor has its own kernel | 23:27 |
@sonney2k | it encapsulates the feature type | 23:27 |
@sonney2k | for example for kPCA it would be ok | 23:27 |
blackburn | there is already apply_to_string_features for kPCA | 23:27 |
@sonney2k | yeah but there is no apply_to_sparse_features and no apply_to_whatever | 23:29 |
blackburn | hmm yes | 23:29 |
blackburn | obtain_features | 23:29 |
@sonney2k | _from_generic_kernel | 23:29 |
blackburn | I would even add this method to dimreductionpreprocessor interface | 23:29 |
@sonney2k | I don't know but I need to sleep now | 23:30 |
@sonney2k | cu | 23:30 |
blackburn | see you | 23:30 |
--- Log closed Sun Sep 25 00:00:25 2011 |
Generated by irclog2html.py 2.10.0 by Marius Gedminas - find it at mg.pov.lt!