--- Log opened Thu Apr 28 00:00:01 2011 | ||
--- Day changed Thu Apr 28 2011 | ||
blackburn | serialhex: there is many fields we can work in :) | 00:00 |
blackburn | there are! | 00:00 |
blackburn | there are | 00:00 |
blackburn | there are | 00:00 |
serialhex | sweet! | 00:00 |
serialhex | it's an interesting article & i recommend reading it if you have a chance | 00:00 |
blackburn | I will :) | 00:01 |
serialhex | he has a bunch more published papers for free on his site: http://www.fil.ion.ucl.ac.uk/~karl/ | 00:01 |
* serialhex loves free science papers!!! | 00:01 | |
blackburn | bookmarked | 00:01 |
serialhex | i think this will be _really_ cool to implement when the person who is doing the realtime stuff gets done... then it can do on-line learning and be really spiffy!!! | 00:02 |
serialhex | i have a bunch more stuff on this hiding on my HDD if you like... or you can search for it yourself :D | 00:04 |
blackburn | :) | 00:04 |
blackburn | okay | 00:05 |
* serialhex knows 10^6 more theory than implementation... but still knows very little theory :( | 00:06 | |
* blackburn is angry with java EE :) | 00:07 | |
serialhex | what? the JEE stuff isnt working right?? | 00:08 |
blackburn | working now, but it is such a pain in the ass | 00:09 |
serialhex | yeah, yesterday i went to the college and grabbed a book on designing algorithms, and the only one they had was in java... i'm not happy that i'm going to have to learn java right now :( | 00:10 |
blackburn | java se is good, I like it very much | 00:11 |
* serialhex ; hates; semicolons; after; every; stinking; line; of; code; i; type; | 00:12 | |
blackburn | but EE is not java SE, it is something terrible :) for example one error in an SQL query causes ~60-70 exceptions to be thrown in EJB 2.0 | 00:12 |
serialhex | oooh... thats ugly!!! | 00:12 |
blackburn | but it is enterprise, bla-bla :) | 00:13 |
serialhex | yeah, there's a ruby EE also... supposedly more secure & efficient but it's ruby 1.8.7 (the newest is 1.9.2) | 00:14 |
blackburn | I'm now learning java EE at netcracker corp. | 00:14 |
serialhex | so it's missing some cool new features, but it's the same thing: Enterprise Edition | 00:14 |
serialhex | netcracker corp?? | 00:16 |
blackburn | yeap, they have some courses on java ee here | 00:16 |
serialhex | cool | 00:17 |
blackburn | oh it is 02:22 here | 00:22 |
serialhex | damn... sleep time?? | 00:23 |
blackburn | yeah it is | 00:23 |
serialhex | aiit, g'nite! | 00:23 |
blackburn | see you | 00:24 |
-!- blackburn [~qdrgsm@188.168.2.98] has quit [Quit: Leaving.] | 00:24 | |
-!- f-x [~gen@180.149.49.227] has quit [Quit: leaving] | 00:36 | |
-!- serialhex-10 [~androirc@99.101.149.136] has joined #shogun | 00:49 | |
-!- sploving [~root@124.16.139.196] has left #shogun [] | 03:34 | |
-!- serialhex [~quassel@99-101-149-136.lightspeed.wepbfl.sbcglobal.net] has quit [Remote host closed the connection] | 04:02 | |
-!- serialhex [~quassel@99-101-149-136.lightspeed.wepbfl.sbcglobal.net] has joined #shogun | 04:04 | |
CIA-90 | shogun: Soeren Sonnenburg master * r479c5cd / src/modular/SGBase.i : comment init code for SWIGR - http://bit.ly/isxdMf | 04:59 |
serialhex | sonney2k: isnt it a little early for you to be making commits?? | 05:10 |
@sonney2k | serialhex, too late to complain | 05:10 |
serialhex | ahh, ok :D | 05:11 |
@sonney2k | awake now ... and yay it really compiles for all languages now | 05:11 |
serialhex | your kid have you awake this early in the am? | 05:11 |
serialhex | sweet!! | 05:11 |
@sonney2k | no just couldn't sleep | 05:14 |
@sonney2k | I guess I had to check if you had the typemaps ready ;-) | 05:15 |
serialhex | ahh... i'm about to head to sleep myself, going to start shogun compiling then catch some zzzz's | 05:15 |
serialhex | noooo... not even close yet :P | 05:15 |
serialhex | for some reason the ./configure script defaults to enabling 'libshogun libshogunui cmdline python python_modular' but not ruby / java modular interfaces... is that intentional? | 05:17 |
@sonney2k | serialhex, yes | 05:18 |
serialhex | ok, just making sure | 05:18 |
@sonney2k | you need to manually specify new languages on the cmdline | 05:18 |
@sonney2k | only rock-stable ones are enabled by default | 05:18 |
serialhex | i did | 05:18 |
@sonney2k | ./configure --interfaces=libshogun,ruby_modular ? | 05:18 |
serialhex | erm... --interfaces=libshogun,libshogunui,python,cmdline,ruby_modular,java_modular,python_modular | 05:19 |
@sonney2k | why would you want all the others? | 05:19 |
serialhex | the whole shebang, cause i'm going to sleep & i can exercise my dominance of my pc this way :D | 05:19 |
@bettyboo | funny | 05:19 |
serialhex | omfg... thats crazy!! | 05:20 |
@sonney2k | ./configure --interfaces=libshogun,libshogunui,python,r,octave,cmdline,octave_modular,r_modular,lua_modular,java_modular,csharp_modular,java_modular,matlab | 05:20 |
@sonney2k | if you have all of that | 05:20 |
serialhex | i dont have the rest, i'm compiling what i have on my system | 05:20 |
@sonney2k | I guess no matlab maybe in your case | 05:20 |
serialhex | no matlab, octave or lua (yet) | 05:21 |
@sonney2k | it doesnt take that long | 05:21 |
serialhex | i figure it's another system to debug against if anything goes wrong | 05:21 |
serialhex | ...i'm running a 3ghz p4, it takes a while | 05:21 |
serialhex | i've already had some problems with swig & rvm not wanting to play nicely together... ubuntu doesn't have the latest version of ruby and i want to be able to use the new features, rvm lets me switch between rubies but for some reason it's not playing nicely :( | 05:23 |
serialhex | i dont know who to complain to, the ruby people, the swig people, or the rvm people (rvm == ruby version manager) | 05:24 |
serialhex | but, thats something for tomorrow, as i must sleep now and see the doctor in the am! | 05:24 |
@sonney2k | serialhex, have a nice sleep! | 05:25 |
* serialhex feels like this right now: http://xkcd.com/676/ | 05:25 | |
CIA-90 | shogun: Soeren Sonnenburg master * rfec837d / README : update readme to mention the new interfaces - http://bit.ly/lG4bdE | 05:25 |
@sonney2k | heh | 05:27 |
CIA-90 | shogun: Soeren Sonnenburg master * rb10f479 / src/modular/SGBase.i : | 05:44 |
CIA-90 | shogun: use #if defined and friends to make the file more readable and to only | 05:44 |
CIA-90 | shogun: include init functions in the csharp and java modular interfaces. - http://bit.ly/mD0rj7 | 05:44 |
-!- serialhex [~quassel@99-101-149-136.lightspeed.wepbfl.sbcglobal.net] has quit [Remote host closed the connection] | 06:07 | |
-!- serialhex [~quassel@99-101-149-136.lightspeed.wepbfl.sbcglobal.net] has joined #shogun | 06:07 | |
-!- ameerkat [~ameerkat@184-98-140-155.phnx.qwest.net] has joined #shogun | 06:13 | |
-!- serialhex-10 [~androirc@99.101.149.136] has quit [Quit: AndroIRC] | 06:18 | |
-!- serialhex-10 [~androirc@99-101-149-136.lightspeed.wepbfl.sbcglobal.net] has joined #shogun | 06:18 | |
-!- serialhex [~quassel@99-101-149-136.lightspeed.wepbfl.sbcglobal.net] has quit [Remote host closed the connection] | 07:35 | |
-!- serialhex [~quassel@99-101-149-136.lightspeed.wepbfl.sbcglobal.net] has joined #shogun | 07:38 | |
-!- akhil_ [75d35896@gateway/web/freenode/ip.117.211.88.150] has joined #shogun | 08:10 | |
-!- akhil_ [75d35896@gateway/web/freenode/ip.117.211.88.150] has quit [Quit: Page closed] | 08:25 | |
-!- Tanmoy [75d35896@gateway/web/freenode/ip.117.211.88.150] has joined #shogun | 09:23 | |
-!- Tanmoy [75d35896@gateway/web/freenode/ip.117.211.88.150] has quit [Client Quit] | 09:25 | |
-!- blackburn [~qdrgsm@188.168.4.9] has joined #shogun | 09:36 | |
@sonney2k | blackburn about time ;-) | 09:38 |
@bettyboo | :) sonney2k | 09:38 |
blackburn | yeah? | 09:38 |
@sonney2k | Compiles now | 09:38 |
blackburn | sonney2k: what was wrong with evaluation? | 09:38 |
@sonney2k | Not with evaluation but general problem | 09:39 |
blackburn | ah, I see | 09:40 |
blackburn | now inspecting commits | 09:40 |
@sonney2k | Actually in swig and R | 09:40 |
@sonney2k | Anyways going swimming now | 09:42 |
@sonney2k | Later | 09:42 |
-!- ameerkat [~ameerkat@184-98-140-155.phnx.qwest.net] has quit [Ping timeout: 248 seconds] | 10:02 | |
blackburn | found some wonderful matlab implementation of ALL algos I proposed | 10:27 |
blackburn | the task became simpler for me :) | 10:28 |
-!- blackburn [~qdrgsm@188.168.4.9] has quit [Quit: Leaving.] | 10:52 | |
@sonney2k | Blackburn yeah just do them all :-) | 11:51 |
* sonney2k wonders whether blackburn meant this one http://homepage.tudelft.nl/19j49/Matlab_Toolbox_for_Dimensionality_Reduction.html | 13:05 | |
alesis-novik | Good day | 14:22 |
alesis-novik | 3 exams done, 5 to go | 14:22 |
-!- mlsec_ [~mlsec@bane.ml.tu-berlin.de] has joined #shogun | 14:56 | |
@sonney2k | alesis-novik, did things go better in the last days? | 14:56 |
alesis-novik | Yeah, they did :) | 14:56 |
-!- mode/#shogun [+o mlsec_] by ChanServ | 14:57 | |
-!- heiko [~heiko@134.91.10.201] has joined #shogun | 14:57 | |
-!- Netsplit *.net <-> *.split quits: @mlsec | 14:59 | |
-!- mlsec_ is now known as mlsec | 14:59 | |
-!- mlsec [~mlsec@bane.ml.tu-berlin.de] has quit [Client Quit] | 14:59 | |
-!- mlsec [~mlsec@bane.ml.tu-berlin.de] has joined #shogun | 15:00 | |
-!- mode/#shogun [+o mlsec] by ChanServ | 15:00 | |
@sonney2k | very good | 15:01 |
alesis-novik | Next exam on Monday and the good thing is that I got ready for some of it by getting ready for GSoC | 15:03 |
alesis-novik | Because it has things like PCA, FA, EM in it | 15:03 |
@sonney2k | heh | 15:06 |
@sonney2k | heiko, around? | 15:06 |
heiko | hej yes | 15:06 |
heiko | just read the logs from yesterday | 15:06 |
@sonney2k | ok | 15:07 |
@sonney2k | any questions? | 15:07 |
heiko | nope | 15:08 |
@sonney2k | if not then we should talk about the project :) | 15:08 |
heiko | sorry that I couldnt be there yesterday, I only read the mail shortly before | 15:08 |
heiko | alright | 15:08 |
heiko | I was just going to start implementing the things that we talked about yesterday | 15:09 |
@sonney2k | since I am not available for irc starting from next week - would it be possible to talk on the phone? | 15:09 |
heiko | yes | 15:09 |
heiko | actually, I prefer talking to chatting :) | 15:09 |
@sonney2k | I can probably call while driving a baby around :) | 15:09 |
@sonney2k | ok | 15:09 |
heiko | alright, nice :) | 15:09 |
@sonney2k | then email me how I can contact you | 15:09 |
@sonney2k | besides I think the safe things to do are of course | 15:10 |
heiko | It might also be that I am in Berlin in the next month | 15:10 |
@sonney2k | 1) implement the register_param() function in all classes that registers both x-val and real params | 15:10 |
heiko | (for some days) | 15:10 |
@sonney2k | 2) continue the feature subset thing for all feature types | 15:11 |
@sonney2k | and write examples for the feature subsetting thing in python_modular + do some tests | 15:11 |
heiko | yes, these should be straightforward | 15:12 |
@sonney2k | so whenever you are stuck and cannot reach me work on those :) | 15:12 |
@sonney2k | but lets continue with the plan | 15:12 |
heiko | ok | 15:13 |
@sonney2k | say we have means to actually set the values for all the parameters | 15:13 |
heiko | yes, this should work | 15:13 |
@sonney2k | then, at a higher level, we need a class, say CModelSelection, that can generate parameter values from a list or ranges of values | 15:14 |
heiko | yes, and there was the idea with the base class from which all methods of model selection inherit from | 15:15 |
@sonney2k | so we need a) a way to actually register parameters to perform model selection over | 15:15 |
@sonney2k | b) to iterate/sample from these | 15:16 |
@sonney2k | c) different data splitting schemes | 15:16 |
@sonney2k | d) performance measure | 15:16 |
@sonney2k | blackburn aka Sergey has done a great job and did d) for us already | 15:16 |
heiko | what has he done? | 15:17 |
@sonney2k | implemented all the performance measures I can think of | 15:17 |
heiko | alright :) | 15:17 |
@sonney2k | there is a class CEvaluation | 15:17 |
@sonney2k | (that provides an interface evaluate(CLabel* predicted, CLabel* truth)) | 15:17 |
@sonney2k | and returns a real-valued score | 15:18 |
@sonney2k | that we can use to measure if sth is good or not | 15:18 |
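The evaluate(predicted, truth) interface described above can be sketched in Python. The class names and the accuracy measure below are hypothetical illustrations of the idea, not shogun's actual CEvaluation hierarchy:

```python
# Minimal sketch of an evaluation interface: score predicted labels
# against ground-truth labels with a single real-valued number.
# AccuracyEvaluation is a made-up example subclass.

class Evaluation:
    """Base interface, analogous to the evaluate(predicted, truth) call."""
    def evaluate(self, predicted, truth):
        raise NotImplementedError

class AccuracyEvaluation(Evaluation):
    """Fraction of labels predicted correctly."""
    def evaluate(self, predicted, truth):
        assert len(predicted) == len(truth)
        correct = sum(1 for p, t in zip(predicted, truth) if p == t)
        return correct / len(truth)

score = AccuracyEvaluation().evaluate([1, -1, 1, 1], [1, -1, -1, 1])
# three of the four labels match, so score is 0.75
```

Any measure with this shape (accuracy, AUC, F1, ...) can then be plugged into a model selection loop via something like set_evaluation_criteria().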
@sonney2k | so I think the CModelSelection class should have a function - set_evaluation_criteria(CEvaluation* crit) | 15:18 |
@sonney2k | then it needs a function set_labels(), set_model(CClassifier* c) | 15:19 |
@sonney2k | (we will rename CClassifier to CModel at some point - so don't get confused - regression clustering etc all are derived from that) | 15:19 |
heiko | ok, no problem :) | 15:20 |
@sonney2k | and then I guess we either have CModelSelection as interface and implement all the schemes in derived classes or have a separate class doing this - up to you (I am fine with deriving things) | 15:22 |
@sonney2k | so the critical point here is to define a means to select parameters and ranges | 15:23 |
-!- f-x [b49531e5@gateway/web/freenode/ip.180.149.49.229] has joined #shogun | 15:25 | |
@sonney2k | heiko, did I write too much? | 15:25 |
heiko | no | 15:25 |
heiko | I checked out another project: | 15:25 |
heiko | there it went like this for all the params | 15:25 |
heiko | you provided min, max, step | 15:26 |
@sonney2k | (that works only for real-values) | 15:26 |
heiko | and a possible function that produces the steps | 15:26 |
heiko | yes of course | 15:26 |
heiko | just searching where I had the documentation of this ... | 15:27 |
@sonney2k | we want sth. where we can specify e.g. for an SVM a number of different kernels and their kernel parameters even | 15:27 |
heiko | ok | 15:28 |
heiko | then this is kind of a hard problem: selecting parameters/ranges | 15:28 |
@sonney2k | heiko, not too hard actually | 15:28 |
@sonney2k | just think of nested lists | 15:29 |
@sonney2k | with all the values explicitly available | 15:29 |
@sonney2k | then one has to traverse the list recursively and generate all the possible values | 15:29 |
heiko | ok nice idea | 15:29 |
heiko | and the lists are generated in our model selection class | 15:30 |
heiko | so, should all objects that contain x-val parameters know about their possible ranges and steps? | 15:32 |
@sonney2k | yes or we for now assume that they are provided somehow | 15:32 |
@sonney2k | heiko, they have to somehow | 15:32 |
heiko | so lets have an example: this list consists of two kernels | 15:33 |
heiko | each kernel has some parameters | 15:33 |
@sonney2k | while I see that this is feasible for e.g. floating point numbers I have no idea how this will work in the general case | 15:33 |
@sonney2k | yes | 15:33 |
heiko | No idea either, actually, it is new to me to perform model selection for something other than real/natural numbers | 15:34 |
heiko | but ok | 15:34 |
@sonney2k | e.g. [ [ GaussianKernel, [ 0.1, 1, 10]], [ ... ]] | 15:34 |
f-x | sonney2k: hey! have some really important assignments to submit today and tomorrow... so i think i can be active from tomorrow night only | 15:35 |
f-x | is it okay? | 15:35 |
@sonney2k | f-x, sure! Just resume work when you have time and start to communicate again! | 15:36 |
heiko | sonney2k, this looks quite similar to | 15:37 |
heiko | http://scikit-learn.sourceforge.net/auto_examples/grid_search_digits.html | 15:37 |
f-x | sonney2k: great! be back tomorrow. | 15:37 |
f-x | see ya | 15:37 |
heiko | bye f-x | 15:37 |
-!- f-x [b49531e5@gateway/web/freenode/ip.180.149.49.229] has quit [Quit: Work!!!] | 15:38 | |
@sonney2k | heiko, since it would be too difficult to parse such [] things from any language I would use shogun's list object to register these things | 15:38 |
@sonney2k | I think we need some range object then and some constant's too | 15:39 |
heiko | yes true | 15:39 |
heiko | just meant the list | 15:39 |
@sonney2k | so one would say | 15:39 |
@sonney2k | p=ModelSelectionParameters() | 15:40 |
@sonney2k | kp=ModelSelectionParameters() | 15:40 |
@sonney2k | kp.add(GaussianKernel) | 15:40 |
@sonney2k | kp.add_logrange('width', -2,2) | 15:41 |
@sonney2k | p.add(kp) | 15:41 |
@sonney2k | etc | 15:41 |
heiko | this is really cool like this ! :) | 15:42 |
@bettyboo | HA | 15:42 |
@sonney2k | for certain languages one could make convenience functions to really work like the gridsearch above - but this here is the most general case | 15:43 |
@sonney2k | btw, when will you be in berlin and where in berlin actually? | 15:44 |
heiko | this is not perfectly clear, probably somewhere in may. Somewhere in Schöneberg, Kreuzberg or Mitte | 15:45 |
heiko | I visit some friends and want to apply for a Master at the TU | 15:45 |
heiko | isnt there a problem with these range-setters in the ModelSelectionParameters class above? | 15:46 |
heiko | it has to be somehow clear to which parameter the ranges belong | 15:46 |
@sonney2k | it should be kp.add('kernel', GaussianKernel) | 15:47 |
heiko | (I thought of perhaps making a visit to your office, if you are interested :) | 15:48 |
@sonney2k | but I am not there | 15:48 |
@sonney2k | I am only there again from july on | 15:48 |
heiko | ok, well we will see | 15:48 |
heiko | so now we have this tree of ModelSelectionParameter instances | 15:49 |
heiko | and the ModelSelection class is able to generate all parameter combinations from these | 15:49 |
@sonney2k | yeah you do a depth first traversal | 15:50 |
heiko | to get all parameters | 15:51 |
heiko | but for actual cross-validation, these have to be multiplied to get all combinations | 15:51 |
heiko | (cartesian product) | 15:52 |
CIA-90 | shogun: Soeren Sonnenburg master * r2be31a4 / (6 files in 2 dirs): Merge branches 'streaming' and 'master' of git://github.com/frx/shogun - http://bit.ly/iseEiD | 15:53 |
@sonney2k | heiko, you are right - so we need multiple trees like this | 15:54 |
@sonney2k | hmmhh that needs some more thought | 15:56 |
heiko | yes, perhaps mixing all the parameters that are to be selected with their ranges and stuff in one structure is critical | 15:57 |
@sonney2k | it should somehow reflect the tuned_parameters in scikits | 15:58 |
@sonney2k | lets think | 15:58 |
@sonney2k | if we have nested lists | 15:59 |
@sonney2k | the top level in the list could be independent parameters / or not | 15:59 |
heiko | by independent you mean? | 15:59 |
heiko | for example kernel? | 16:00 |
heiko | different kernels?= | 16:00 |
@sonney2k | heiko, parameters that you can simultaneously select without conflict | 16:00 |
heiko | can you give an example? | 16:02 |
@sonney2k | heiko, like [ 'kernel', 'C'] | 16:04 |
@sonney2k | if we have ranges for both, they could be selected simultaneously | 16:05 |
@sonney2k | I mean your parameter multiplication explosion | 16:05 |
heiko | simultaneously means both values are changed at the same time? | 16:05 |
@sonney2k | to get all combinations | 16:05 |
@sonney2k | yes | 16:05 |
heiko | but if you change your kernel, | 16:06 |
heiko | dont you have to reconsider your C choice | 16:06 |
heiko | because the feature space changes? | 16:06 |
@sonney2k | yes that is why all combinations | 16:06 |
heiko | ok, then I misunderstood you | 16:06 |
@sonney2k | so #C * #kernels(+kernel parameter values) | 16:06 |
heiko | and dependent parameters? | 16:07 |
@sonney2k | e.g. kernel -> GaussianKernel, PolyKernel ... | 16:08 |
@sonney2k | and parameters attached | 16:08 |
heiko | ah alright now i get you | 16:08 |
heiko | so you mean certain parameters imply other parameters | 16:08 |
heiko | and therefore, not all combinations of all parameters | 16:09 |
heiko | are "valid" | 16:09 |
heiko | if one would just use the tree | 16:09 |
@sonney2k | yes and you can only select one not multiple of them at the same time | 16:09 |
@sonney2k | yesz | 16:09 |
@sonney2k | I am looking at the nested lists again | 16:09 |
@sonney2k | (leaving this problem aside for now) | 16:10 |
heiko | ok | 16:10 |
@sonney2k | in the top level of the list one would only expect parameters for the respective CModel | 16:10 |
@sonney2k | in one level below parameters for a parameter of CModel and so on | 16:10 |
@sonney2k | the problem now is that there could be 'kernel', GaussianKernel and 'kernel', PolyKernel | 16:11 |
@sonney2k | so something needs to restructure this list to have all kernels in one list | 16:12 |
heiko | 'kernel' in set_range you mean? | 16:12 |
@sonney2k | p=ModelSelectionParameters() | 16:12 |
@sonney2k | kp=ModelSelectionParameters() | 16:12 |
@sonney2k | err | 16:13 |
@sonney2k | p.add('kernel', GaussianKernel) | 16:13 |
@sonney2k | p.add('kernel', PolyKernel) | 16:13 |
heiko | what do you mean by 'something needs to restructure this list to have all kernels in one list'? | 16:19 |
@sonney2k | heiko, ok new try - now 'much easier' | 16:20 |
@sonney2k | [['kernel', [ [ GaussianKernel, ['width', [ 1,2,3 ]] ], [ PolyKernel, ['degree', [1,2]] ] ]], ['C', [0.1, 1, 10]]] | 16:20 |
@sonney2k | we do it in a way that parameters that are independent from each other have to be specified in one go | 16:21 |
@sonney2k | so in the example, 'kernel' can be one of GaussianKernel or PolyKernel with respective parameters | 16:21 |
@sonney2k | and C is independent of this so any value of C can be combined with 'kernel' | 16:22 |
heiko | ok, this sounds good | 16:24 |
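The nested-list scheme above can be exercised with a short sketch. Everything here is a hypothetical illustration of the expansion rule (independent top-level parameters combined by cartesian product, nested entries selecting one alternative plus its sub-parameters); the kernel names are plain strings standing in for real shogun objects:

```python
from itertools import product

# the nested-list spec from the chat, with kernel classes as placeholder strings
spec = [['kernel', [['GaussianKernel', ['width', [1, 2, 3]]],
                    ['PolyKernel',     ['degree', [1, 2]]]]],
        ['C', [0.1, 1, 10]]]

def expand_param(name, values):
    """Expand one [name, values] entry into a list of settings (dicts)."""
    out = []
    for v in values:
        if isinstance(v, list):          # [value, [subname, subvalues]]
            obj, (subname, subvalues) = v
            out.extend({name: obj, subname: sv} for sv in subvalues)
        else:                            # a plain value, e.g. C = 0.1
            out.append({name: v})
    return out

def expand(spec):
    """Cartesian product over the independent top-level parameters."""
    per_param = [expand_param(name, values) for name, values in spec]
    combos = []
    for choice in product(*per_param):
        merged = {}
        for d in choice:
            merged.update(d)
        combos.append(merged)
    return combos

combos = expand(spec)
# 3 widths + 2 degrees = 5 kernel settings, times 3 values of C = 15 settings
```

This matches the rule stated above: 'kernel' is one of GaussianKernel or PolyKernel with its own parameters, and every value of C can be combined with any of them.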
-!- serialhex [~quassel@99-101-149-136.lightspeed.wepbfl.sbcglobal.net] has quit [Read error: Connection reset by peer] | 16:26 | |
@sonney2k | let me draft a ModelSelectionParametrers() example | 16:26 |
-!- serialhex [~quassel@99-101-149-136.lightspeed.wepbfl.sbcglobal.net] has joined #shogun | 16:27 | |
@sonney2k | par=ModelSelectionParameter() | 16:28 |
@sonney2k | kernel_par=ModelSelectionParameter() | 16:28 |
@sonney2k | gaussian_par=ModelSelectionParameter() | 16:28 |
@sonney2k | gaussian_par.add_range('width', 1,3, 3) | 16:28 |
@sonney2k | poly_par=ModelSelectionParameter() | 16:28 |
@sonney2k | poly_par.add_range('degree', 1,2,2) | 16:28 |
@sonney2k | kernel_par.add('GaussianKernel', gaussian_par) | 16:28 |
@sonney2k | kernel_par.add('PolyKernel', poly_par) | 16:28 |
@sonney2k | par.add('kernel', kernel_par) | 16:29 |
@sonney2k | par.add_logrange('C', -1,1, 3) | 16:29 |
@sonney2k | to explain | 16:30 |
@sonney2k | you create par - the master parameter | 16:30 |
heiko | yes, think I get it | 16:30 |
@sonney2k | then you create a kernel parameter that will contain gaussian and poly and add that later and also add C later | 16:31 |
@sonney2k | since gaussian/poly have parameters too we also need params for them | 16:31 |
@sonney2k | ok | 16:31 |
heiko | possible combinations are represented by a path from an element in par to a leaf | 16:33 |
heiko | and all possible paths for every element in the top level have to be combined | 16:33 |
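The drafted ModelSelectionParameter API and the path-per-combination idea above can be sketched as follows. This is one hypothetical reading, not shogun's real API: the chat's single add() is split here into add_param() (register an independent parameter) and add_option() (register one mutually exclusive alternative) to keep the two cases explicit.

```python
from itertools import product

class ModelSelectionParameter:
    """Hypothetical sketch of the parameter tree drafted above."""

    def __init__(self):
        self.params = []   # (name, option dicts): independent, combined by product
        self.options = []  # (label, sub-node or None): pick exactly one

    def add_range(self, name, lo, hi, n):
        # n linearly spaced values in [lo, hi]
        step = (hi - lo) / (n - 1) if n > 1 else 0.0
        self.params.append((name, [{name: lo + i * step} for i in range(n)]))

    def add_logrange(self, name, lo, hi, n):
        # n log-spaced values from 10**lo to 10**hi
        step = (hi - lo) / (n - 1) if n > 1 else 0.0
        self.params.append((name, [{name: 10 ** (lo + i * step)} for i in range(n)]))

    def add_option(self, label, sub_node=None):
        self.options.append((label, sub_node))

    def add_param(self, name, node):
        # each of node's options becomes one value of <name>, expanded
        # depth-first with that option's own sub-parameters
        opts = []
        for label, sub in node.options:
            for sub_combo in (sub.combinations() if sub else [{}]):
                opts.append({name: label, **sub_combo})
        self.params.append((name, opts))

    def combinations(self):
        # cartesian product over the independent parameters at this node
        result = []
        for picked in product(*(opts for _, opts in self.params)):
            merged = {}
            for d in picked:
                merged.update(d)
            result.append(merged)
        return result

# the example from the chat
gaussian_par = ModelSelectionParameter()
gaussian_par.add_range('width', 1, 3, 3)
poly_par = ModelSelectionParameter()
poly_par.add_range('degree', 1, 2, 2)
kernel_par = ModelSelectionParameter()
kernel_par.add_option('GaussianKernel', gaussian_par)
kernel_par.add_option('PolyKernel', poly_par)
par = ModelSelectionParameter()
par.add_param('kernel', kernel_par)
par.add_logrange('C', -1, 1, 3)
# (3 widths + 2 degrees) kernel settings times 3 values of C = 15 combinations
```

Each element of par.combinations() corresponds to one root-to-leaf path through the tree combined with one value from every independent range, which is exactly the traversal described above.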
-!- serialhex-10 [~androirc@99-101-149-136.lightspeed.wepbfl.sbcglobal.net] has quit [Read error: Operation timed out] | 16:35 | |
@sonney2k | heiko, now the messy(tm) part is what happens when the kernel itself has another independent parameter, e.g. say the way a kernel is normalized | 16:36 |
heiko | sounds messy(tm) ;) | 16:37 |
heiko | let me get a sheet of paper ... | 16:37 |
@sonney2k | I have a workaround idea for that | 16:39 |
@sonney2k | I mean it is complicated enough already in the example above | 16:40 |
@sonney2k | luckily people will usually only do model selection over one parameter range and maybe one more parameter | 16:40 |
heiko | becomes unfeasible quite fast if you go for more | 16:41 |
@sonney2k | so the workaround would be to put 'kernel.normalizer' in there and parse the '.' or assume that people simply give us kernels with different normalizers anyways | 16:41 |
@sonney2k | then this is not necessary at all | 16:41 |
@sonney2k | in the tuned_parameters of scikits learn parameter names are flattened it seems ... no idea what they do if they have two objects with the same parameter names | 16:43 |
-!- ameerkat [~ameerkat@184-98-140-155.phnx.qwest.net] has joined #shogun | 16:43 | |
heiko | sorry phone ringing ... | 16:51 |
@sonney2k | heiko, back? | 17:02 |
heiko | 2mins | 17:03 |
heiko | re | 17:05 |
@sonney2k | I have a more clear picture now | 17:06 |
heiko | ok give it to me | 17:06 |
heiko | err, tell me ;) | 17:06 |
@sonney2k | in my example above we have two ways | 17:06 |
@sonney2k | of adding a parameter | 17:07 |
@sonney2k | one is adding a [ <name>, <range|param_list> ] | 17:07 |
@sonney2k | and one is adding a [ <value>, <param_list> ] | 17:08 |
@sonney2k | whenever we add a new name we know these are independent things so they will need 'expansion' later on | 17:09 |
@sonney2k | the others are just nested values for a certain <name> | 17:09 |
heiko | yes | 17:12 |
heiko | so this may be applied to the kernel.normalizer | 17:12 |
@sonney2k | so that makes clear how/when to recurse | 17:12 |
@sonney2k | that would even work with the kernel normalizer then | 17:13 |
@sonney2k | it is just another nested thing | 17:14 |
heiko | i am a bit unsure with the line | 17:14 |
heiko | kernel_par.add('GaussianKernel', gaussian_par) | 17:14 |
heiko | because it is a new name | 17:15 |
@sonney2k | http://dpaste.com/536807/ | 17:15 |
@sonney2k | the way how lists are nested | 17:15 |
heiko | ok and the name addings are only for 'kernel' and 'C' | 17:15 |
@sonney2k | yes | 17:16 |
@sonney2k | and if one adds another name it will need a recursion | 17:16 |
heiko | ok | 17:19 |
@sonney2k | is that more or less clear? | 17:21 |
heiko | then now we have this list | 17:21 |
heiko | oh, yes | 17:21 |
@sonney2k | nested list yes | 17:21 |
heiko | well ok, then it should be possible to set these parameters to the shogun objects | 17:23 |
@sonney2k | yeah and you need to open a recursion whenever you hit a new name for a parameter | 17:24 |
@sonney2k | I guess we define just a few for the beginning: float64_t and SGObject | 17:25 |
heiko | yes | 17:26 |
heiko | the next step should be to evaluate the classifiers | 17:26 |
@sonney2k | then the modsel object can generate new values and these are registered in a CParameter object that is then applied to the shogun object | 17:26 |
@sonney2k | heiko, the classifier always has a function train() | 17:27 |
@sonney2k | and classify() | 17:27 |
heiko | yes | 17:27 |
@sonney2k | classify() returns a label object | 17:27 |
@sonney2k | and then you call the evaluate() function of your evaluation object | 17:27 |
heiko | by evaluate I meant to test it on data and calculate a performance measure | 17:28 |
heiko | ok, same thing | 17:28 |
heiko | but the question before that is on which data to train and to test | 17:28 |
heiko | so the splitting of the data | 17:28 |
@sonney2k | yes you change the data splitting dependent on which ModelSelection() class you are in | 17:29 |
@sonney2k | NFoldModelSelection, LOOModelSelection or so | 17:30 |
heiko | isnt the way the data is split up independent from the model selection process? | 17:30 |
heiko | i mean, for example the grid search just iterates over all parameters and selects the classifier with the best achieved performance measure | 17:30 |
@sonney2k | maybe we have a name clash here | 17:30 |
heiko | but only the evaluation of a classifier is dependent on the way the data is split up | 17:31 |
@sonney2k | well the search part is the same it is only that you use different data splits | 17:31 |
@sonney2k | yes | 17:31 |
heiko | wouldnt it be good to separate this? | 17:31 |
heiko | like: | 17:31 |
@sonney2k | as I said either have different ModSel objects one for each way you split the data or set the data splitting via some extra class | 17:32 |
heiko | having a class that evaluates a classifier based on a certain strategy. | 17:32 |
heiko | and this (or another) class is used by the search class | 17:32 |
heiko | i think i would prefer a base class for splitting up the data and specializations of this class are used by any search class | 17:34 |
@sonney2k | I am fine with both approaches | 17:34 |
heiko | ok | 17:34 |
@sonney2k | (I was assuming that this is what the ModelSelection class does) | 17:35 |
heiko | but there also might be different methods of selecting a model | 17:35 |
@sonney2k | apart from that it looks like we have a more concrete plan or? | 17:35 |
heiko | like grid-search, bisection, gradient descent etc | 17:35 |
heiko | yes | 17:35 |
@sonney2k | heiko, now I understand grid-search etc | 17:35 |
@sonney2k | yes | 17:35 |
@sonney2k | ok then do it as you proposed | 17:35 |
@sonney2k | any problem you see right now? | 17:36 |
@sonney2k | there will probably be many when it comes to the details but for now high-level wise? | 17:36 |
heiko | with the overall process? | 17:36 |
@sonney2k | yeah | 17:36 |
heiko | no this looks all quite good to me | 17:37 |
@sonney2k | because I see one potential bug in all kernel based methods now... | 17:37 |
@sonney2k | that will kill things | 17:37 |
@sonney2k | do you want to know or was that enough for today? | 17:38 |
heiko | no tell me :) | 17:38 |
heiko | if you have time, I have | 17:38 |
@sonney2k | currently kernel machines only remember the training indices but don't keep the actual data around | 17:39 |
heiko | oh | 17:39 |
@sonney2k | so that means our feature subsetting thing will kabooom | 17:39 |
heiko | and changing the data all time .. | 17:39 |
heiko | well | 17:39 |
heiko | and now? | 17:40 |
@sonney2k | so start with linear classifiers :) | 17:40 |
heiko | hehe ;) | 17:40 |
heiko | what about storing pointers to data? | 17:41 |
heiko | my first intent | 17:41 |
@sonney2k | doesn't work - features can be on-the-fly computed | 17:41 |
@sonney2k | so pointers are not really valid | 17:41 |
@sonney2k | I think kernel machines have to actually really store the training data for all alphas != 0 | 17:42 |
heiko | the SV-alphas you mean, right? | 17:42 |
@sonney2k | yes | 17:42 |
@sonney2k | or somehow have a view of the features | 17:42 |
heiko | what about some mapping? | 17:43 |
heiko | mmh, but this only works in the other direction | 17:44 |
@sonney2k | or multiple views on data | 17:44 |
heiko | ah ok I understand | 17:45 |
heiko | hmm, so the machine has an internal map | 17:46 |
heiko | these views must be located in the features, right? | 17:50 |
heiko | because the learning machine gets access via the features, who operate on the subset | 17:50 |
heiko | quite complicated. | 17:50 |
heiko | what about storing the data? | 17:50 |
heiko | too much stuff? | 17:51 |
heiko | i do not know how large problems become, in my BA I had some thousand SVs and the feature space dimension was huge, but there might be problems with storing the data twice | 17:53 |
heiko | so the views would be a more performant approach, while being more complicated | 17:53 |
heiko | to implement and to use | 17:53 |
@sonney2k | heiko, and the view cannot be stored internally in the features / or one would have to select the view all the time | 17:54 |
@sonney2k | messy | 17:54 |
@sonney2k | one could argue that kernel machines are not largescale anyway and so just duplicate the data | 17:55 |
heiko | what if a way to avoid the subset of features is added and used by kernel machines? | 17:55 |
@sonney2k | but that is also hacky or? | 17:55 |
@sonney2k | then you have to do that differently for all machines | 17:56 |
heiko | yes | 17:56 |
heiko | not too cool | 17:56 |
heiko | so storing | 17:57 |
@sonney2k | or maybe write an email to the mailinglist - maybe someone has a better idea | 17:57 |
@sonney2k | or | 17:58 |
@sonney2k | one has a training and test view on data | 17:58 |
@sonney2k | though I would very much prefer that a classifier is self contained after training, that is has all the things necessary to be called later on with some test features. | 17:59 |
@sonney2k | the problem then of course is that e.g. KNN has to copy all the data | 17:59 |
@sonney2k | and the kernel machine in the end too | 18:00 |
heiko | nearest neighbour? | 18:00 |
@sonney2k | yes | 18:00 |
heiko | true | 18:00 |
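The "view on data" idea discussed above can be sketched roughly like this. This is a minimal illustrative sketch, not the actual shogun API: the `SubsetFeatures` class and its method names are hypothetical. The point is that a view is just an internal index map over the full matrix, so no data is duplicated, but a machine that must be self-contained after training (e.g. KNN) would still have to copy the rows it needs.

```python
# Hypothetical sketch of a feature object with an index-subset "view".
# Names (SubsetFeatures, set_subset, ...) are illustrative, not shogun's API.
import numpy as np

class SubsetFeatures:
    """Features exposing an optional index subset (a 'view') over the data."""

    def __init__(self, data):
        self.data = np.asarray(data)   # full matrix, never copied
        self.subset = None             # active view, or None meaning "all rows"

    def set_subset(self, indices):
        self.subset = np.asarray(indices)

    def remove_subset(self):
        self.subset = None

    def get_num_vectors(self):
        # machines only ever see the size of the current view
        return len(self.subset) if self.subset is not None else len(self.data)

    def get_vector(self, i):
        # the machine addresses vectors 0..num_vectors-1; the map is internal
        j = self.subset[i] if self.subset is not None else i
        return self.data[j]

X = np.arange(12.0).reshape(6, 2)
feats = SubsetFeatures(X)
feats.set_subset([0, 2, 4])  # training view: rows 0, 2, 4 of the full matrix
train_rows = [feats.get_vector(i).tolist()
              for i in range(feats.get_num_vectors())]
```

The downside raised above shows up directly here: if the view is stored inside the features, a trained machine is only valid while that view is selected, so either the caller re-selects the view all the time or the machine copies its training vectors out of the view.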
@sonney2k | anyway got to go | 18:00 |
heiko | ok, lets talk about it later | 18:00 |
heiko | have a nice evening and thanks for all your time | 18:00 |
@sonney2k | so ask on the ML and try to draw a picture of the plan or document it somehow | 18:01 |
@sonney2k | otherwise channel logs :) | 18:01 |
@sonney2k | try to get into the relevant parts and see if it all still makes sense after a night of sleep | 18:01 |
@sonney2k | ok | 18:01 |
@sonney2k | l8r | 18:01 |
@sonney2k | thanks for the discussion! | 18:01 |
* sonney2k Re | 18:24 | |
-!- ameerkat [~ameerkat@184-98-140-155.phnx.qwest.net] has quit [Ping timeout: 248 seconds] | 18:41 | |
-!- alesis-novik [~alesis@188.74.87.84] has quit [Quit: I'll be Bach] | 18:56 | |
-!- heiko [~heiko@134.91.10.201] has quit [Ping timeout: 258 seconds] | 19:20 | |
-!- heiko [~heiko@134.91.55.152] has joined #shogun | 19:28 | |
-!- akhil_ [75d35896@gateway/web/freenode/ip.117.211.88.150] has joined #shogun | 19:37 | |
-!- blackburn [~qdrgsm@188.168.2.109] has joined #shogun | 19:39 | |
blackburn | hello | 19:44 |
blackburn | now reading student google group | 19:45 |
blackburn | it is something terrible | 19:45 |
-!- akhil_ [75d35896@gateway/web/freenode/ip.117.211.88.150] has quit [Ping timeout: 252 seconds] | 20:03 | |
@sonney2k | blackburn, what is going on in there? | 20:43 |
blackburn | sonney2k: guys are asking all the time about 'enrollment proof' | 20:44 |
blackburn | and there are A LOT of messages like 'I haven't received any email asking me to prove that. WHAT SHOULD I DO??!!!' | 20:44 |
blackburn | these guys are really mad :D | 20:44 |
@bettyboo | hehe! | 20:44 |
@sonney2k | oh well | 20:46 |
-!- heiko [~heiko@134.91.55.152] has left #shogun [] | 20:48 | |
@sonney2k | blackburn, is it this toolbox http://homepage.tudelft.nl/19j49/Matlab_Toolbox_for_Dimensionality_Reduction.html ? | 21:01 |
blackburn | hm | 21:01 |
blackburn | it isn't | 21:01 |
@sonney2k | 33 methods! | 21:02 |
blackburn | but that link could be useful too | 21:02 |
blackburn | sonney2k: thank you | 21:03 |
blackburn | downloaded | 21:03 |
blackburn | with time all of these methods could be implemented.. | 21:04 |
@sonney2k | so you will have a busy summer :D | 21:05 |
blackburn | hey! I will not work 80 hours a week :) | 21:05 |
@sonney2k | just 33 methods - how many weeks are there? | 21:06 |
@sonney2k | err days | 21:06 |
@sonney2k | 1 method 3 days? | 21:07 |
@sonney2k | :) | 21:07 |
blackburn | oh well one method per 3 days :D | 21:07 |
blackburn | so then let google pay me 33K | 21:07 |
@sonney2k | 1100K per 3 days? | 21:08 |
@sonney2k | I would wish I get this much money :D | 21:09 |
@sonney2k | er 1.1K | 21:09 |
blackburn | yeah I wish to get it too :D | 21:09 |
blackburn | with the expenses I have now I could not work for 11 years :D | 21:11 |
@bettyboo | haha blackburn | 21:11 |
@sonney2k | blackburn, lets see what you say when you are 11 years older | 21:12 |
@sonney2k | or even my age :) | 21:12 |
blackburn | sonney2k: of course, I will say it is not sufficient in the next 2 years | 21:12 |
blackburn | sonney2k: your age is quite similar to me being 11 years older | 21:16 |
@sonney2k | similar is not sufficient :) | 21:17 |
@sonney2k | anyway check licenses of the toolboxes you use or ask the author if they are not compatible and you just port the dim red method | 21:18 |
blackburn | sonney2k: eh. but does a port clash with the license? | 21:18 |
@sonney2k | if you use their source code and are not allowed to do so - I think so | 21:19 |
blackburn | I mean it's just the same way to do it | 21:19 |
blackburn | sonney2k: ok, I will check for it | 21:20 |
blackburn | now I want to make a port to python, so I could comprehend it better | 21:20 |
@sonney2k | waste of time - I mean you can use octave to run the code and it is as simple to read as python or even better | 21:21 |
blackburn | sonney2k: do you suggest to port it right in C++? | 21:22 |
blackburn | and.. why should I use octave, I used matlab to run it | 21:23 |
@sonney2k | blackburn, yes - but of course the methods you are really interested in you should try to understand too | 21:23 |
@sonney2k | didn't know you have matlab | 21:23 |
@sonney2k | expensive... | 21:23 |
blackburn | sonney2k: just recall where I live | 21:24 |
@sonney2k | I see university ;) | 21:24 |
@sonney2k | anyways doesn't matter | 21:24 |
blackburn | I mean I have no problem to crack it, nobody cares there.. :D | 21:24 |
@bettyboo | blackburn, rotfl | 21:24 |
@sonney2k | but seriously look at the code - it is pretty easy to understand. | 21:25 |
blackburn | sonney2k: yeah it is | 21:25 |
@sonney2k | and I guess when he has 33 dim red methods implemented he must have had some plan | 21:25 |
blackburn | sonney2k: I already checked for papers about it | 21:26 |
blackburn | I have papers about algos I proposed | 21:26 |
blackburn | sonney2k: well my plan to port it is really a waste of time | 21:27 |
blackburn | now I understand it :) | 21:27 |
blackburn | sonney2k: http://www.math.ucla.edu/~wittman/mani/ that one has more 'beauty' code | 21:28 |
blackburn | not 33, but it has the ones I proposed | 21:29 |
@sonney2k | are you joking? | 21:30 |
@sonney2k | I mean you meant mani.m right? | 21:31 |
* sonney2k couldn't understand this code - only the one from the 33 examples | 21:34 | |
blackburn | sonney2k: it is because there are some GUI issues | 21:42 |
@sonney2k | it is only gui right? | 21:43 |
blackburn | sonney2k: nope, there are methods too | 21:43 |
blackburn | sonney2k: for example line 876 for hessian lle | 21:43 |
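The mani.m line mentioned above is Hessian LLE; as a hedged illustration of the same family of methods being discussed for porting, here is a minimal numpy sketch of plain (non-Hessian) LLE. All parameter names are illustrative, and this is a textbook sketch rather than the mani.m or shogun implementation.

```python
# Minimal sketch of standard Locally Linear Embedding (LLE) with numpy.
# Illustrative only - not the Hessian LLE variant from mani.m.
import numpy as np

def lle(X, n_neighbors=6, n_components=2, reg=1e-3):
    """Reconstruct each point from its neighbors, then find
    low-dimensional coordinates preserving those weights."""
    n = X.shape[0]
    # pairwise squared distances; exclude self when picking neighbors
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    nbrs = np.argsort(d2, axis=1)[:, :n_neighbors]

    # reconstruction weights: solve a small regularized linear system per point
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]                          # centered neighborhood
        C = Z @ Z.T
        C += reg * np.trace(C) * np.eye(n_neighbors)   # regularize Gram matrix
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, nbrs[i]] = w / w.sum()

    # embedding: bottom eigenvectors of (I - W)^T (I - W),
    # skipping the constant eigenvector at eigenvalue ~0
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    _, vecs = np.linalg.eigh(M)
    return vecs[:, 1:n_components + 1]

# toy data: noisy points along a 1-D curve embedded in 3-D
rng = np.random.default_rng(0)
t = np.linspace(0, 3, 40)
X = np.c_[np.sin(t), np.cos(t), t] + 0.01 * rng.standard_normal((40, 3))
Y = lle(X, n_neighbors=6, n_components=2)
```

Hessian LLE replaces the reconstruction-weight step with a local Hessian estimator per neighborhood, but the overall pipeline (neighborhoods, local fit, bottom eigenvectors of a sparse matrix) is the same shape as above.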
blackburn | at least it generates some nice pictures :D | 21:44 |
@bettyboo | hoho | 21:44 |
@sonney2k | I see - but yeah I am not a gui guy :) | 21:46 |
blackburn | oh I really want to start working on it | 21:48 |
blackburn | but now have to write some ejbLoad(), ejbStore(), ... | 21:48 |
@sonney2k | blackburn, enjoy! | 21:53 |
blackburn | sonney2k: now you are joking :D | 21:53 |
@sonney2k | BTW, when will you finally have holidays? | 21:53 |
blackburn | 15-20 june | 21:53 |
blackburn | but in june I only have 3 exams and nothing more | 21:54 |
@sonney2k | ohh so you are pretty busy - will be tough for you to get any work done ... | 21:55 |
blackburn | sonney2k: anyway I will get everything done, do not worry :) | 21:56 |
@sonney2k | yeah, I hope you will manage | 21:57 |
blackburn | sonney2k: now I just have to finish my java ee project and the other studies are pretty easy for me | 21:58 |
@sonney2k | just do it! | 21:59 |
blackburn | sonney2k: yeah, working now :) | 22:00 |
@sonney2k | blackburn, good luck and good night | 22:40 |
--- Log closed Fri Apr 29 00:00:37 2011 |
Generated by irclog2html.py 2.10.0 by Marius Gedminas - find it at mg.pov.lt!