IRC logs of #shogun for Wednesday, 2011-04-13

--- Log opened Wed Apr 13 00:00:36 2011
-!- skydiver [c255a037@gateway/web/freenode/ip.194.85.160.55] has quit [Quit: Page closed]00:22
-!- akhil_ [75d35896@gateway/web/freenode/ip.117.211.88.150] has joined #shogun03:38
-!- akhil_ [75d35896@gateway/web/freenode/ip.117.211.88.150] has quit [Quit: Page closed]04:53
-!- oiwah [~oiwah@i114-181-177-143.s04.a013.ap.plala.or.jp] has joined #shogun06:13
-!- siddharth_ [~siddharth@117.211.88.150] has quit [Read error: Connection reset by peer]06:32
@sonney2kHmmhh no news yet06:34
-!- siddharth [~siddharth@117.211.88.150] has joined #shogun06:38
serialhexyou're up early sonney2k06:48
alesis-noviksonney2k, I was wondering if you took a look at the PDF function I sent a while back?06:56
-!- oiwah [~oiwah@i114-181-177-143.s04.a013.ap.plala.or.jp] has quit [Quit: Leaving...]07:14
@sonney2kalesis-novik, I must have missed that email(?)07:33
@sonney2kserialhex, too exciting... but no news still07:34
alesis-noviksonney2k, I sent an email quite a while back, given that the code had more to do with my potential project than with anything else in shogun07:34
serialhexyeah, i'm going to sleep in a few so hopefully there will be news when i wake!07:35
@sonney2kalesis-novik, then I certainly must have missed it07:35
@sonney2khmmhh people at google seem to have as long working hours as I do ;-)07:40
alesis-novikWell, going to sleep07:41
@sonney2kalesis-novik, what was the subject of your email?07:41
alesis-novikThe chain was "GSoC and Shogun questions"07:41
@sonney2kalesis-novik, hmmhh I just have one email from you then - no pdfs though07:43
@sonney2kwait now07:43
@sonney2kfound07:43
@sonney2kalesis-novik, want to hear comments now or go to bed?07:44
alesis-novikbed can wait07:44
@sonney2kCould you derive your pdf from CDistribution?07:47
@sonney2kwas used for discrete data but still...07:48
@sonney2kif you do this and implement the interface then e.g. computing the fisher / top kernel from your pdf becomes possible07:49
-!- yayo3 [~jakub@ip-78-45-113-245.net.upcbroadband.cz] has joined #shogun07:50
alesis-novikI'm not sure how the Gaussian PDF should build on CDistribution07:51
@sonney2khttp://www.shogun-toolbox.org/doc/classshogun_1_1CDistribution.html07:53
@sonney2kI mean if I understand correctly you assume the covariance matrix / mean to be given07:53
@sonney2kand you supply a function to compute p(x)07:53
@sonney2kcompute_PDF07:54
alesis-novikyes07:54
@sonney2kthat would be the get_likelihood_example one in CDistribution07:54
@sonney2kCFeatures would then be some CDotFeatures and you would have to call get_feature_vector to actually get the vector07:55
@sonney2kreturning model parameters is not so difficult either07:55
@sonney2kjust the covariance matrix and mean flattened07:55
@sonney2k(I mean as long vector)07:55
@sonney2kand then only derivatives and you are done07:56
@sonney2k(ok training is missing but if you assume that it is done outside - it is ok)07:56
@sonney2kat least for now.07:58
@sonney2kalesis-novik, patches in this direction are highly welcome :)07:59
alesis-novikok, I'll adapt that tomorrow. So then I don't really need the general PDF class07:59
@sonney2kyes - sounds good07:59
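The interface sketched above can be illustrated in plain Python. The class and method names below are hypothetical stand-ins mirroring the chat (get_likelihood_example, model parameters returned as the mean plus the flattened covariance), not shogun's actual C++ API; the covariance is assumed diagonal to keep the sketch short, and training is done outside, as agreed:

```python
import math

class GaussianDistribution:
    """Toy analogue of deriving a Gaussian PDF from CDistribution
    (hypothetical names; diagonal covariance only, trained elsewhere)."""

    def __init__(self, mean, variances):
        self.mean = list(mean)            # assumed given, as in the chat
        self.variances = list(variances)  # diagonal of the covariance matrix

    def get_log_likelihood_example(self, x):
        # log p(x) for a diagonal-covariance Gaussian
        ll = 0.0
        for xi, mu, var in zip(x, self.mean, self.variances):
            ll += -0.5 * math.log(2.0 * math.pi * var) - (xi - mu) ** 2 / (2.0 * var)
        return ll

    def get_model_parameters(self):
        # mean and (flattened, here diagonal) covariance as one long vector
        return self.mean + self.variances

g = GaussianDistribution([0.0, 0.0], [1.0, 1.0])
print(g.get_log_likelihood_example([0.0, 0.0]))  # -log(2*pi) for a 2-D standard normal
print(g.get_model_parameters())
```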
yayo3by the way08:06
yayo3shogun is supposed to include logistic regression, according to feature matrix from web page08:07
yayo3but I can't find it. there must be some trick :)08:07
yayo3(or I'm blind, anyway)08:07
alesis-novikWhen in doubt, grep it08:08
alesis-novikAnyway, goodnight08:08
serialhexsleepz iz teh r0x0rz!!!  nite all!08:13
@sonney2kserialhex, good nite08:14
@sonney2kyayo3, it is hidden in liblinear08:15
@sonney2kyou need to use the logistic loss then08:15
yayo3yeah I used the grep :) but LR can predict probability of each class, and I don't think there's an interface for that. at least I can't find one08:32
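What yayo3 is after, class probabilities from a logistic-loss model, is generic math rather than a specific shogun call: a linear score gets squashed through the logistic function. A sketch, not the actual shogun/liblinear interface:

```python
import math

def logistic_probability(score):
    """P(y = +1 | x) for a linear model output score = w.x + b,
    assuming the model was trained with the logistic loss."""
    return 1.0 / (1.0 + math.exp(-score))

print(logistic_probability(0.0))  # 0.5: a score of 0 is maximal uncertainty
print(logistic_probability(3.0))  # close to 1
```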
yayo3anyway I'm off to school. see ya08:32
-!- yayo3 [~jakub@ip-78-45-113-245.net.upcbroadband.cz] has left #shogun []08:32
-!- blackburn [~qdrgsm@188.168.4.116] has joined #shogun08:36
-!- akhil_ [75d35896@gateway/web/freenode/ip.117.211.88.150] has joined #shogun09:05
blackburnsonney2k: waiting for ya :)09:06
@sonney2kblackburn, morning!09:06
CIA-8shogun: Soeren Sonnenburg master * r6e53c49 / src/libshogun/kernel/InverseMultiQuadricKernel.cpp : don't assert anything when registering parameters - http://bit.ly/dWPVrw09:07
CIA-8shogun: Soeren Sonnenburg master * rc010bae / (4 files in 2 dirs): Introduce Multiquadric kernel. Thanks Joanna Stocka for the patch. - http://bit.ly/i8TFEo09:07
CIA-8shogun: Soeren Sonnenburg master * rb0e729e / src/libshogun/kernel/Kernel.cpp : Fix duplicate case entry for MultiQuadric Kernel - http://bit.ly/fVy4s609:07
CIA-8shogun: Soeren Sonnenburg master * r388d634 / (17 files): minor code reshuffling and verious coding style fixes in the recently added kernels - http://bit.ly/eXS1KE09:07
blackburnMorgen :)09:07
@sonney2kblackburn, If you'd like to know sth about #slots - no news yet09:07
-!- Tanmoy [75d35896@gateway/web/freenode/ip.117.211.88.150] has joined #shogun09:08
@sonney2kHmmhh I had to do quite some more polishing of all the new kernels - reviewing code is quite some work (I know now)09:09
blackburnI see, commit of 17 files :D09:09
@sonney2kanyways - only very minor stuff09:09
blackburnhow many new kernels do you have now?09:10
@sonney2kI lost track09:10
@sonney2kand actually, until we have examples in all languages, I won't bother counting09:10
@sonney2kthese examples are used as test-suite btw (at least for the python interfaces)09:11
@sonney2k(which makes life easy - just write one example and the testsuite will use it :)09:12
Tanmoy@sonney2k i was just looking at the list of kernels, many of them follow from here09:12
Tanmoyhttp://crsouza.blogspot.com/2010/03/kernel-functions-for-machine-learning.html#circular09:12
Tanmoyexcept few exceptions09:12
@sonney2kTanmoy, yes - blackburn's fault :D09:12
@bettyboohe!! sonney2k09:12
Tanmoyill send in a list of mine09:13
Tanmoythough some of them need to be checked ...I had some doubts about the multiquadric and circular kernels09:13
blackburneh?09:13
@sonney2ksimilarly easy would be to program distributions derived from CDistributions and also performance measures like accuracy etc09:14
@sonney2kblackburn, no don't look so innocent - I mean this in a strictly positive way :)09:15
blackburnnot quite sure what I did wrong :)09:15
@sonney2kTanmoy, I think some of them are actually not positive09:15
@sonney2kbut hey - I have been using SVMs for 10 years now and never needed them... well, I didn't know that I needed them.09:16
-!- skydiver [c255a037@gateway/web/freenode/ip.194.85.160.55] has joined #shogun09:16
blackburnah, you mean that some kernels are 'bad'? sure they are09:16
@sonney2kblackburn, well they are not positive definite09:16
@sonney2kbut empirically they probably still are09:16
Tanmoywell theoretically - should they be +ve def or need they not be?09:17
blackburnaha, Mercer's theorem, etc09:17
@sonney2kblackburn, yes... an easy way to check is - just to generate some random vectors and then computing eigenvalues - checking if they are positive...09:17
Tanmoyx\transpose A x09:18
blackburnsonney2k: btw, about tests, may be I'll write it now09:18
@sonney2k(I had negative ones even with gaussian kernels - welcome to numerics)09:18
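The empirical check sonney2k describes (draw random vectors, build the kernel matrix, verify non-negative eigenvalues) can be done without a linear-algebra library by attempting a Cholesky factorization, which succeeds, up to tolerance, exactly when the matrix is positive semi-definite. A pure-Python sketch with a Gaussian kernel as the example:

```python
import math
import random

def gaussian_kernel(x, y, width=2.0):
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / width)

def is_psd(K, tol=1e-10):
    """Attempt a Cholesky factorization of symmetric K: it exists iff K is
    positive semi-definite.  A clearly negative pivot, or a zero pivot with
    a nonzero coupling term, proves K is not PSD."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = K[i][i] - s
                if d < -tol:
                    return False          # negative eigenvalue detected
                L[i][i] = math.sqrt(max(d, 0.0))
            elif L[j][j] > tol:
                L[i][j] = (K[i][j] - s) / L[j][j]
            elif abs(K[i][j] - s) > tol:
                return False              # zero pivot, nonzero coupling
    return True

random.seed(0)
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(5)]
K = [[gaussian_kernel(a, b) for b in X] for a in X]
print(is_psd(K))                          # Gaussian kernel matrices are PSD
print(is_psd([[0.0, 1.0], [1.0, 0.0]]))   # eigenvalues +1 and -1: not PSD
```

As the chat notes, even truly PSD kernels can show tiny negative eigenvalues numerically, hence the tolerance.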
@sonney2kblackburn, for the new kernels or what?09:18
blackburnyeap, new kernels09:18
@sonney2kblackburn, that would be nice09:18
Tanmoyi was thinking of extending kernels for Graphs09:18
Tanmoylike diffusion kernel09:18
blackburnsonney2k:  examples/undocumented/ , right?09:18
blackburnbut only for python :D09:19
@sonney2kblackburn, yes - put a *short* description in examples/description - it is automatically prepended09:19
@sonney2kblackburn, yeah python / python_modular is sufficient09:19
@sonney2kthat will test both interfaces (static and modular) and that is sufficient to show that it works correctly09:20
blackburnsonney2k: eh, static?09:20
blackburnI thought we didn't have static interfaces for our new kernels10:20
blackburnonly siddharth wrote one09:21
@sonney2kblackburn, configure --interfaces=libshogun,libshogunui,python for example09:21
@sonney2kblackburn, I can add that in 15 minutes or so09:21
blackburnok09:21
blackburn310-450, so we have 15 new kernels now09:22
blackburn1/15 :D09:27
blackburnsonney2k: btw, is it good practice? CWaveletKernel(CDotFeatures* l, CDotFeatures* r, int32_t size,float64_t Wdilation, float64_t Wtranslation);09:27
blackburnint32_t size09:28
blackburnnot sure why I should set a cache size when using it in python10:28
blackburn4 tested so far..09:37
-!- Tanmoy [75d35896@gateway/web/freenode/ip.117.211.88.150] has quit [Quit: Page closed]09:59
@sonney2kblackburn, true - not needed10:05
@sonney2kremove that cache size arg10:05
blackburnokay10:05
blackburnsonney2k: will it be okay if I commit only modular tests this time?10:08
@sonney2kblackburn, sure10:08
@sonney2kblackburn, I am already grateful that you are doing this10:08
blackburnqdrgsm@blackburn-R519:~/Documents/GSoC-SHOGUN/shogun_myfork/shogun/examples/undocumented/python_modular$ python kernel_exponential_modular.py10:08
blackburnExponential10:08
blackburnSegmentation fault10:08
@sonney2kheh :)10:09
@bettybooyeah ;D10:09
@sonney2kso serialization doesn't work or so10:09
blackburnsonney2k: there is no distance in CExponentialKernel constructors10:10
blackburnbut for some reason it is used10:10
@sonney2kblackburn, I am not even sure if any of these 'kernels' is a valid kernel when the distance is not euclidian10:10
@sonney2kbut hey - do you want to check or shall I10:11
@sonney2k?10:11
blackburnsonney2k: sure it will not be, e.g. with hamming distance etc10:11
blackburnsonney2k: I'm talking about the code of ExponentialKernel, not about its math10:11
-!- heiko [~heiko@infole-06.uni-duisburg.de] has joined #shogun10:11
blackburnsonney2k: think it will be valid in case of minkowski metric, but not sure at all10:12
@sonney2kblackburn, I think we also have sth like kerneldistance that tries to make a distance from a kernel - so one can do this infinitely long :D10:14
blackburnanyway, I will fix ExponentialKernel because it doesn't work :)10:14
blackburnWHAT THE F--K10:18
blackburnCExponentialKernel::CExponentialKernel(int32_t size, float64_t w)10:18
blackburn: CDotKernel(size)10:18
blackburn{10:18
blackburninit();10:18
blackburnwidth=w;10:18
blackburnASSERT(distance);10:18
blackburnSG_REF(distance);10:18
blackburn}10:18
blackburnwhy is it asserting and referencing NULL?!10:18
-!- blackburn was kicked from #shogun by bettyboo [flood]10:18
-!- blackburn [~qdrgsm@188.168.4.116] has joined #shogun10:18
@sonney2kblackburn, yeah ok... bug10:20
blackburnsonney2k: not yours I think :)10:21
blackburnsonney2k: removed that constructor10:21
blackburnsonney2k: 22 forks so far10:24
@sonney2klets see if this trend goes on when we know the nr of slots10:25
blackburnsonney2k: btw, may be later we should refactor these kernels to not use distance10:26
blackburnit seems to be a hard job proving their 'mercerness' :D for all the distances10:26
@sonney2kmaybe... but hey these kernels are so exotic... if someone uses them he should really know what he is doing10:27
blackburnhehe10:28
blackburni broke something..10:28
@sonney2kso not only me can do that :D10:28
blackburnImportError: /usr/local/lib/python2.6/dist-packages/shogun/_Kernel.so: undefined symbol: _ZN6shogun18CExponentialKernelC1Eid10:28
blackburnehhh10:30
@sonney2kforgot make install?10:30
@sonney2k(or setting LD_LIBRARY_PATH)10:30
blackburnnope10:30
blackburnhehe10:30
@bettybooyeah ;D10:30
blackburnmy kernels become unavailable10:30
blackburnhmm10:31
blackburnmy shogun became unavailable, funny10:31
blackburnsonney2k: what is LD_LIBRARY_PATH?10:32
@sonney2kblackburn, the path the dynamic linker searches for libraries (like libshogun.so*)10:32
blackburnah10:33
blackburnseems it is working10:33
blackburnbut shogun is not, for some reason :)10:33
blackburnwill try to reconfig it and make another install10:34
blackburnsonney2k: why don't the googlers inform you about slots?10:35
@sonney2kwell they have 175 orgs to go through10:35
@sonney2kthat takes time10:35
blackburnah10:35
blackburnwhen will they? today or maybe tomorrow?10:35
@sonney2kwhen they are done10:36
blackburnok :)10:36
@sonney2kthis is the official statement over at #gsoc: <gsocbot> sonney2k: "slots" is Slot allocation is done manually by Chris DiBona and Carol Smith, be a good org, play nice on the mentor list and #gsoc, ask for a non-crazy-high number of slots, and you'll probably get what you ask. Note that non-crazy-high for new orgs is around 1 or 2.10:36
blackburnahaha10:37
blackburn1 or 2, nice, with 60 candidates10:37
@sonney2kI mean they will see that and they will see that we have about 15 mentors assigned so they might be nice to us *I hope*10:38
blackburnsonney2k: if they give you 2 slots, ask them to choose the two :D10:38
@sonney2kbut we have to understand them: I mean we are a new org and they just have no data about us so we could potentially screw up completely10:40
blackburnsonney2k: yeap10:40
blackburnbut there are a few reasons to believe in you too10:40
* sonney2k wonders what these could be10:41
blackburne.g. you really have a choice with so many candidates available10:41
blackburnand I think your mentors have some authority (more than some php-ers from some open-source project), but I don't know at all10:43
blackburnjust my (may be silly) opinion10:43
heikohello10:44
@sonney2kwe are all above 18 too ;-)10:44
@sonney2khi10:44
blackburnsonney2k: haha, all possible gsocers are10:45
heikosonney2k: i would like to talk about the cross-val framework a bit, so you have a moment?10:45
@sonney2kshoot10:45
heikook, as you mentioned there at least two new classes needed:10:46
heikoone to store parameters of a particular learning machine, like kernel, data, C-param, etc10:46
heikothen there is another one that gets the overall cross-val parameters and generates multiple instances of the first one10:47
heikoat last, there has to be a class which gets the overall cross-val params and uses the second class to create multiple machines10:48
heikoand runs these in an order based on the cross-val params10:48
heikoevaluates them etc10:48
@sonney2kexactly10:48
heikook10:48
heikoand parameters for cross-val are like10:48
@sonney2kthe problem I see so far is where to draw the line / what is a parameter.10:49
heikook10:49
heikoi thought of:10:49
@sonney2kI mean in the usual setup one fixes the data set10:50
@sonney2kand then trains on subsets of it10:50
heikofold (n), split type (stratified, normal etc), optimizeby (performance measures), parameters to optimize, and the search type10:50
-!- akhil_ [75d35896@gateway/web/freenode/ip.117.211.88.150] has left #shogun []10:50
heikoyou mean whether the subset of data that is used for a fold is a parameter?10:51
@sonney2kyes, like e.g. a kernel parameter10:52
blackburnsonney2k: can RealFeatures (in python) be used as SimpleFeatures (internally)?10:52
@sonney2kI think we should assume to have a big data set that is fixed (data is no parameter)10:52
@sonney2kblackburn, in C++ you mean - yes10:53
heikoand then the CParameterSetting class just stores indices of the data to use for training/test?10:53
blackburnsonney2k: ah, my fault!10:53
@sonney2kheiko, yes.10:54
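The setup being described here, a fixed dataset where each parameter setting stores only the index subsets used for training and testing, is ordinary k-fold splitting. A minimal sketch in Python (a hypothetical helper, not a shogun class; it assumes the fold count divides the number of examples):

```python
import random

def kfold_indices(num_examples, num_folds, seed=0):
    """Shuffle the indices once, then yield (train, test) index subsets.
    The data itself stays fixed; only the index lists vary per fold.
    Assumes num_folds divides num_examples evenly."""
    idx = list(range(num_examples))
    random.Random(seed).shuffle(idx)
    fold_size = num_examples // num_folds
    for f in range(num_folds):
        test = idx[f * fold_size:(f + 1) * fold_size]
        train = idx[:f * fold_size] + idx[(f + 1) * fold_size:]
        yield train, test

for train, test in kfold_indices(10, 5):
    print(sorted(test), "held out,", len(train), "used for training")
```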
@sonney2kheiko, one would need to have specific features for that though that don't exist just yet10:54
@sonney2k:10:55
heikospecific features?10:55
@sonney2kNot sure how to call them... Features that take a feature object as input and then just access vectors (etc) as a subset10:56
@sonney2kor alternatively the feature framework needs to be extended such that one can set a permutation10:56
heikoah ok I understand10:57
@sonney2kotherwise it will not be transparent to the learning algorithm on which features to train / predict on10:57
heikodo you think the permutation should be done in the feature framework?10:57
heikowouldnt it be better to have like a new class that extracts subsets from a feature object?10:58
@sonney2kheiko, the problem then is that you have to write it for all kinds of features, like stringfeatures, sparse,simple(aka dense) ....10:58
heikoI will have a look, one minute10:59
blackburnsomeone forgot to integrate CircularKernel in Kernel.i :D10:59
@sonney2kblackburn, these bastards ;-)10:59
blackburn2 kernels left11:00
blackburnall but Exponential and Circular are (at least) working11:00
@sonney2kpretty good11:02
blackburnbut they are working now11:02
blackburnahhahah11:03
blackburnInverseMultiquadric11:03
blackburnoh, it seems to be 1/InverseMultiquadric11:03
blackburnsonney2k: you now have really exotic kernels :D11:03
blackburnsonney2k: may I commit all tests in one?11:04
@sonney2kyes11:05
@sonney2khttp://www.shogun-toolbox.org/doc/classshogun_1_1CFeatures.html11:05
@sonney2kheiko, ^^11:06
heikoalready there :)11:06
heikoI see, the actual data is stored in the subclasses11:07
@sonney2kheiko, yes.11:07
-!- skydiver [c255a037@gateway/web/freenode/ip.194.85.160.55] has quit [Quit: Page closed]11:08
heikoso either an extractor class for each feature type, or one for all, or extend the feature classes11:08
blackburnsonney2k: could you give an example of a string kernel? (I need a template now to test heiko's kernel)11:09
heikoyou are testing my kernel? :)11:09
blackburnheiko: I'm writing tests for all new kernels now11:09
blackburnincluding yours :)11:10
heikonice11:10
@sonney2kblackburn, I am not sure what heikos kernel does (does it work on DNA?) if so look at the weighteddegreekernel11:10
@sonney2kheiko^^?11:10
blackburnsonney2k: thank you11:10
heikoit does work on any kind of strings11:10
heikoalphabet does not play a role11:10
heikoalso on histograms of SIFT-features in computer-vision :)11:11
@sonney2kok then blackburn will work11:11
@sonney2kheh11:11
blackburnjust in a rush - I need to study numerical methods, so I want to finish this earlier :)11:11
blackburnheiko: you forgot to integrate your kernel in Kernel.i, I will do it now11:12
heikooh did not know, thanks11:12
@sonney2kheiko, I have one more idea regarding feature splitting11:14
heikoyes?11:14
blackburnheiko: what should I use as delta and theta?11:14
@sonney2kpreprocs11:14
blackburnwill (1, 2) and (2, 1) be good?11:14
@sonney2kheiko, forget it - preprocs cannot change the index11:15
heikoyes, but then the kernel will not perform well :)11:15
blackburnjust tell me proper ones :)11:15
@sonney2k:)11:15
heikoblackburn: 5 and 5 is ok, this strongly depends on the data11:15
blackburnokay11:16
blackburn5,5 and 6,6, is it ok?11:16
heikosonney2k, does the index really have to be changed, i mean we are talking about sets11:16
heikoblackburn, yes, but as said, depends on data, these are legal at least :)11:16
@sonney2kheiko, so I don't see any other option than extending dotfeatures/combinedfeatures/Stringfeatures11:16
blackburnheiko: thank you11:17
@sonney2kheiko, the problem is that you want to e.g. train an SVM on a certain subset of the data.11:17
heikoblackburn, ah one thing i forgot, do not set these larger than your word lengths11:17
@sonney2khow does the svm know what this subset is?11:17
@sonney2kit cannot...11:17
heikoyes I understand11:18
@sonney2kI mean there are only 2 options: tell the learning machine which data indices to use or the learning machine always uses all features11:18
heikoI always just generated custom kernel matrices and extracted the corresponding lines, but the matrices were precomputed :)11:18
@sonney2kthen one needs data splits11:19
@sonney2kgreat example showing that this idea won't work for custom kernels11:19
@sonney2khowever, I guess when you craft a learner that does not need training examples then one simply cannot do any data splitting anyways11:21
blackburnoh ich bin müde (I'm tired) with all these examples11:21
@sonney2kblackburn, daway daway rabotatch (come on, come on, get to work) :)11:21
@bettyboolol ;D11:21
heiko;)11:21
blackburnsonney2k: da rabotayu ya (I am working)11:21
blackburnsonney2k: vse uzhe dodelal pochti (almost done with everything already)11:22
blackburnsonney2k: and you're saying you forgot Russian?! :)11:22
@sonney2ksonney2k, one would then probably use different kernels as parameter inputs11:22
blackburnDONE11:22
@sonney2kI cannot read latin written russian11:22
blackburnа такПй ЌПжешь?11:23
@sonney2kbut I cannot produce these cyrillic letters though11:23
heikobrb11:24
@sonney2kblackburn, yes11:24
@sonney2kblackburn, thanks11:24
@sonney2kI will now catch the train to go to work11:24
blackburnsonney2k: thanks for ..?11:24
@sonney2kfor the kernel examples11:24
blackburnaha, I will do a pull request11:24
@sonney2kit is not a good idea to start work early and then arrive late at the place where you are supposed to work.11:25
blackburn:)11:25
@sonney2kanyway l8r11:26
blackburnsee you11:26
heikore11:27
heikogood ride11:27
heikohey blackburn, does the current git compile for you?11:29
blackburnheiko: seems so, but I'm not quite sure I have all of the last commits11:30
blackburnI'll commit my tests and check it11:30
heikomy python_modular does not compile11:30
blackburnwhat is the error?11:31
heikono rule to create target  Classifier.h, needed by Classifier_wrap.cxx11:32
blackburneh11:32
blackburndo you have classifier/Classifier.h file?11:32
blackburnyou could do11:33
heikoi have one11:33
blackburngit checkout libshogun/classifier/Classifier.h11:33
blackburnhmm11:33
blackburnok, I have up-to-date repo and it is compiling for me11:34
blackburnwith python_modular enabled11:34
heikommh11:34
heikoI have problems with git anyway, perhaps that is the reason11:35
heikogit fetch upstream just does not update my local copy11:35
blackburnah11:35
blackburndid you do11:35
blackburngit merge upstream/master11:35
blackburn?11:35
heikoyes11:36
heikosays already up to date11:36
blackburnhehe11:36
blackburnstrange11:36
heikoyes11:36
heikoindeed11:36
heikohow do you checkout the repo?11:37
heikogit clone git@github.com:karlnapf/shogun.git11:38
heikogit remote add upstream git clone git@github.com:karlnapf/shogun.git11:38
blackburnehhh11:38
heikoi mean git remote add upstream git clone git@github.com:karlnapf/shogun.git11:38
blackburnwhat did you do with this line?11:38
heikoargh11:38
heikogit remote add upstream git@github.com:karlnapf/shogun.git11:38
blackburnyou should git remote add upstream https://github.com/shogun-toolbox/shogun.git11:38
heikommh got the line from github ... i will try11:39
heikoa man, there we go11:40
blackburnwith that line you add remote for main repo11:40
heikoi added my fork as remote11:40
blackburnaha11:40
blackburnyou did11:40
heikowell , thanks :)11:40
blackburnnp :)11:40
@bettybooblackburn, he11:40
blackburnbettyboo: make doc11:40
@bettybooblackburn: over 60 students applied so far - there is now way to make a fair ranking - and we seriously cannot handly more than 10 of you :`-(11:40
blackburnbettyboo: will I go thru?11:41
@bettybooblackburn: however, I guess when you craft a learner that does not need training examples then one simply cannot do any data splitting anyways11:41
blackburnbettyboo: do you think so?11:41
@bettybooblackburn: why do you think so?11:41
blackburnhehehe11:41
heikohey bettyboo, are you also a mentor ?11:46
@bettybooheiko: since you have so many people interested, you could organize a mentoring project outside the GSoC framework11:46
heikowhat?11:47
heikosince I have so many people interested?11:47
blackburnheiko: it is a bot :)11:47
heikooh, hehe :)11:48
heikoit just passed the turing test :D11:48
heikobettyboo, what is your purpose?11:48
@bettybooheiko: @sonney2k well the reason after scanning thru wat i felt is because the sparsity DS which u have defined is strong11:48
blackburnbettyboo: SVM kNN sparse dense int main11:49
@bettybooblackburn: SVMs up until 97 from what I remember dealt with STRICTLY linear seperable data11:49
blackburnbettyboo: SVM data11:49
@bettybooblackburn: the problem is that you want to e.g. train an SVM on a certain subset of the data.11:49
@bettyboosure11:49
heikobettyboo: svms are smarter than you11:49
@bettybooheiko: you dare killing our CIA bot!11:49
-!- heiko [~heiko@infole-06.uni-duisburg.de] has quit [Quit: Leaving.]11:52
-!- heiko [~heiko@infole-06.uni-duisburg.de] has joined #shogun11:52
@sonney2kSeems like slot notifications are available too bad I don't have access to melange right now.11:56
blackburncrap!11:56
blackburnsonney2k: when will you have access to it? :)11:58
@sonney2kI am just reading on # gsoc that other new orgs got 2 slots though they had 125 applications11:59
heikooh mann12:00
blackburnfck12:00
siddharthzzzz....sad12:01
siddharthmaybe they had less mentors12:02
@sonney2k5 slots12:02
@sonney2kpuh at least12:02
blackburnsonney2k: you have 5 slots?12:02
@sonney2kbetter than 2 right - seems to be an exceptionally high number for new orgs12:03
@sonney2kyes12:03
blackburnoh12:03
blackburnsonney2k: IIRC you may request some more if you want12:03
blackburn:D12:03
siddharthyeah :P12:03
heikohehe :)12:04
@sonney2kis 7% acceptance rate not a good deal o_O ?12:04
blackburnпоехали (let's go)12:05
blackburn:D12:05
@sonney2kwith only 5 slots to choose from we have to be dead sure that the students that get it are 100% successful. If they are we might get more slots next year12:05
@sonney2kI guess that is what google wants12:06
* blackburn wonders what to do to get ranked higher :)12:07
* siddharth will have to finish SGD-QN fast :P12:08
blackburnsonney2k: is it a good practice to start working on our projects12:09
blackburn?12:09
@sonney2knext monday the mentors will have a phone conf deciding about candidates12:09
blackburnit seems no, but asking :)12:09
@sonney2kblackburn, it is a good thing if you intend to finish a usable chunk of it no matter whether you get a slot from google or not12:10
blackburnsonney2k: okay, will not do it, better improve something12:11
@sonney2kin the end google is doing this to get long term contributors into the projects12:11
@sonney2kblackburn, have a look at performance measures if you like12:11
@sonney2kthe computation of the auRPC is highly suboptimal12:11
blackburnsonney2k: okay, will do it if I have sufficient time, thank you12:11
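Assuming auPRC (area under the precision-recall curve) is what's meant here, the usual efficient approach is a single sort of the scores followed by one pass. The sketch below computes the common average-precision variant; it is not shogun's actual implementation:

```python
def average_precision(labels, scores):
    """Average precision: the mean of the precision values at the rank of
    each positive example, after one O(n log n) sort of the scores."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    true_positives = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            true_positives += 1
            precisions.append(true_positives / rank)
    return sum(precisions) / len(precisions)

print(average_precision([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]))  # 1.0: perfect ranking
print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))  # (1 + 2/3) / 2
```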
heikoany other suboptimal stuff ? :)12:12
@sonney2kheiko, still there?12:12
@bettyboohihi12:12
heikoyes12:12
@sonney2kHow about doing the subset thing?12:12
heikoyes thought of it.12:12
heikofor every feature class?12:12
siddharthsonney2k, and should I work on SGD-QN?12:12
blackburnsonney2k: will you say something about us after that next monday?12:12
blackburnor we have to wait until 25, April?12:13
@sonney2kheiko, I think you can put the code for that in CFeatures and then just call it from whatever sub-class.12:13
@sonney2kwe were asked to to say anything.12:14
blackburnto not*?12:14
@sonney2kit also depends whether any of you applied at some other organization12:14
@sonney2kto not12:14
@sonney2kyes12:14
blackburnokay12:14
blackburnit will be awful week :D12:14
@sonney2kheiko, I think you only need an int32_t* subset array and implement get/set function for that12:15
heikolike12:16
heikoget_feature_subset(int32_t* inds) ?12:16
@sonney2kheiko, and then whenever sth calls get_feature_vector() do the subset magic / change the number of available vectors virtually and give a warning when someone wants to access the whole feature matrix12:16
heikook12:17
@sonney2kset_feature_subset(int32_t* inds, int32_t num_inds);12:17
@sonney2ksiddharth, yes of course12:18
blackburnsonney2k: have to go now, tests are in my pull request :)12:19
blackburnsee you12:19
@sonney2kblackburn, thanks will look at them12:20
heikosonney2k: thanks for the tip, I will start now and will probably bother you again soon :)12:20
@bettyboo:>12:20
heikoblackburn, bye12:20
-!- blackburn [~qdrgsm@188.168.4.116] has quit [Quit: Leaving.]12:20
@sonney2kthe further we see that you have a plan the more likely you will be in...12:20
* siddharth speeding up things12:21
siddharthsonney2k, can you tell me what Fvector and Svector refer to in this code?12:38
siddharthFvector==feature vector?12:38
@sonney2knot sure12:39
@sonney2kmaybe dense and sparse feature vector?12:39
@sonney2ksiddharth, antoine should know12:39
@sonney2kask him12:39
siddharthok...where can I find his email address?12:39
@sonney2kthere is a link to his homepage on the ideas list12:40
siddharthya got it12:41
siddharthhe has protected his email from spam :P12:41
-!- akhil_ [75d35896@gateway/web/freenode/ip.117.211.88.150] has joined #shogun12:57
heikosonney2k: are you there?13:07
heikocurrently changing StringFeatures. besides get_feature_vector, all other functions related to feature vectors have to be changed too, right13:09
heiko(get_num_vetors, get_max_vector_length ...)13:09
heikoi think it's best to define get_num_vectors like this and add a new virtual method:13:14
heikovirtual int32_t get_num_vectors() { return subset_inds == NULL ? get_num_vectors_all() : num_subset_inds; }13:14
heikovirtual int32_t get_num_vectors_all()=0;, which has to be implemented in all feature classes13:14
heikothe question is what should happen when set_feature_subset is called13:42
heikoshould ALL following actions be working on the subset? Or only get_feature_vector13:42
heikoto name a few: get_transposed, get_features, copy_features, set_feature_vector ...13:43
heikoI think they should ALL be applied to the subset13:43
heikothen the subset option may be removed with reset_feature_subset13:43
heikowhat do you think?13:43
heikohowever, certain functions, like cleanup() do have to stay the same (work on all features, not only on the subset)13:46
heikook i stop spamming now :)13:47
@sonney2kheiko, yes you are right. I would for now add SG_NOTIMPLEMENTED; for functions other than get_feature_vector()/ free_feature_vector() when the index array is set.14:08
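The design settled on here, an index array in the base feature class, get_num_vectors reporting the subset size, get_feature_vector remapping through it, and other accessors refusing to run while a subset is active, can be sketched in Python. The names below mirror the chat and are hypothetical, not shogun's real API:

```python
class Features:
    """Toy base class: a subset is just an index array that remaps vector
    access; removing it makes all vectors visible again."""

    def __init__(self, vectors):
        self._vectors = list(vectors)
        self._subset = None

    def set_feature_subset(self, indices):
        self._subset = list(indices)

    def remove_feature_subset(self):
        self._subset = None

    def get_num_vectors(self):
        # virtually change the number of available vectors
        return len(self._subset) if self._subset is not None else len(self._vectors)

    def get_feature_vector(self, idx):
        if self._subset is not None:
            idx = self._subset[idx]  # the subset magic: remap the index
        return self._vectors[idx]

    def get_feature_matrix(self):
        # stand-in for SG_NOTIMPLEMENTED while a subset is set
        if self._subset is not None:
            raise NotImplementedError("not available while a subset is set")
        return self._vectors

feats = Features([[0.0], [1.0], [2.0], [3.0], [4.0]])
feats.set_feature_subset([3, 1])
print(feats.get_num_vectors())      # 2
print(feats.get_feature_vector(0))  # [3.0]: index 0 maps to original vector 3
```

This keeps the subset transparent to a learning algorithm: it only ever calls get_num_vectors and get_feature_vector, so it trains on whatever subset is active.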
CIA-8shogun: Sergey Lisitsyn master * r42ec6c9 / (2 files): Fixed ExponentialKernel - http://bit.ly/i3Ktma14:26
CIA-8shogun: Sergey Lisitsyn master * rcce0b5d / (14 files): Added python_modular examples for kernels introduced earlier - http://bit.ly/fgrUmi14:26
CIA-8shogun: Sergey Lisitsyn master * re805a2c / src/modular/Kernel.i : Integrated some kernels to Kernel.i - http://bit.ly/gGmyTd14:26
CIA-8shogun: Soeren Sonnenburg master * r1371376 / (2 files): rename width to m_width too - http://bit.ly/fYsv1u14:26
CIA-8shogun: Soeren Sonnenburg master * r8725706 / (2 files): Draft a histogram function. - http://bit.ly/gaFY4y14:26
-!- yayo3 [9320e890@gateway/web/freenode/ip.147.32.232.144] has joined #shogun14:37
-!- skydiver [4deac315@gateway/web/freenode/ip.77.234.195.21] has joined #shogun14:58
-!- yayo3 [9320e890@gateway/web/freenode/ip.147.32.232.144] has quit [Quit: Page closed]15:03
-!- skydiver [4deac315@gateway/web/freenode/ip.77.234.195.21] has quit [Ping timeout: 252 seconds]15:24
-!- gxr_ [c07c1afa@gateway/web/freenode/ip.192.124.26.250] has joined #shogun17:17
gxr_#topic17:18
-!- gxr_ [c07c1afa@gateway/web/freenode/ip.192.124.26.250] has quit [Client Quit]17:18
-!- akhil_ [75d35896@gateway/web/freenode/ip.117.211.88.150] has quit [Quit: Page closed]17:27
-!- blackburn [~qdrgsm@188.168.5.124] has joined #shogun17:39
blackburnhi, how is it going there?17:39
-!- siddharth [~siddharth@117.211.88.150] has quit [Remote host closed the connection]17:54
-!- siddharth [~siddharth@117.211.88.150] has joined #shogun17:59
-!- ChanServ changed the topic of #shogun to: Shogun Machine Learning Toolbox | We have been accepted for GSoC 2011 with 5 slots | GSoC Timeline http://bit.ly/gy7Pdi | This channel is logged.18:18
-!- heiko [~heiko@infole-06.uni-duisburg.de] has left #shogun []18:45
-!- yayo3 [~jakub@ip-78-45-113-245.net.upcbroadband.cz] has joined #shogun19:17
* serialhex watches a tumbleweed pass by20:10
-!- skydiver [c255a037@gateway/web/freenode/ip.194.85.160.55] has joined #shogun20:33
@sonney2k:)21:09
blackburnsonney2k: is there something new? :)21:11
blackburnsonney2k: now looking at PerformanceMeasures, it's so convoluted!21:12
@sonney2kblackburn, yes it is unnecessarily complex21:15
@sonney2kblackburn, new in what sense?21:15
blackburnsonney2k: gsoc :) or there will not be any changes?21:16
blackburnnot only about slots, but maybe something else21:16
@sonney2kno - there won't be any news for a while I suspect21:17
serialhexwell fie on them!!!  fie on google for not giving us information!!! fie fie fie!!!21:18
@sonney2ksorry, but what more information do you need?21:21
@sonney2k ^^ #topic21:21
blackburnmay be they will say about the matter of life21:22
blackburnor what happened with elvis21:23
@sonney2klook we are really lucky to get 5 slots - other new orgs with even more applications got just 221:23
alesis-noviksonney2k, is there code in shogun for calculating means and covariances? Because that could be in the training for my Gaussian thing21:23
blackburnsonney2k: we are the lucky one :)21:24
@sonney2kalesis-novik, partially in CMath::mean ... please put CMath::cov there too21:24
alesis-novikthen training for it would be just computing the mean and covariance, because that's what the parameters really are21:25
@sonney2kalesis-novik, yes - makes a lot of sense :)21:27
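Training the Gaussian alesis-novik describes really is just computing those two statistics. A minimal standalone sketch of the maths (plain free functions, not the real CMath API; the proposed CMath::cov would generalise the variance below to a covariance matrix):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helpers illustrating what a Gaussian's ML training computes;
// names and signatures are not shogun's actual CMath API.
double mean(const std::vector<double>& x)
{
    double s = 0.0;
    for (std::size_t i = 0; i < x.size(); i++)
        s += x[i];
    return s / x.size();
}

// Maximum-likelihood variance (divide by n, not n-1): together with the
// mean, these ARE the fitted parameters of a 1-D Gaussian.
double var_ml(const std::vector<double>& x)
{
    double m = mean(x);
    double s = 0.0;
    for (std::size_t i = 0; i < x.size(); i++)
        s += (x[i] - m) * (x[i] - m);
    return s / x.size();
}
```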
@bettyboo:>21:27
@sonney2kahh betty is back too21:27
@bettyboosonney2k: betty is always there for us :)21:27
@sonney2kwell said21:27
alesis-novikbettyboo,  achieved singularity yet?21:28
@bettybooalesis-novik: ah, joking, see now :)21:28
yayo3what does "fie" stand for?21:29
blackburnwe need to do something with regression21:30
yayo3or agression21:31
@sonney2kyayo3, fie?21:31
blackburnor clustering21:31
blackburnwe have strict framework only for classifying21:31
yayo3sonney2k: serialhex said that21:32
@sonney2kblackburn, yes the naming is confusing21:32
yayo3blackburn: well there's at least clustering interface. not a regression one I think21:32
@sonney2kand I don't want to derive from more than one base class either21:32
blackburnyayo3: there is no clustering interface21:33
serialhexi dont know what 'fie' really means, i just use it to cuss at people when i'm trying not to cuss :P21:33
@sonney2kthe regression one is derived from classifier - it just uses the same names21:33
yayo3also it would be nice to have ProbabilisticClassifier (or something) that has methods for returning class probabilities21:34
blackburnsonney2k: now thinking about dimreduction domain21:34
serialhexyayo3: fie21:34
serialhex(archaic) Used to express distaste, disgust, or outrage.21:34
serialhexFie upon you, you devilish fool!21:34
yayo3that's important to some applications. however, since I was pretty sure I saw a clustering interface, it's possible I just missed it21:34
blackburnsonney2k: do you like the way it used now?21:34
@sonney2kblackburn, no I don't21:35
@sonney2kshogun all started with a HMM I implemented a decade back and then an SVM21:35
@sonney2kso only distributions and classifier are somehow well represented21:35
@sonney2kbut there are problems like: we have CDistanceMachine and CKernelMachine - IIRC they both derive from classifier21:36
blackburnsonney2k: aha, so we will think about refactoring it21:36
@sonney2kblackburn, I am very open for suggestions21:36
@sonney2kit is not as easy though21:36
blackburnsonney2k: the only thing I know should stay - it should be preproc21:37
blackburnor not? :D21:37
@bettyboo;D21:37
yayo3that's hard21:37
@sonney2kfor example if you have a kernel machine and you derive a SVM from it - kernel machine has to be derived from classifier21:37
yayo3one method can be probabilistic and nonprobabilistic, kernel and regression at the same time21:38
@sonney2kat the same time support vector regression / kPCA etc should derive from kernel machine21:38
@sonney2kyes that is the problem21:38
blackburnsonney2k: eh, do you think it is possible?21:38
blackburnsonney2k: I see no way to do it without multiple inheritance21:39
yayo3yeah21:39
@sonney2kblackburn, well I don't know - but I don't want multiple inheritance in shogun21:39
@sonney2kthat is creating all sorts of other problems21:39
yayo3if the dependencies are not a tree in real world, it's pretty much impossible to make model without them21:40
alesis-novikmultiple inheritance is really difficult to manage properly in C++21:40
@sonney2kwell we could do one thing: we could name it in a more generic way21:41
@sonney2ksay Method21:41
@sonney2kand then KernelMethod21:41
@sonney2kor Machine and KernelMachine etc21:41
@sonney2kthen it does not need to be named Classifier21:41
@sonney2kor Regressor or Clusterer or so21:42
alesis-novikWell, someone will probably be implementing EM, so they will have to deal with clustering21:42
@sonney2kthen we name the train function  train()  / apply() very generally21:42
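The renaming being proposed here boils down to a tiny hierarchy. A hedged sketch with illustrative names only (not shogun's actual classes):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the generic naming idea: a base "Machine" with train()/apply(),
// so subclasses need not be called Classifier, Regressor or Clusterer.
class CMachine
{
public:
    virtual ~CMachine() {}
    virtual bool train(const std::vector<double>& data) = 0;
    virtual double apply(double x) = 0;
};

// A trivial concrete machine: learns the training mean, applies it as a shift.
class CMeanShiftMachine : public CMachine
{
public:
    CMeanShiftMachine() : m_mean(0.0) {}
    virtual bool train(const std::vector<double>& data)
    {
        m_mean = 0.0;
        for (std::size_t i = 0; i < data.size(); i++)
            m_mean += data[i];
        m_mean /= data.size();
        return true;
    }
    virtual double apply(double x) { return x - m_mean; }
private:
    double m_mean;
};
```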
blackburnKernelMultipleKungFuClassifierMachine21:42
@sonney2kblackburn, your proposed name is much preferred :D21:43
@bettyboosonney2k, funny21:43
blackburnsonney2k: and we have one more proposal21:43
@sonney2kbettyboo, these Russians are at times21:43
@bettyboosonney2k: And sometimes I like more explanation and less brevity21:43
blackburnwe should rename all constructors to Lenin21:43
alesis-novikor we could go for the yesterday discussed one21:43
alesis-novikblackburn, yes21:43
blackburnand destructors for Stalin21:43
yayo3what about something similar to Java interfaces?21:43
blackburnsonney2k: if you will agree, we will start just now21:44
alesis-novikThe general class would be CCCP, other things (like Ukraine or Belarus) would derive from it21:44
yayo3you can use inheritance as a way to mark what the method provides, but inherit no code21:44
blackburnsonney2k: I'm pretty sure I agree with yayo3 (being acquainted with Java SE and EE too) :)21:45
blackburnKernel interface could derive only some compute functions and there could not be any problem with multiple inheritance21:46
yayo3say, can it do regression? inherit Regressor. can it do classification, inherit Classifier21:46
blackburnyayo3: why we should?21:46
* sonney2k is waiting for the finalized proposal21:47
yayo3blackburn: well it's generally a non-crazy way of using multiple inheritance21:47
alesis-novikwell interfaces in Java are something like abstract classes with no fields in cpp21:47
blackburnyayo3: I mean I don't know why we should discriminate regression/classifying/etc21:47
yayo3blackburn: not discriminate21:48
blackburnsonney2k: the final proposal is to rename the whole shogun21:48
serialhexit almost sounds like modules in ruby...21:48
yayo3blackburn: it can do both and therefore inherit both.21:48
blackburnyayo3: we could use 'interfaces' only for kernel or etc21:48
alesis-novikWell, if you start doing it differently for different parts of shogun, it will get messy and hard to get in to21:49
* serialhex just noticed that shogun got 5 slots *SQUEE*21:49
yayo3or that. also I think it would be nice to have the inheritance tree as flat as possible21:49
blackburnwe have to do some umls :D21:49
* sonney2k dies21:50
serialhexumls??21:50
* serialhex is lost21:50
alesis-novikUMLs21:50
blackburnyeap21:50
* serialhex is googling21:50
alesis-novikunified modelling language, was it?21:51
blackburnsonney2k: we demand stalin and UMLs!21:51
blackburnalesis-novik: yeap, I mean the class diagram could be more impressive than words :)21:51
@bettybooblackburn, ;>21:51
yayo3I'll make a quick example of what I have in mind and drop a link21:51
alesis-novikblackburn, I was just remembering what UML actually stands for :)21:51
serialhexooh, umls look like fun! (NOT!!)21:51
* sonney2k feels like being in stalingrad a few decades back...21:52
blackburnjust be happy that I not proposed to use Rational Unified Process :D21:52
serialhexlol21:52
alesis-novikI'm a bad software engineer, if I'm required to present UMLs I first code everything and then generate UMLs from it21:52
yayo3alesis-novik: doesn't that mean you're good software engineer? :)21:52
alesis-novikthat means I'm a (good or not) coder21:53
blackburnI'm pretty sure that UMLs could be created prior :)21:53
alesis-novikThe whole engineering process of designing architectures is bleh for me21:53
blackburnsome of the UMLs doesn't make sense with code, state diagrams, etc21:54
alesis-novikyou SHOULD create UMLs prior in theory, I just can't be bothered to21:54
serialhexumm... i usually just make scribbly notes on a piece of coffee-stained paper...with bubbles and arrows and notes on how it should be done21:54
blackburnalesis-novik: yeap it ain't so funny at all :)21:54
serialhex...tho i've lost those sheets of paper more often than not :P21:55
blackburnserialhex: oh, I like to see that kind of scheme for some boring difficult tangled enterprise level system :D21:55
alesis-novikI had a course of software engineering in my undergrad where we basically had to come up with an architecture for a business21:55
serialhexOOH!!!!  i'd need lots of markers and posterboard!!!21:55
alesis-novikOne of the most boring courses for me21:56
serialhexalesis-novik, i can imagine!21:56
blackburnand balloons21:56
blackburnyou probably will need balloons21:56
serialhexOOH!!! I GET TO USE REAL BALLOONS!?!?!??21:56
alesis-novikI like coding and thinking about algorithms, not thinking about SOA and stuff like that21:56
blackburnalesis-novik: for some reasons software engineering is important too. may be for some things we discussed earlier :)21:57
serialhexok, i'm gonna need some big ones, some little ones, and some of those that the clowns use to make dogs and hats with! :P21:57
@bettybooserialhex, yep21:57
alesis-novikNo, I know it's important, it's just not something I enjoy21:57
blackburnserialhex: and always use stalin21:58
blackburnI have to stop joking about stalin :D21:58
serialhexyeah, thats going to be the base class of 'all-that-is-evil-and-wrong-with-this-world'21:58
blackburnserialhex: nope, just destruction scheme :D21:59
blackburn'here,guys, we will use stalin for garbage collecting'21:59
serialhexwell evil things usually destroy stuff... but they also screw it up in the process21:59
serialhex:P21:59
@bettyboo;D21:59
blackburnsonney2k: what about renaming shogun?22:00
blackburnwe really demand it!22:00
serialhexso i'v noticed that bettyboo's responses have gotten much better recently22:00
@bettybooserialhex: I've only gotten around to reading the paper on it and trying to understand the maths22:00
@sonney2kblackburn, into bettyboo?22:00
@bettyboosonney2k: nope22:00
serialhexHAHAHAHAHA!!!!!!22:00
@sonney2khah!22:00
blackburn^^22:00
blackburndamn bingo22:00
serialhexOMFG THAT WAS AWSOME!!!!22:00
alesis-novikyet disturbing22:01
serialhexwhoever coded her needs a raise!22:01
@sonney2kso blackburn anything against renaming Classifier to Machine?22:01
alesis-novikHail bettyboo - future ruler of mankind22:01
@sonney2kand classify to apply()22:01
@bettybooalesis-novik: says april 25 / april 22 is a conflict resolution22:01
@sonney2k?22:01
serialhexlol22:01
yayo3sonney2k: I'm working on example of the stuff we talked about earlier22:01
blackburnsonney2k: I'm pretty sure I will have no difference :D22:01
@sonney2kblackburn, ?22:02
blackburnsonney2k: I mean there is no such difference22:02
@sonney2kyou mean it does not matter? or ?22:03
blackburnyeap, that is what my intricate mind produced, I think it doesn't matter22:03
blackburnsonney2k: you just want to make scheme train() / apply(), right?22:04
yayo3that still doesn't make sense in clustering22:05
blackburnyayo3: we could just use apply()22:05
CIA-8shogun: Soeren Sonnenburg master * rf6f47f3 / src/libshogun/features/SNPFeatures.cpp : Get histogram to reliably work in SNPFeatures - http://bit.ly/hEex7c22:05
@sonney2khttp://www.shogun-toolbox.org/doc/classshogun_1_1CClassifier.html22:05
blackburnsonney2k: why classifier should be machine?22:06
alesis-novikI feel machine would be less intuitive22:06
@sonney2kif you look at this you will see that what is called classifier now is just a general Method or Machine22:06
blackburnsonney2k: it is, but why machine? what should it change?22:07
@sonney2kbecause it currently is a classifer/ clustering method / regression method22:07
blackburnah22:07
@sonney2kit would fix the confusion that people have22:07
blackburnsonney2k: I prefer supervised/unsupervised22:07
blackburnbut it will have own problems22:07
@sonney2ksemisupervised :)22:08
@sonney2kwhat problems?22:08
serialhexhow do those algorithms work??22:08
blackburnsonney2k: for example multiple inheritance again22:08
@sonney2kblackburn, ahh kernelmachine can be supervised or unsupervised22:09
@sonney2kexactly22:09
@sonney2kSo Machine + apply is the only thing so far that is name wise not confusing22:09
blackburnsonney2k: we really should to draw scheme22:09
serialhexwhy not supervised, unsupervised & bi_supervised??22:09
alesis-noviksemisupervised is having some data labelled and some labelled as far as I know22:10
alesis-novikunlabelled*22:10
serialhexi got ya the first time alesis-novik22:10
blackburnsonney2k: so, don't you like multiple inheritance in shogun at all? no way?22:11
@sonney2kblackburn, no way22:11
yayo3there's probably a way to make it non-crazy22:11
@sonney2kbig problems with templates, diamonds and possibly swig wrapped interfaces22:11
alesis-noviksemi-supervised methods seem to be a good path to research in ML, because labelling everything is expensive22:12
blackburnsad :) we could realize some of these ideas with inheritance22:12
serialhexand while i dont know much about how most of this stuff works (in practice) why not have a superclass for both supervised & unsupervised... and anything else (like semi-supervised)22:12
yayo3well, the "java-like inheritance" example is here: https://gist.github.com/91830222:12
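In case the gist disappears, the "Java-like interfaces" idea looks roughly like this in C++: pure-abstract bases carry no state or code, so a method can inherit several of them without the diamond problems sonney2k worries about. All names here are illustrative, not shogun's:

```cpp
#include <cassert>

// Pure-abstract "interfaces" in the Java sense: no fields, no code,
// only a contract that a method advertises by inheriting them.
class Classifier
{
public:
    virtual ~Classifier() {}
    virtual int classify(double x) = 0;
};

class Regressor
{
public:
    virtual ~Regressor() {}
    virtual double regress(double x) = 0;
};

// One method can do both and therefore inherit both, with no shared
// base class and hence no diamond.
class ThresholdMethod : public Classifier, public Regressor
{
public:
    virtual int classify(double x) { return x > 0.0 ? 1 : -1; }
    virtual double regress(double x) { return 2.0 * x; }
};
```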
@sonney2kmy suggestion would be to keep things as they are for the time being and at the end of the summer when you are all experts do a reasonable renaming22:12
alesis-novikI can see the possible problems sonney2k, things like this should be thought of when designing the thing, so if the decision was no multiple inheritance, I can see how that might cause problems22:13
serialhexyayo3: that looks interesting22:13
blackburnsonney2k: nice plan, and we have to do some schemes, no way to understand all the details in code22:13
yayo3serialhex: that's the java way. thich can be good or bad22:13
yayo3which*22:13
blackburnyayo3: sure it good :D22:14
@bettybooyep22:14
blackburnbetty agrees22:14
@bettybooblackburn: Getting comfortable with ML concepts most definitely does not happen overnight.22:14
serialhexyayo3: yeah, ruby could do this in a different way, using modules & mixins instead of multiple inheritance (as there is none of that in ruby)22:14
blackburnoh, sorry, I thought i will learn everything this night :(22:14
* serialhex is getting afraid of bettyboo's learning rate...22:15
serialhexby the end of the summer we might just ask her for a few suggestions :P22:15
yayo3now that's some machine learning :P22:15
@bettybooyeah ;D22:15
blackburnbettyboo: make examples22:16
serialhexsee what i mean?22:16
@bettybooblackburn: addpath('../examples/documented/matlab/graphical')22:16
serialhexbettyboo: make vodka22:16
@bettybooserialhex: you just want to make scheme train() / apply(), right?22:16
serialhexno no not that bettyboo!!  make vodka!22:16
@bettybooserialhex: just see your email. yes, makes perfectly sense. I tend to even say that the dim reduction techniques open up a new field inside shogun - so I would even consider this proposal to be stronger22:16
blackburnbettyboo: like vodka?22:16
@bettybooblackburn: likes :"Features.i:303: Warning 453: Can't apply (shogun::TSparse< uint64_t > **ARGOUT_SPARSE,int32_t *DIM1,int32_t *DIM2,int64_t *NNZ). No typemaps are defined.22:16
blackburnbettyboo: make doc and vodka22:17
@bettybooblackburn: over 60 students applied so far - there is now way to make a fair ranking - and we seriously cannot handly more than 10 of you :`-(22:17
blackburnbettyboo: vodka22:17
@bettybooblackburn: we should celebrate what do you want?? vodka? yager? beer??22:17
blackburnJUST SAY IT!22:17
serialhex...she's hiding her true intelligence from us...22:17
serialhexshe knows we know... lets just hope she's a benevolent ruler22:18
blackburn"just see your email. yes, makes perfectly sense. I tend to even say that the dim reduction techniques open up a new field inside shogun - so I would even consider this proposal to be stronger"22:18
blackburnhehe have I some competitors?22:18
@sonney2kblackburn, one more pressing design decision is how to properly do a cross-validation framework in shogun.  Is data a parameter or not, how can I split up the data into several parts (other than just hacking everything into the feature base classes) and what can we do when there are no training data (just a distance or kernel matrix is available) etc...22:18
blackburnsonney2k: oh, you wrote me a letter :)22:18
yayo3sonney2k: is there any serious code reuse from inheritance now?22:18
blackburnsonney2k: about CV you will have a huuuuuuge problems with design22:19
serialhexyayo3: it seems to be the case from looking at the inheritnce pictures & functions in the shogun docs22:19
@sonney2kyayo3, sure there is22:19
@sonney2kyayo3, which letter?22:20
@sonney2kblackburn, in which way?22:20
blackburnsonney2k: in which way what?22:20
@sonney2kproblem with design22:20
@sonney2kfor CV?22:21
blackburnin the way you described, you will have to think hard to do it with 'beauty'22:21
yayo3sonney2k: so that would make "don't inherit implementation, only interfaces" way rather nonworking, right22:21
yayo3sonney2k: and I never got any letter :(22:21
@sonney2kblackburn, yes that is true22:21
@sonney2kblackburn, which letter?22:21
@sonney2kyayo3, sorry was supposed for blackburn22:21
blackburnsonney2k: just copied bettyboo answer :)22:22
@bettybooblackburn: answering*22:22
@bettybooyep!22:22
blackburnjust interested to whom it was addressed because want to work on dim.reduction :)22:22
yayo3the problem with CV is that many methods themselves split data into training and testing data.22:23
alesis-noviksonney2k, you also need to shuffle the training data, so the whole CV would be a beast22:23
yayo3well, one of the problems with CV anyway.22:23
@sonney2kyayo3, none here - all methods in shogun just get one fixed data set to operate on22:24
@sonney2kalesis-novik, exactly22:24
@sonney2kso I think the most reasonable approach is to store some extra index array in the features22:24
@sonney2kthen look up which indices to use22:24
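The extra-index-array idea sonney2k describes can be sketched like this (class name and API purely hypothetical; heiko's actual draft is in the pull request linked below):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: wrap the data with an index array so a CV fold is
// just a subset of indices, with no copying of the underlying features.
class IndexedFeatures
{
public:
    IndexedFeatures(const std::vector<double>& data) : m_data(data)
    {
        // By default the subset is the identity mapping (all vectors visible).
        for (std::size_t i = 0; i < data.size(); i++)
            m_subset.push_back(i);
    }

    // Restrict the visible data to a fold; m_data itself is untouched,
    // so algorithms keep working without knowing they see a subset.
    void set_subset(const std::vector<std::size_t>& idx) { m_subset = idx; }

    std::size_t size() const { return m_subset.size(); }
    double get(std::size_t i) const { return m_data[m_subset[i]]; }

private:
    std::vector<double> m_data;
    std::vector<std::size_t> m_subset;
};
```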
blackburnsonney2k: you really have to make some diagrams when discussing design, it would be easier _a lot_22:24
alesis-noviksonney2k, I think that would save space and time for large dimensionality datasets22:25
yayo3some CV implementations just take random data. generates random indices and takes data from them22:27
yayo3or I think I saw it somewhere22:27
blackburnyayo3: anyway you have to remember what you chose22:27
@sonney2kblackburn, probably.22:27
@sonney2kalesis-novik, that is the intention22:28
alesis-novikyayo3, how would random data make sense when training? Unless it's generated from a specified distribution22:28
blackburnalesis-novik: in CV makes22:28
@sonney2kthe problem is of course a) it makes the code difficult to read (indirect access in all the feature functions) and b) it is intrusive and has to be done in all feature classes22:29
blackburnalesis-novik: sorry, not random data, random indices makes22:29
alesis-novikblackburn, random sampling of data or the actual random data?22:29
@sonney2kheiko drafted that btw already https://github.com/shogun-toolbox/shogun/pull/34/files22:29
alesis-novikblackburn, random sampling makes sense and is encouraged, I haven't heard about random data CV22:31
@sonney2kahh and forgot to say - one needs to take a subset of the data too, so an artificial limit on the available data22:31
blackburnalesis-novik: not understood you right, random data CV doesn't make sense22:31
alesis-novikblackburn, that's why I was asking what yayo3 meant22:32
@sonney2kI wish this could be done without hacking up each feature class separately22:32
blackburnalesis-novik: aha, he meant sampling22:32
yayo3yeah, sampling. sorry for being obtuse22:33
@sonney2kbut an extra IndirectIndexFeature class won't work - the basic feature class has no notion of get_vector or so22:33
alesis-noviksonney2k, you can always create a new class and just take the data from CFeatures, question is the overhead and general usefulness22:33
@sonney2kalesis-novik, yes but all the algorithms need to continue to work without knowing that they operate on a subset of the data22:34
alesis-noviksonney2k, so there could be a dedicated class/algorithm for splitting data into CV folds. I worry about the overhead though22:36
yayo3that's hard. reshuffling features would be easier, but you'd need much more memory22:36
@sonney2knot something I'd like to have... I am not so rarely training on datasets that barely fit in memory once22:37
yayo3might be good idea to do both. indirect features and creating new "classic" features, because what's faster depends on available memory and other things22:37
@sonney2kohh well...22:39
* sonney2k reviews this patch and hopes that does not ruin all the readability in shoguns features22:39
* blackburn wonders why all are discussing one specific 'project'22:42
alesis-novikblackburn, what project did you apply to (I forget theses things)22:43
blackburnalesis-novik: dim reduction22:43
alesis-novikgood :D22:43
blackburnLLE, MDS, ISOMAP and SNE22:44
blackburn(as I hope) :)22:44
alesis-novikActually, I'm trying to (after I'm done with the Gaussian patch) get the PCA and kPCA to work22:45
alesis-novikBach in a bit22:45
@sonney2kblackburn, KFA?22:46
@mlsec Nested CV would also be cool22:47
@sonney2kheh22:55
@sonney2kseems like heiko has some good plans for that....22:56
-!- skydiver [c255a037@gateway/web/freenode/ip.194.85.160.55] has quit [Quit: Page closed]22:56
yayo3hmm. the .mainpage files are generated, right?22:56
@sonney2kyayo3, some of them yes23:05
yayo3they're pretty annoying :)23:06
yayo3I sent a little pull request (really minor stuff, I should probably hold on them and send them when there's more)23:08
blackburnsonney2k: was away, sorry, not KFA23:08
yayo3and good night23:08
-!- yayo3 [~jakub@ip-78-45-113-245.net.upcbroadband.cz] has quit [Quit: leaving]23:09
blackburnsonney2k: thought you know my proposal :D23:09
@sonney2kyayo3, you mean because they are not in git ignore23:09
@sonney2ktoo late23:09
blackburnhe could make a commit, reverting some changes23:10
@sonney2kblackburn, now we can accept only 1/5 of the students we wanted so you would of course have to do twice as much work ;-)23:10
blackburnI did one when was a conflict with some header23:10
blackburnsonney2k: oh! do you think I'm doing not much work? :)23:11
@bettybooblackburn, ;D23:11
@sonney2kblackburn, is that a trap? shall I answer yes or no?23:12
blackburnnot a trap, just interesting :)23:12
@sonney2kthen I say no - lets see what your reactions are :D23:13
@bettyboo:>23:13
* sonney2k does not know if no means yes or yes means no.23:13
blackburnahahah23:13
blackburnso what is the answer?23:14
blackburnat least I'm trying to do all what I want/have to :) SergeyLisitsyn (32 commits, 1710 additions, 78 deletions)23:14
blackburnsonney2k (2191 commits, 665702 additions, 6929 deletions)23:14
blackburnoh, 665702 hehe23:14
blackburnbut really (not a trap or nearly), do you think I'm not doing much?23:15
@sonney2kblackburn, I don't complain - not at all23:15
blackburnhow polite it is :D23:16
@sonney2kI am very happy actually23:16
blackburnI just don't want to start working on dim reduction23:16
blackburnthat's why I am not doing some very 'useful'23:17
@sonney2kblackburn, I don't really understand why you don't want to start on dim reduction methods....23:19
blackburnjust like heiko or etc23:19
blackburnsonney2k: because I want to mind it a little more23:19
@sonney2kas long as you don't have to work for days to do some changes it is ok23:19
@sonney2kok23:19
@sonney2kmakes sense23:19
blackburnsonney2k: my proposal includes searching for ideas in up-to-date articles and etc23:20
blackburnthat is what I want to do in may23:20
@sonney2kheiko has probably chosen one of the most difficult parts - it will really take time and potentially partial rewrites to do it nicely23:21
blackburnbut if it will be a positive for me, I could start design of dim reduction there23:21
blackburnhehe there is a malloc in performancemeasures23:34
@sonney2kblackburn, there sometimes has to be to interact with external languages that assume malloc'd memory23:36
blackburnfloat64_t** det=(float64_t**) malloc(sizeof(float64_t**));23:37
@sonney2kblackburn, yeah - maybe at some point we rewrite the swig interfaces to use references instead of pointers23:40
* sonney2k yawns23:41
@sonney2koff to bed - cu all tomorrow23:42
blackburnleaving too, see you23:46
-!- blackburn [~qdrgsm@188.168.5.124] has quit [Quit: Leaving.]23:46
-!- alesis-novik [~alesis@188.74.87.84] has quit [Quit: I'll be Bach]23:53
--- Log closed Thu Apr 14 00:00:36 2011

Generated by irclog2html.py 2.10.0 by Marius Gedminas - find it at mg.pov.lt!