IRC logs of #shogun for Thursday, 2019-04-25

--- Log opened Thu Apr 25 00:00:48 2019
-!- besser82 [~besser82@fedora/besser82] has quit [Quit: Freedom, Friends, Features, First [fedoraproject.org]]00:35
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has quit [Remote host closed the connection]01:19
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has joined #shogun01:21
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has quit [Ping timeout: 276 seconds]01:26
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has joined #shogun01:58
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has quit [Ping timeout: 258 seconds]02:03
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has joined #shogun02:28
-!- Moatman [~Moatman@pool-96-255-151-151.washdc.fios.verizon.net] has joined #shogun02:31
-!- Moatman [~Moatman@pool-96-255-151-151.washdc.fios.verizon.net] has quit [Remote host closed the connection]02:33
-!- Moatman [~Moatman@pool-96-255-151-151.washdc.fios.verizon.net] has joined #shogun02:34
-!- Moatman [~Moatman@pool-96-255-151-151.washdc.fios.verizon.net] has quit [Remote host closed the connection]02:35
-!- Moatman [~Moatman@pool-96-255-151-151.washdc.fios.verizon.net] has joined #shogun02:37
-!- Moatman [~Moatman@pool-96-255-151-151.washdc.fios.verizon.net] has quit [Remote host closed the connection]02:38
-!- besser82 [~besser82@fedora/besser82] has joined #shogun07:18
-!- mode/#shogun [+o besser82] by ChanServ07:18
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has quit [Remote host closed the connection]08:07
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has joined #shogun08:30
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has quit [Remote host closed the connection]09:17
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has joined #shogun09:18
-!- gf712 [905208ce@gateway/web/freenode/ip.144.82.8.206] has joined #shogun09:43
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has quit [Remote host closed the connection]09:55
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has joined #shogun09:57
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has quit [Remote host closed the connection]10:06
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has joined #shogun10:13
-!- Taivhi303j [3121a75c@gateway/web/freenode/ip.49.33.167.92] has joined #shogun10:24
-!- Taivhi303j [3121a75c@gateway/web/freenode/ip.49.33.167.92] has quit [Client Quit]10:25
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has quit [Remote host closed the connection]10:32
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has joined #shogun10:37
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has quit [Ping timeout: 246 seconds]10:41
-!- wiking [~wiking@2001:67c:10ec:5784:8000::3ff] has joined #shogun10:47
-!- HeikoS [~heiko@158.pool85-48-187.static.orange.es] has joined #shogun11:03
-!- mode/#shogun [+o HeikoS] by ChanServ11:03
gf712HeikoS: ping11:25
@HeikoSpong11:25
@HeikoShi!11:26
@HeikoShow are thing?11:26
@HeikoSs11:26
gf712hey, just saw your email11:26
gf712good good11:26
gf712so busy writing stuff for phd11:26
@HeikoSyeah I saw11:26
@HeikoSno worries11:26
@HeikoSits important :)11:26
@HeikoSand you have been quite busy for shogun before that as well!11:27
gf712I use Keras/tf a lot and turns out the "newest" lstm implementations are pretty bad11:27
gf712so had to do some tuning11:27
gf712such a pain!11:27
gf712yea, but ill try to get everything done for the shogun project next week11:27
gf712HeikoS: btw who is the ati collaborator?11:28
@HeikoShaha11:29
@HeikoSwell11:29
@HeikoSwe need a good hyperparameter tuner! :)11:29
@HeikoSlet me find the link11:30
@HeikoShttps://scott-hosking.github.io/11:30
@HeikoShe is keen11:30
@HeikoSso I will finish the application now11:30
@HeikoSand he writes us a letter of support11:30
gf712that's awesome!11:31
gf712is the research project about climate change then?11:31
@HeikoSnono11:31
@HeikoSthe way it works11:32
@HeikoSis that we do the modelselection stuff11:32
@HeikoSand they give us feedback11:32
@HeikoSfor what they would need11:32
@HeikoSin their projects11:32
@HeikoSso we will have someone part-time joining meetings and playing with the stuff we are building11:32
gf712and they use svm then?11:32
@HeikoSidk11:32
gf712sounds good!11:32
gf712ah11:32
@HeikoSthey mostly do like time series stuff11:32
gf712mhhh11:33
@HeikoSso we also want to come up with a list of requirements for climate researchers11:33
gf712I can do some stuff for time series11:33
gf712I work with that a lot, for protein sequence analysis11:33
gf712but it's all deep net stuff11:33
gf712did he mention any requirements?11:34
@HeikoSnot yet11:34
@HeikoSI mean11:34
@HeikoSwe can totally add something to the proposal11:35
@HeikoSand squeeze the modelselection stuff a bit11:35
@HeikoShaving supervised learning on time series like classification11:35
@HeikoSwould be cool11:35
@HeikoSa random forest thingi would be best to start with11:35
@HeikoSas it is easier to tune :)11:35
@HeikoSmmmh I wonder whether we should actually make their requirements part of the proposal11:35
@HeikoSwill work on it now and then see11:36
@HeikoSgf712: so you would be keen on adding new algorithms in general?11:39
@HeikoSbecause then Ill actually ask him for some ideas11:39
@HeikoSwe can then compress the 3 stages in the proposal to two11:39
@HeikoSand add one more for new algos11:39
@HeikoSsay time series11:39
gf712HeikoS: I can add some more stuff, but i think it would be dependent on the need of the collaborators11:42
gf712i.e. I can work on adding lstm with stan11:42
gf712if they use it11:43
gf712or integrate it with cudnn11:43
gf712that kind of thing11:43
gf712depends on the hardware we are aiming for11:43
gf712what sort of algorithms are you thinking?11:43
wikingCIAO BELLA!11:48
@HeikoSgf712: good points12:06
@HeikoSgf712: he mentioned that they have quite large datasets as well12:06
@HeikoSso some lazy loading stuff might be interesting as well12:06
@HeikoSwiking: saw the hostel?12:10
wikinggot mails12:10
gf712yea lazy loading would be interesting12:10
wikinghaven't got chance to check it yet12:10
wikingHeikoS: i'm cleaning shop12:10
gf712never really used it in c++ I think12:10
wikingHeikoS: this codebase is insane12:10
@HeikoSwiking: lol yeah my phone crashed opening your diff :)12:13
wikingHeikoS: yeah github gave up on some shit as well12:14
wiking:)12:14
wikingi wonder whether we should have a common place12:14
wikingfor the refactor libtooling stuff i have12:14
wikinglike a repo12:14
@HeikoSwiking: but pls check the hostel. it is quite a bit less fancy than our last meetings12:14
wikingor something12:14
wikinghahahha12:14
@HeikoShowever, it is central and the price was among the better ones12:14
wikingis there a bed?12:14
@HeikoSthere is12:14
wikingHeikoS: btw cbase?12:14
@HeikoSso all good :)12:15
wikingor where is the meeting12:15
@HeikoStomtom12:15
wikingaaah yeah12:15
wikingok12:15
wikingso there's bed12:15
wikingso fuck it12:15
@HeikoSok good12:15
wikingwe just sleep there12:15
@HeikoSbooked then12:15
wikingif we sleep12:15
wiking;D12:15
wiking(knowing berlin nights...)12:15
@HeikoS6 people bunk bed room :)12:15
@HeikoShehehe12:15
@HeikoSindeed12:15
@HeikoScool12:15
@HeikoSso we are then mostly sorted12:15
@HeikoSjust need to make sure everyone books flights soon12:15
wikingok12:15
wikingwill we have a barby ?12:16
wikingbbq12:16
wikingwe should talk with Soeren12:16
wikingthat was cool the first time we had WS12:16
wikingi guess he has equipment12:16
wikingso we can just go to a park12:16
wikingand fire it up12:16
wiking:)12:16
wikingHeikoS: must say that i was shocked that basic models ran out of box12:17
wikingwith the replacements12:17
wiking:)12:17
wikingbut yeah i need Sergey to get any sorted12:18
wikingas all tags are now broken obviously12:18
@HeikoSwiking: bbq yes12:18
@HeikoSat tomtom12:18
@HeikoSI think Sören wants to organise it12:18
@HeikoSlol12:18
wikingbut now i've got tired of SGIO12:20
wikingso i'm dropping that12:20
wikingfuck all these ancient stuff12:20
wikingHowever, many in our community use R or Matlab so they are 'closed-off' from using things like Dask - could Shogun help with these types of users?12:21
wikingok12:21
wikingstill have access to UCL12:21
wiking?12:21
wikingneed matlab build i guess12:21
wiking:D12:21
wikinggf712: spdlog FTW... it uses fmt12:21
wikingso lets just go with that12:22
@HeikoSwiking: yes I have12:22
@HeikoSI can install a buildslave12:22
wikingHeikoS: do it!12:22
@HeikoSjust tell me what to do12:22
wikingok i'll get u the line12:22
wikingin the meanwhile12:22
wikingpip install buildbot-worker12:22
gf712wiking: fmt is so good12:22
wikingin a virtualenv or something12:22
wikinggf712: yeah we'll have that12:22
wikingfor SG_DEBUG12:22
wikingand shit12:22
gf712it will be part of c++12:22
gf712soon12:22
gf71220 I think12:22
wikingyeah saw it12:22
gf712it's also header only12:23
gf712I think?12:23
wikingdunno how it will handle our old way of shit12:23
wikingspdlog12:23
wikingyes12:23
wikingso maybe i'll need another libtooling refactor12:23
wikingfor all the macro calls12:23
wiking:DDD12:23
wikingbut that will be interesting12:23
wikingas i need to change the format string12:23
wikingin the macro12:23
gf712the format string is the same though no12:23
gf712?12:23
wikingwe do12:24
wiking"asdf %s %f %d.....12:24
wikingfmt has "asdf {} {0:3f}12:24
wikingand stuff like that12:24
gf712you can do that with fmt12:24
gf712if you want12:24
wikingok12:24
wikingcool12:24
wikingso then its backward compatible12:24
gf712but yea {} is easier12:24
wikingyeah12:24
wikingjust you know12:24
gf712mostly to control precision12:24
wikingi dont wanna patch12:24
wikingold macrocalls12:24
wiking:D12:24
gf712if you want you can replace the macros and I can do some %s {} replacements12:25
gf712should be able to do positional  {}12:25
gf712like python12:25
gf712and decrease the number of args passed around12:25
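A minimal sketch of the format-string migration discussed above, written in Python because its str.format brace syntax is close to the brace style mentioned for fmt. The helper name printf_to_braces and the supported subset (%s, %d, %i, %f, %.Nf and %%) are assumptions for illustration only; this is not the libtooling refactor wiking refers to, and the real logging macros may use specifiers that are not covered here.

```python
import re

# Hypothetical helper: only a small subset of printf specifiers is handled.
# Anything else (for example "%5d") is left untouched so it can be reviewed
# by hand.
_SPEC = re.compile(r"%(%|s|[di]|(?:\.(\d+))?f)")


def printf_to_braces(fmt):
    """Rewrite a printf-style format string into a brace-style one."""
    # Literal braces must be doubled before placeholders are introduced.
    escaped = fmt.replace("{", "{{").replace("}", "}}")

    def repl(match):
        spec = match.group(1)
        if spec == "%":          # "%%" is a literal percent sign
            return "%"
        if spec == "s":
            return "{}"
        if spec in ("d", "i"):
            return "{:d}"
        precision = match.group(2)
        return f"{{:.{precision}f}}" if precision else "{:f}"

    return _SPEC.sub(repl, escaped)


if __name__ == "__main__":
    old = "asdf %s %f %d"
    new = printf_to_braces(old)
    print(new)
    print(new.format("x", 1.5, 2))
```

Both styles should render the same text for the supported specifiers, which is the property a mechanical rewrite of old macro calls would need to keep.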
wikinglets see12:25
wikingi wanna mostly drop shit from SGIO12:26
wikingand add the whole thing into init12:26
wikingand then it'll be part of Env one day12:26
gf712so for the sinks12:26
wikingand then you can do12:26
gf712one for stderr12:26
wikingadd_sink()12:26
wikingetc12:26
wikingyeah12:26
wikingstderr and stdout12:26
gf712one for stdout ?12:26
gf712ok12:26
wikingyeah but i'll add a multisink12:26
@HeikoSwiking:  ok installed12:26
gf712and then can use it also to write to log file?12:26
wikinggf712: yeah if u add_sunk12:26
wiking*sink12:26
wikingthen you can log anywhere12:26
gf712would be cool to expose that somehow to swig?12:26
wikingbut i thought that the default logger12:26
wikingis a multisink12:27
wikingalthough12:27
wikingnow that i'm thinking12:27
wikingi guess SG_ERROR should write to stderr12:27
wikingbut others to stdout12:27
wikingok first i do the shit12:27
wikingand then think a bit12:27
gf712haha ok12:27
wikinghow to do it properly12:27
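A rough sketch of the sink layout floated above: errors go to stderr, everything else goes to stdout, and extra sinks (a log file, say) can be attached later. Python's standard logging module stands in for spdlog here; make_logger, MaxLevelFilter and the signature of add_sink are assumptions made for this example, not Shogun or spdlog API.

```python
import io
import logging
import sys


class MaxLevelFilter(logging.Filter):
    """Let through only records at or below a given level."""

    def __init__(self, max_level):
        super().__init__()
        self.max_level = max_level

    def filter(self, record):
        return record.levelno <= self.max_level


def make_logger(name, out=sys.stdout, err=sys.stderr):
    """Build a logger with two default sinks: out (< ERROR) and err (>= ERROR)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False      # keep records away from the root logger
    logger.handlers.clear()       # start from a clean set of sinks

    out_handler = logging.StreamHandler(out)
    out_handler.addFilter(MaxLevelFilter(logging.WARNING))
    err_handler = logging.StreamHandler(err)
    err_handler.setLevel(logging.ERROR)

    logger.addHandler(out_handler)
    logger.addHandler(err_handler)
    return logger


def add_sink(logger, stream):
    """Attach one more sink that receives every record."""
    handler = logging.StreamHandler(stream)
    logger.addHandler(handler)
    return handler


if __name__ == "__main__":
    log = make_logger("sketch")
    capture = io.StringIO()
    add_sink(log, capture)        # e.g. an interpreter-side buffer
    log.info("training started")
    log.error("something failed")
```

An interface-specific sink, such as one that forwards messages to the Python interpreter, would then be one more call to add_sink under this layout.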
@HeikoSwhich matlab12:27
@HeikoS/opt/matlab-R2017a/bin/matlab12:27
gf712btw in notebook does stderr get displayed?12:27
wikingcurrently just wanna chuck out shit12:28
@HeikoS                            < M A T L A B (R) >12:28
@HeikoS                  Copyright 1984-2017 The MathWorks, Inc.12:28
@HeikoS                   R2017a (9.2.0.556344) 64-bit (glnxa64)12:28
wikinggf712: mmm we are actually having a trick12:28
wikingin swig12:28
wikingsg_global_print_error12:28
wikingand then for python12:28
wikingwe use python stuff12:28
wikingso actually u get the errors to your interpreter12:29
wikingnot to stdout12:29
wikingor stderr12:29
wikingbut this can be done with sinks easily12:29
gf712ah ok cool!12:31
@HeikoSgf712:  ah ok just saw scotts email12:33
gf712HeikoS: yea, im reading through it12:34
@HeikoSthe question is what to pick from those things12:34
gf712You mean from the repo he sent?12:36
@HeikoSno the email12:36
@HeikoSman it is hailing heavily here12:36
gf712you mean the one with 4 bullet points?12:36
@HeikoSyes12:36
wikingHeikoS: will get u the lines12:36
gf712in spain?12:36
wikingbut wanna get lunch12:36
@HeikoSyes madrid12:36
wikingHeikoS: GO TO THE MARKET!12:37
wikingbestestest12:37
wiking:)12:37
wikingi think either sunday or saturday12:37
@HeikoSwell close to madrid12:37
@HeikoSla pedriza12:37
wikingin rastro12:37
@HeikoScool Ill check it12:38
wikinghttp://www.madridtourist.info/rastro_market.html12:38
wikingits nice12:38
wikingwith coffee and churros :P12:38
wikingu know churros is from madrid actually12:38
@HeikoSmjam12:39
@HeikoSgf712:  I think the sea ice thingi might be something12:40
@HeikoSmultinomial logistic regression12:41
gf712HeikoS: ok! so that needs to be implemented in shogun?12:42
@HeikoSnono12:43
@HeikoSit is just an example12:43
@HeikoSof what they do12:43
@HeikoSwhat I am after is12:43
@HeikoS"what things could we add that would be useful for them"12:43
@HeikoSthe dask thing is interesting obviously12:43
@HeikoSor: can we offer something that solves really large-scale problems12:43
@HeikoSsaw we added the actor stuff12:44
@HeikoSthen maybe we can also connect it to a few selected algorithms12:44
@HeikoSlike logistic regression12:44
@HeikoSI think what I will do is to rewrite the second work package12:45
@HeikoSto add algorithms of interest for them12:45
@HeikoSwithout being too specific12:45
@HeikoSand then we can discuss this when things kick off12:45
-!- HeikoS [~heiko@158.pool85-48-187.static.orange.es] has quit [Ping timeout: 245 seconds]12:54
-!- HeikoS [~heiko@25.pool85-48-187.static.orange.es] has joined #shogun12:56
-!- mode/#shogun [+o HeikoS] by ChanServ12:56
gf712HeikoS: considering the name of the project the actor stuff is the most important13:00
gf712the most important would be to make sure they know we are willing to extend the library13:01
gf712with more algos13:01
gf712so yea, I guess nothing specific13:01
@HeikoSgf712:  you think we should re-phrase in terms of actor specific algorithms?13:01
gf712the title?13:01
@HeikoSand the initial pitch13:02
@HeikoSi.e. rather than saying this is about model selection13:02
@HeikoSwe can say it is about actor implementation13:02
@HeikoSand modelselection is one example13:02
@HeikoSbut there could be others13:02
@HeikoSidk13:02
gf712ah right13:02
gf712hmmm, I think modelselection is good13:02
gf712because it is an application13:02
gf712otherwise it becomes to cs13:02
gf712too13:02
@HeikoSok, also keep in mind that the reviewers wont know actors :)13:02
@HeikoSkk13:02
@HeikoSwith a particular focus on integrating algorithms used by the environmental sciences community.13:03
@HeikoSthis kinda makes it clear I guess?13:03
gf712basically the take away should be twofold: multi parallel software for modelselection and extension of shogun to help our collaborators?13:04
@HeikoSah wait13:04
gf712yup13:04
@HeikoSare you looking at the abstract?13:04
@HeikoSbecause I am editing that atm13:04
@HeikoSthe other docs might be a bit outdated atm13:04
gf712the one you shared?13:04
@HeikoS"ATI abstract"13:04
gf712basically we need to ensure that shogun doesn't become too niche for this one group13:06
-!- HeikoS [~heiko@25.pool85-48-187.static.orange.es] has quit [Ping timeout: 258 seconds]13:52
-!- HeikoS [~heiko@4.pool85-48-187.static.orange.es] has joined #shogun14:14
-!- mode/#shogun [+o HeikoS] by ChanServ14:14
-!- geektoni [5d22ef24@gateway/web/freenode/ip.93.34.239.36] has joined #shogun14:27
-!- HeikoS [~heiko@4.pool85-48-187.static.orange.es] has quit [Ping timeout: 245 seconds]14:27
-!- HeikoS [~heiko@237.pool85-48-187.static.orange.es] has joined #shogun15:28
-!- mode/#shogun [+o HeikoS] by ChanServ15:28
geektoniping HeikoS15:51
@HeikoSgeektoni:  hi15:51
geektoniquick question about labels15:52
geektoniI have this file here https://github.com/geektoni/geektoni.github.io/pull/1/files#diff-b006ee2ca678823a1306ba5bfd8abd7d15:52
geektoniwhich I'm using for the blog post15:52
geektonihowever15:52
geektoniif I do labels(that_file), it sees it as multiclass instead of binary15:52
geektoniwhy is that? :/15:53
geektonithe meta examples use the same kind of methods15:53
geektoniand they work pretty fine15:53
geektoniHeikoS: that's the error I'm getting https://pastebin.com/J1rzMwfi15:54
@HeikoSsure15:55
@HeikoSthe factory tries to load15:55
@HeikoSfrom specific to general15:55
@HeikoStries first as binary, if it doesnt work it tries as multiclass15:56
@HeikoSbut you could just add a conversion call in the perceptron15:56
@HeikoSlabels = binary_labels(m_labels)15:56
@HeikoSat the beginning of train15:56
geektoniHeikoS: there is already a conversion call inside the perceptron15:59
geektonihere https://github.com/shogun-toolbox/shogun/blob/develop/src/shogun/classifier/Perceptron.cpp#L6315:59
@HeikoSgeektoni: just checking the error16:00
@HeikoSah ok16:00
@HeikoSi see16:00
@HeikoSthe conversion call causes the error16:00
@HeikoSgf712: did you end up implementing the conversion from multiclass labels to binary?16:00
@HeikoSor was that another way around?16:00
@HeikoSgeektoni: not sure why it loads the labels as multiclass16:00
@HeikoSI think it doesnt in the other meta examples or?16:00
geektoniHeikoS: The other meta examples which use toy data work fine16:01
@HeikoSas in load as binary?16:01
geektoniyes yes16:01
@HeikoSso the same file is loaded as binary?16:01
@HeikoScheck the files and what is different then I guess :)16:01
geektoniyep, let's start the debugging session then16:02
@HeikoShehe16:02
@HeikoSit is probably some file formatting stuff16:02
@HeikoSgeektoni: let me know!16:11
geektoniHeikoS: sure! :)16:12
geektoniHeikoS: okay, I've found the problem16:15
geektonibasically16:15
geektonieven if you have a binary data file with let's say two classes -1 and 116:15
geektonithat file has to have those values written as float, like -1.00 or 1.0016:16
geektoniotherwise it will be considered as multiclass16:16
@HeikoSi see16:16
@HeikoSboooooo!16:16
@HeikoScan you file an issue for that?16:16
@HeikoSsucks16:16
geektoniI need to figure out where this happens inside the code though16:16
@HeikoSdont fix it16:16
@HeikoSissue it :)16:16
@HeikoSentrance task16:16
@HeikoSjust change the file16:17
@HeikoS(for now)16:17
@HeikoSunless you want to fix it, then feel free :)16:17
geektonisure!16:17
geektonilet's see how complicated it is16:17
geektoniHeikoS: lol found the problem https://github.com/shogun-toolbox/shogun/blob/develop/src/shogun/labels/BinaryLabels.cpp#L7316:22
@HeikoShow is that causing the issue?16:23
geektonibasically16:23
@HeikoSis 1 != 1.0?16:23
geektoniI guess it is16:23
@HeikoSbut m_labels is float64_t16:23
geektonithe main issue is that the binary labels are somehow hardcoded16:23
geektoniif you have a dataset made by 0 and 116:24
@HeikoSnot sure I follow :)16:24
geektoniokay so16:24
geektoniShogun considers binary labels only -1 and 116:25
geektoniso, if you have a file which contains 0 and 116:25
geektoniit won't be considered binary, but multiclass16:25
@HeikoSah yes16:25
@HeikoSi see now16:25
@HeikoSyep that is true16:25
geektoniit can be seen as a design choice16:25
@HeikoSit is16:25
@HeikoSit is not the best one though16:26
@HeikoSah yes of course16:26
@HeikoSsorry I knew that16:26
@HeikoSyou have to do +1 -116:26
@HeikoSsome algos are based on that16:26
@HeikoStheir mathematical formulation16:26
@HeikoSsvm16:26
@HeikoSperceptron16:26
@HeikoSetc16:26
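A small sketch of the "specific to general" loading order described above, and of why a file containing 0 and 1 ends up as multiclass. The function names are hypothetical and this is not Shogun's factory code; the final "regression" fallback is an assumption, since the log only names the binary and multiclass steps.

```python
def is_binary(values):
    # Binary labels are assumed to be exactly -1 and +1, as stated in the chat.
    return all(v in (-1.0, 1.0) for v in values)


def is_multiclass(values):
    # Assumption: multiclass labels are non-negative whole numbers.
    return all(v.is_integer() and v >= 0 for v in values)


def guess_label_type(raw_values):
    """Try the most specific label type first, then fall back step by step."""
    values = [float(v) for v in raw_values]   # "1" and "1.0" parse the same
    for name, check in (("binary", is_binary), ("multiclass", is_multiclass)):
        if check(values):
            return name
    return "regression"                       # assumed last resort


if __name__ == "__main__":
    print(guess_label_type(["-1", "1", "1"]))   # -1/+1 file
    print(guess_label_type(["0", "1", "1"]))    # 0/1 file
```

Under these assumptions a 0/1 file never passes the binary check, so a later conversion to binary labels inside an algorithm has to deal with a multiclass object.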
geektoniI see I see16:26
geektonibut then16:26
geektoniI guess that somehow it sees 1 != 1.016:26
geektoninot sure how it is even possible16:27
@HeikoSwhat?16:28
@HeikoSnono16:28
@HeikoSit loads it as floats16:28
@HeikoSi think you can even write -1 and +116:28
@HeikoSno need for -1.0 + 1.016:29
@HeikoS(I think)16:29
@HeikoSit is the -1, +1     vs 0,116:29
@HeikoSthat distinguished atm16:29
@HeikoSs16:29
geektonibut I've tried with a dataset with only -1 and +1 and it was not working either :/16:29
@HeikoSok scrap16:30
@HeikoSbut then -1.0 +1.0 works?16:30
geektonilemme try to do it again16:30
geektoniyes, -1.0 and +1.0 works16:30
@HeikoSok16:30
@HeikoSthat is a bug16:30
gf712HeikoS: the label conversion works for my case, but it isnt merged yet because we ended up not being sure if it was the right way to go16:31
@HeikoSgf712:  ah ok16:32
@HeikoSsure16:32
@HeikoSwe found the issue here16:32
@HeikoSgeektoni: for now just use the floats in the file16:32
@HeikoSthis is something to be fixed later16:32
@HeikoSnot your problem right now I would say16:32
geektoniHeikoS: I've tried again and with just -1 and 1 it works16:34
@HeikoSaha!16:35
@HeikoSsg.rocks16:35
geektoniit is just a problem of labels different from -1 and 116:35
@HeikoSI knew it :D16:35
geektoniI probably copied over the wrong files :/16:35
geektoniso, all good now16:35
@HeikoShehe16:39
@HeikoSgreat16:39
@HeikoSbtw I really like the blog post16:39
@HeikoSvery nice16:39
@HeikoSLet me know once it is published and I will tweet it16:39
@HeikoSgf712: so re the labels16:49
@HeikoSI think we need to hand the conversion to the user16:50
@HeikoSor do it in a base class16:50
@HeikoSbut doing it inside the algorithms is not good16:50
gf712OK, I agree16:52
gf712need to rethink a bit how to make it easier for users though16:52
gf712for a python user it might not make sense16:52
@HeikoSyeah so the first q is whether the user does it or the base class16:52
@HeikoSexactly16:52
gf712because you'd expect it to be done for you16:52
gf712yeah16:52
gf712I think base class16:52
@HeikoSand then how we deal with meta algorithms16:53
@HeikoScross-validation16:53
@HeikoSit doesnt matter for that right?16:53
@HeikoSbecause users just wants the number16:53
@HeikoSbut e.g. what if the user looks at performance in the folds (using geektoni observable stuff)16:54
@HeikoSthen the labels might be different than those provided by the user16:54
@HeikoSso we need to inform the user of the conversion16:55
@HeikoSbut then if it happens in ::train16:55
@HeikoSwe get lots of conversion messages if we do model selection16:56
@HeikoSso it is always the "highest level" class that should do it16:56
@HeikoSidk16:57
@HeikoSmaybe we should give it to the user :D16:57
@HeikoSor we step back further and say16:57
@HeikoSDiscreteLabels16:57
@HeikoSwhich just contain unique values16:57
@HeikoSrather than multiclass/binary16:57
gf712I think that might be best16:57
gf712if it is feasible16:58
@HeikoSAPi wise definitely16:58
@HeikoSproblem is16:58
@HeikoSe.g. svm16:58
@HeikoSthe optimization problem is based on +1 -116:58
@HeikoSso which will be assigned to which16:58
gf712but internally you could cast it16:58
@HeikoSDiscretelabels("A", "B")16:58
gf712and keep track16:58
@HeikoSI see16:58
gf712but just in svm16:58
@HeikoSso have a map inside16:58
gf712yup16:58
@HeikoSto_binary16:58
@HeikoSto_multiclass16:58
gf712and it should be cheap16:59
gf712at runtime16:59
@HeikoSand those map the label type/value to the [-1,+1] or [0,1,2,3,4]16:59
@HeikoSyeah I mean the map can be built lazily16:59
@HeikoSif it doesnt exist yet, populate it16:59
@HeikoSin some way16:59
@HeikoSand for predictions16:59
@HeikoSit is mapped backwards?16:59
gf712I guess so?17:00
gf712so you have an internal prediction17:00
@HeikoSok and then we would just get rid of BinaryLabels MulticlassLabels17:00
@HeikoSbut would just have DenseLabels17:00
gf712that is translated to the labels17:00
gf712yea17:00
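The idea sketched in this exchange, one user-facing label container that keeps an internal map and translates predictions back, could look roughly like this. The class and method names follow the ones floated in the chat (DiscreteLabels, as_binary, as_multiclass), but the code is only an illustration; in particular, assigning -1 and +1 by sorted order is an assumption, since the chat leaves open which class gets which sign.

```python
class DiscreteLabels:
    """Holds user-provided labels; internal encodings are derived on demand."""

    def __init__(self, values):
        self.values = list(values)
        self._classes = None          # built lazily on first use

    def classes(self):
        if self._classes is None:
            self._classes = sorted(set(self.values))
        return self._classes

    def as_multiclass(self):
        """Encode labels as contiguous integers 0..k-1."""
        index = {label: i for i, label in enumerate(self.classes())}
        return [index[v] for v in self.values]

    def as_binary(self):
        """Encode labels as -1/+1; only valid for exactly two classes."""
        classes = self.classes()
        if len(classes) != 2:
            raise ValueError("expected exactly two classes, got %d" % len(classes))
        encoding = {classes[0]: -1, classes[1]: +1}
        return [encoding[v] for v in self.values]

    def decode_binary(self, predictions):
        """Map internal -1/+1 predictions back to the user's labels."""
        classes = self.classes()
        if len(classes) != 2:
            raise ValueError("expected exactly two classes, got %d" % len(classes))
        return [classes[0] if p < 0 else classes[1] for p in predictions]


if __name__ == "__main__":
    labels = DiscreteLabels(["r", "b", "r"])
    print(labels.as_binary())
    print(labels.decode_binary([-1, 1]))
```

The user only ever sees "r" and "b"; the -1/+1 encoding stays inside whatever algorithm asked for it.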
@HeikoSand make it templated even?17:00
gf712DenseLabels?17:00
@HeikoSthat is the base class17:00
@HeikoSwhich stores the actual data17:00
gf712yea, it would have to be17:00
gf712well wouldn't have to17:01
gf712but it would make it more efficient17:01
@HeikoSI am not sure whether we ever want to actually have labels as "F"17:01
@HeikoSrather than just ints?17:01
@HeikoSah17:01
@HeikoSwhat about regression17:01
@HeikoSthat is real valued17:01
@HeikoScurrently, the base class holds a float vector17:01
gf712well you need it to be templated17:01
@HeikoSand just stores as int17:02
@HeikoSthat makes some stuff easier, other stuff more complicated17:02
gf712but at the end of the day it makes it easier for the user right?17:02
@HeikoSyeah agree17:03
gf712it means that for classification we could maybe even convert something that is a string17:03
gf712to numeric labels17:03
@HeikoSwell17:03
@HeikoSI wonder17:03
@HeikoSis that something needed17:03
@HeikoSor can we expect that a user converts them to ints17:03
gf712i.e. labels {"Red", "Blue"}17:03
@HeikoSor we can offer a method to do that17:03
gf712and then is converted17:03
@HeikoSbecause the factory could accept other types17:04
@HeikoSbut internally17:04
@HeikoSwhat do we do there?17:04
@HeikoSsee what I mean?17:04
gf712we would need a check in classifiers17:04
@HeikoSlabels(["r", "b"]) -> is it converted to ints internally?17:04
gf712just thinking that that is something you can do in sklearn17:04
gf712I think17:04
gf712yes17:04
gf712it just makes life a bit easier for a user17:05
gf712im just thinking when this would go wrong17:05
@HeikoSdo they convert or actually store the strings?17:05
gf712you would need the string to convert back?17:05
@HeikoSI tend to think int values are fine17:05
gf712yea, might complicate things a bit too much17:06
@HeikoSwhat definitely sucks is that our labels need to be contiguous17:06
@HeikoSbut again, some algos depend on that17:06
@HeikoSso the idea of maintaining an internal map might be good17:06
gf712but it would be nice to have just one labels class exposed to the user17:06
gf712and the rest is internal17:06
@HeikoSyep17:07
@HeikoSbut ok17:07
@HeikoSthat is easy17:07
@HeikoSCLabels17:07
gf712but then errors could become more mysterious17:07
@HeikoSsure17:07
@HeikoScurrently, the factory loader for labels decides what subclass to instantiate17:07
@HeikoSfrom specific to general17:07
@HeikoSie first binary, then bla, then bla17:07
@HeikoSuntil one works17:07
@HeikoSand then the algos call this conversion call for what they need17:07
@HeikoSso let's summarize17:09
@HeikoSwe want17:09
@HeikoSusers dont care about label type, they just provide something that is "label-able"17:09
@HeikoSinternal algos need certain representations -1,+1 or [0,1,2,3]17:09
@HeikoSquestion is where we convert17:09
@HeikoSin a factory, so that inside shogun all is stored in the usual format17:10
gf712HeikoS: quick q, is it possible to get a binary label directly from sg.labels factory?17:10
@HeikoSif the file is +1, -1 then yes17:10
gf712ah from file ok17:10
@HeikoSbut there is no conversion17:10
gf712but not from array17:10
@HeikoSah idk17:11
@HeikoSI think there might be17:11
gf712a=np.array([1,0])17:11
gf712sg.labels(a)17:11
gf712gives an error17:11
gf712can only use shogun::labels< float64_t >(shogun::SGVector< double >)17:11
@HeikoStemplate <class T>17:11
@HeikoSCLabels* labels(SGVector<T> labels)17:11
gf712anyway, sorry was just checking it17:11
@HeikoSwe have17:11
-!- wiking [~wiking@2001:67c:10ec:5784:8000::3ff] has quit [Remote host closed the connection]17:11
@HeikoScould do17:12
-!- wiking [~wiking@2001:67c:10ec:5784:8000::3ff] has joined #shogun17:12
@HeikoSsg.labels(my_file_or_array).to_binary()17:12
@HeikoSbut that sucks17:12
@HeikoSgf712:  what would the user see if doing this17:12
@HeikoSmy_model.apply(data).get("labels")17:13
@HeikoSif trained with sg.labels(["r", "b"])17:13
@HeikoSand then also, what would be sg.labels(["r", "b"]).get("labels")17:13
@HeikoSI would say in both cases we would need to see ["r", "b"] or?17:14
@HeikoSso then inside the svm we would do17:14
gf712I would imagine so17:14
gf712but the strings can come later no>17:15
gf712?17:15
gf712I was just thinking out loud17:15
@HeikoSwhat you mean by later?17:15
gf712well, first you want to make it label agnostic right?17:15
@HeikoSyeah sure17:15
@HeikoSjust example17:15
@HeikoScould be all ints17:15
@HeikoSjust what the user sees17:15
@HeikoSthe user never sees the internal representation17:16
@HeikoSthat is your point right?17:16
gf712no, I don't think the user would17:16
gf712yup17:16
@HeikoSso then inside my ::train call I could do17:16
@HeikoSm_labels.as_binary()17:16
gf712exactly17:16
gf712which creates a map17:16
gf712and casts to the right label type17:16
-!- wiking [~wiking@2001:67c:10ec:5784:8000::3ff] has quit [Ping timeout: 264 seconds]17:16
-!- HeikoS [~heiko@237.pool85-48-187.static.orange.es] has quit [Ping timeout: 255 seconds]17:21
-!- HeikoS [~heiko@237.pool85-48-187.static.orange.es] has joined #shogun17:22
-!- mode/#shogun [+o HeikoS] by ChanServ17:22
@HeikoSgf712:  sorry connection failed17:22
@HeikoSwhats the last thing you saw from me?17:22
gf712[16:16] <@HeikoS> m_labels.as_binary()17:23
gf712and then I said17:23
gf712[16:16] <gf712> exactly17:24
@HeikoSah ok17:24
@HeikoSman17:24
gf712[16:16] <gf712> which creates a map17:24
@HeikoSstupid connection17:24
gf712[16:16] <gf712> and casts to the right label type17:24
@HeikoSok so the issue is the time when the map is created17:24
gf712are you using 4G?17:24
@HeikoSe.g. xvalidation17:24
@HeikoSmessing it up17:24
@HeikoSyes 4g17:24
@HeikoSso what we would need is17:24
@HeikoSCMachine::train calls the map-creation invoke17:24
@HeikoSbut XVal::eval also calls it17:25
@HeikoSand since it is lazy17:25
@HeikoSthe subsequent call from CMachine::train is a nop17:25
gf712mmh not sure I follow17:25
gf712so the map is created fine with train17:26
gf712{"R":1, "B": 0}17:26
gf712and then the eval accesses that map when it does predictions?17:26
@HeikoSbut think xval17:26
@HeikoSthe first ::train call17:26
@HeikoSmight only see a subset of data17:27
@HeikoSwhich e.g. misses one clas17:27
@HeikoSclass17:27
gf712oh right17:27
@HeikoSso the mapping is messed up17:27
gf712I guess in that case it would have to insert new keys17:27
gf712hmmm17:27
@HeikoSI think you need all information17:28
@HeikoSin order to build the mapping17:28
@HeikoShow to decide otherwise17:28
@HeikoSso the highest level caller needs to do it17:28
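A toy illustration of the problem raised here: if the label map is built lazily from whatever data the first train call happens to see, a fold that misses a class yields a different mapping than the full dataset. The helper build_mapping and the example labels are made up for this sketch.

```python
def build_mapping(labels):
    """Assign contiguous integer ids to the distinct labels, in sorted order."""
    return {label: index for index, label in enumerate(sorted(set(labels)))}


all_labels = ["a", "b", "c", "a", "b", "c"]
fold = ["b", "c", "b"]                  # this training fold misses class "a"

fold_mapping = build_mapping(fold)      # built from the subset only
full_mapping = build_mapping(all_labels)

if __name__ == "__main__":
    print(fold_mapping)
    print(full_mapping)
```

The same label gets different ids in the two maps, which is why the mapping has to be built by a caller that sees all labels, or else rebuilt and kept separate per fold.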
gf712so when you call eval in eval does it do bagging or something to determine the label?17:28
gf712xval*17:28
@HeikoSit might17:29
gf712but why not have each trained machine do its individual prediction17:30
gf712and then based on that17:30
gf712xval decides the label17:30
gf712having access to the results of all xval machines17:30
gf712or doesn't that work either?17:30
@HeikoSnot sure I understand it17:30
@HeikoSso if xvalidation was a black bo17:31
@HeikoSbox17:31
@HeikoSthen we could just do that17:31
@HeikoSeach ::train call might have different labels17:31
@HeikoSehm mappings17:31
@HeikoSbut it all doesnt matter since only the accuracy matters17:31
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has joined #shogun17:31
@HeikoSand if a user wants the fold accuracy, that is fine as well right?17:32
@HeikoSmmh so that works actually17:32
@HeikoSwith the only downside being that the mapping is created multiple times17:32
@HeikoSgf712:  ok I gotta dash....let's continue the discussion in a bit17:34
gf712HeikoS: ok sure!17:34
gf712btw all good for ati?17:34
gf712I can read it a couple more times17:34
@HeikoSyes do that :)17:34
@HeikoSthink what someone might not like17:34
@HeikoSI have another call tomorrow with someone who might be added as a collaborator17:35
gf712what are you most worried about?17:35
gf712that someone wouldn't like17:35
@HeikoSidk17:35
@HeikoStoo specific17:35
@HeikoSnot clear enough17:35
@HeikoSI like the proposal17:35
@HeikoSbut maybe it is good to think about criticisms17:36
gf712ok, let me read it with that kind of mindset :D17:36
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has quit [Ping timeout: 246 seconds]17:36
@HeikoSyeah think about it as if you had to give your money to it17:36
@HeikoSalso I have put the criteria for evaluation in the doc17:36
@HeikoSsaw them?17:36
gf712yea, I need to reread them17:36
gf712ok, ill do that17:36
@HeikoSthx17:38
@HeikoSsee you later!17:38
gf712see you17:39
-!- HeikoS [~heiko@237.pool85-48-187.static.orange.es] has quit [Ping timeout: 258 seconds]17:43
-!- HeikoS [~heiko@34.pool85-48-187.static.orange.es] has joined #shogun17:47
-!- mode/#shogun [+o HeikoS] by ChanServ17:47
-!- HeikoS [~heiko@34.pool85-48-187.static.orange.es] has quit [Ping timeout: 255 seconds]18:03
-!- geektoni [5d22ef24@gateway/web/freenode/ip.93.34.239.36] has quit [Quit: Page closed]18:34
-!- gf712 [905208ce@gateway/web/freenode/ip.144.82.8.206] has quit [Ping timeout: 256 seconds]18:41
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has joined #shogun23:45
--- Log closed Fri Apr 26 00:00:49 2019
