--- Log opened Thu Apr 25 00:00:48 2019 | ||
-!- besser82 [~besser82@fedora/besser82] has quit [Quit: Freedom, Friends, Features, First [fedoraproject.org]] | 00:35 | |
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has quit [Remote host closed the connection] | 01:19 | |
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has joined #shogun | 01:21 | |
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has quit [Ping timeout: 276 seconds] | 01:26 | |
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has joined #shogun | 01:58 | |
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has quit [Ping timeout: 258 seconds] | 02:03 | |
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has joined #shogun | 02:28 | |
-!- Moatman [~Moatman@pool-96-255-151-151.washdc.fios.verizon.net] has joined #shogun | 02:31 | |
-!- Moatman [~Moatman@pool-96-255-151-151.washdc.fios.verizon.net] has quit [Remote host closed the connection] | 02:33 | |
-!- Moatman [~Moatman@pool-96-255-151-151.washdc.fios.verizon.net] has joined #shogun | 02:34 | |
-!- Moatman [~Moatman@pool-96-255-151-151.washdc.fios.verizon.net] has quit [Remote host closed the connection] | 02:35 | |
-!- Moatman [~Moatman@pool-96-255-151-151.washdc.fios.verizon.net] has joined #shogun | 02:37 | |
-!- Moatman [~Moatman@pool-96-255-151-151.washdc.fios.verizon.net] has quit [Remote host closed the connection] | 02:38 | |
-!- besser82 [~besser82@fedora/besser82] has joined #shogun | 07:18 | |
-!- mode/#shogun [+o besser82] by ChanServ | 07:18 | |
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has quit [Remote host closed the connection] | 08:07 | |
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has joined #shogun | 08:30 | |
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has quit [Remote host closed the connection] | 09:17 | |
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has joined #shogun | 09:18 | |
-!- gf712 [905208ce@gateway/web/freenode/ip.144.82.8.206] has joined #shogun | 09:43 | |
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has quit [Remote host closed the connection] | 09:55 | |
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has joined #shogun | 09:57 | |
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has quit [Remote host closed the connection] | 10:06 | |
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has joined #shogun | 10:13 | |
-!- Taivhi303j [3121a75c@gateway/web/freenode/ip.49.33.167.92] has joined #shogun | 10:24 | |
-!- Taivhi303j [3121a75c@gateway/web/freenode/ip.49.33.167.92] has quit [Client Quit] | 10:25 | |
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has quit [Remote host closed the connection] | 10:32 | |
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has joined #shogun | 10:37 | |
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has quit [Ping timeout: 246 seconds] | 10:41 | |
-!- wiking [~wiking@2001:67c:10ec:5784:8000::3ff] has joined #shogun | 10:47 | |
-!- HeikoS [~heiko@158.pool85-48-187.static.orange.es] has joined #shogun | 11:03 | |
-!- mode/#shogun [+o HeikoS] by ChanServ | 11:03 | |
gf712 | HeikoS: ping | 11:25 |
---|---|---|
@HeikoS | pong | 11:25 |
@HeikoS | hi! | 11:26 |
@HeikoS | how are thing? | 11:26 |
@HeikoS | s | 11:26 |
gf712 | hey, just saw your email | 11:26 |
gf712 | good good | 11:26 |
gf712 | so busy writing stuff for phd | 11:26 |
@HeikoS | yeah I saw | 11:26 |
@HeikoS | no worries | 11:26 |
@HeikoS | its important :) | 11:26 |
@HeikoS | and you have been quite busy for shogun before that as well! | 11:27 |
gf712 | I use Keras/tf a lot and turns out the "newest" lstm implementations are pretty bad | 11:27 |
gf712 | so had to do some tuning | 11:27 |
gf712 | such a pain! | 11:27 |
gf712 | yea, but ill try to get everything done for the shogun project next week | 11:27 |
gf712 | HeikoS: btw who is the ati collaborator? | 11:28 |
@HeikoS | haha | 11:29 |
@HeikoS | well | 11:29 |
@HeikoS | we need a good hyperparameter tuner! :) | 11:29 |
@HeikoS | let me find the link | 11:30 |
@HeikoS | https://scott-hosking.github.io/ | 11:30 |
@HeikoS | he is keen | 11:30 |
@HeikoS | so I will finish the application now | 11:30 |
@HeikoS | and he writes us a letter of support | 11:30 |
gf712 | that's awesome! | 11:31 |
gf712 | is the research project about climate change then? | 11:31 |
@HeikoS | nono | 11:31 |
@HeikoS | the way it works | 11:32 |
@HeikoS | is that we do the modelselection stuff | 11:32 |
@HeikoS | and they give us feedback | 11:32 |
@HeikoS | for what they would need | 11:32 |
@HeikoS | in their projects | 11:32 |
@HeikoS | so we will have someone part-time joining meetings and playing with the stuff we are building | 11:32 |
gf712 | and they use svm then? | 11:32 |
@HeikoS | idk | 11:32 |
gf712 | sounds good! | 11:32 |
gf712 | ah | 11:32 |
@HeikoS | they mostly do like time series stuff | 11:32 |
gf712 | mhhh | 11:33 |
@HeikoS | so we also want to come up with a list of requirements for climate researchers | 11:33 |
gf712 | I can do some stuff for time series | 11:33 |
gf712 | I work with that a lot, for protein sequence analysis | 11:33 |
gf712 | but it's all deep net stuff | 11:33 |
gf712 | did he mention any requirements? | 11:34 |
@HeikoS | not yet | 11:34 |
@HeikoS | I mean | 11:34 |
@HeikoS | we can totally add something to the proposal | 11:35 |
@HeikoS | and squeeze the modelselection stuff a bit | 11:35 |
@HeikoS | having supervised learning on time series like classification | 11:35 |
@HeikoS | would be cool | 11:35 |
@HeikoS | a random forest thingi would be best to start with | 11:35 |
@HeikoS | as it is easier to tune :) | 11:35 |
@HeikoS | mmmh I wonder whether we should actually make their requirements part of the proposal | 11:35 |
@HeikoS | will work on it now and then see | 11:36 |
@HeikoS | gf712: so you would be keen on adding new algorithms in general? | 11:39 |
@HeikoS | because then Ill actually ask him for some ideas | 11:39 |
@HeikoS | we can then compress the 3 stages in the proposal to two | 11:39 |
@HeikoS | and add one more for new algos | 11:39 |
@HeikoS | say time series | 11:39 |
gf712 | HeikoS: I can add some more stuff, but i think it would be dependent on the need of the collaborators | 11:42 |
gf712 | i.e. I can work on adding lstm with stan | 11:42 |
gf712 | if they use it | 11:43 |
gf712 | or integrate it with cudnn | 11:43 |
gf712 | that kind of thing | 11:43 |
gf712 | depends on the hardware we are aiming for | 11:43 |
gf712 | what sort of algorithms are you thinking? | 11:43 |
wiking | CIAO BELLA! | 11:48 |
@HeikoS | gf712: good points | 12:06 |
@HeikoS | gf712: he mentioned that they have quite large datasets as well | 12:06 |
@HeikoS | so some lazy loading stuff might be interesting as well | 12:06 |
@HeikoS | wiking: saw the hostel? | 12:10 |
wiking | got mails | 12:10 |
gf712 | yea lazy loading would be interesting | 12:10 |
wiking | haven't got chance to check it yet | 12:10 |
wiking | HeikoS: i'm cleaning shop | 12:10 |
gf712 | never really used it in c++ I think | 12:10 |
wiking | HeikoS: this codebase is insane | 12:10 |
@HeikoS | wiking: lol yeah my phone crashed opening your diff :) | 12:13 |
wiking | HeikoS: yeah github gave up on some shit as well | 12:14 |
wiking | :) | 12:14 |
wiking | i wonder whether we should have a common place | 12:14 |
wiking | for the refactor libtooling stuff i have | 12:14 |
wiking | like a repo | 12:14 |
@HeikoS | wiking: but pls check the hostel. it is quite a bit less fancy than our last meetings | 12:14 |
wiking | or something | 12:14 |
wiking | hahahha | 12:14 |
@HeikoS | however, it is central and the price was among the better ones | 12:14 |
wiking | is there a bed? | 12:14 |
@HeikoS | there is | 12:14 |
wiking | HeikoS: btw cbase? | 12:14 |
@HeikoS | so all good :) | 12:15 |
wiking | or where is the meeting | 12:15 |
@HeikoS | tomtom | 12:15 |
wiking | aaah yeah | 12:15 |
wiking | ok | 12:15 |
wiking | so there's bed | 12:15 |
wiking | so fuck it | 12:15 |
@HeikoS | ok good | 12:15 |
wiking | we just sleep there | 12:15 |
@HeikoS | booked then | 12:15 |
wiking | if we sleep | 12:15 |
wiking | ;D | 12:15 |
wiking | (knowing berlin nights...) | 12:15 |
@HeikoS | 6 people bunk bed room :) | 12:15 |
@HeikoS | hehehe | 12:15 |
@HeikoS | indeed | 12:15 |
@HeikoS | cool | 12:15 |
@HeikoS | so we are then mostly sorted | 12:15 |
@HeikoS | just need to make sure everyone books flights soon | 12:15 |
wiking | ok | 12:15 |
wiking | will we have a barby ? | 12:16 |
wiking | bbq | 12:16 |
wiking | we should talk with Soeren | 12:16 |
wiking | that was cool the first time we had WS | 12:16 |
wiking | i guess he has equipment | 12:16 |
wiking | so we can just go to a park | 12:16 |
wiking | and fire it up | 12:16 |
wiking | :) | 12:16 |
wiking | HeikoS: must say that i was shocked that basic models ran out of box | 12:17 |
wiking | with the replacements | 12:17 |
wiking | :) | 12:17 |
wiking | but yeah i need Sergey to get any sorted | 12:18 |
wiking | as all tags are now broken obviously | 12:18 |
@HeikoS | wiking: bbq yes | 12:18 |
@HeikoS | at tomtom | 12:18 |
@HeikoS | I think sören wants to organise it | 12:18 |
@HeikoS | lol | 12:18 |
wiking | but now i've got tired of SGIO | 12:20 |
wiking | so i'm dropping that | 12:20 |
wiking | fuck all these ancient stuff | 12:20 |
wiking | • However, many in our community use R or Matlab so they are 'closed-off' from using things like Dask - could Shogun help with these types of users? | 12:21 |
wiking | ok | 12:21 |
wiking | still have access to UCL | 12:21 |
wiking | ? | 12:21 |
wiking | need matlab build i guess | 12:21 |
wiking | :D | 12:21 |
wiking | gf712: spdlog FTW... it uses fmt | 12:21 |
wiking | so lets just go with that | 12:22 |
@HeikoS | wiking: yes I have | 12:22 |
@HeikoS | I can install a buildslave | 12:22 |
wiking | HeikoS: do it! | 12:22 |
@HeikoS | just tell me what to do | 12:22 |
wiking | ok i'll get u the line | 12:22 |
wiking | in the meanwhile | 12:22 |
wiking | pip install buildbot-worker | 12:22 |
gf712 | wiking: fmt is so good | 12:22 |
wiking | in a virtualenv or something | 12:22 |
wiking | gf712: yeah we'll have that | 12:22 |
wiking | for SG_DEBUG | 12:22 |
wiking | and shit | 12:22 |
gf712 | it will be part of c++ | 12:22 |
gf712 | soon | 12:22 |
gf712 | 20 I think | 12:22 |
wiking | yeah saw it | 12:22 |
gf712 | it's also header only | 12:23 |
gf712 | I think? | 12:23 |
wiking | dunno how it will handle our old way of shit | 12:23 |
wiking | spdlog | 12:23 |
wiking | yes | 12:23 |
wiking | so maybe i'll need another libtooling refactor | 12:23 |
wiking | for all the macro calls | 12:23 |
wiking | :DDD | 12:23 |
wiking | but that will be interesting | 12:23 |
wiking | as i need to change the format string | 12:23 |
wiking | in the macro | 12:23 |
gf712 | the format string is the same though no | 12:23 |
gf712 | ? | 12:23 |
wiking | we do | 12:24 |
wiking | "asdf %s %f %d..... | 12:24 |
wiking | fmt has "asdf {} {:.3f}" | 12:24 |
wiking | and stuff like that | 12:24 |
gf712 | you can do that with fmt | 12:24 |
gf712 | if you want | 12:24 |
wiking | ok | 12:24 |
wiking | cool | 12:24 |
wiking | so then its backward compatible | 12:24 |
gf712 | but yea {} is easier | 12:24 |
wiking | yeah | 12:24 |
wiking | just you know | 12:24 |
gf712 | mostly to control precision | 12:24 |
wiking | i dont wanna patch | 12:24 |
wiking | old macrocalls | 12:24 |
wiking | :D | 12:24 |
gf712 | if you want you can replace the macros and I can do some %s {} replacements | 12:25 |
gf712 | should be able to do positional {} | 12:25 |
gf712 | like python | 12:25 |
gf712 | and decrease the number of args passed around | 12:25 |
wiking | lets see | 12:25 |
wiking | i wanna mostly drop shit from SGIO | 12:26 |
wiking | and add the whole thing into init | 12:26 |
wiking | and then it'll be part of Env one day | 12:26 |
gf712 | so for the sinks | 12:26 |
wiking | and then you can do | 12:26 |
gf712 | one for stderr | 12:26 |
wiking | add_sink() | 12:26 |
wiking | etc | 12:26 |
wiking | yeah | 12:26 |
wiking | stderr and stdout | 12:26 |
gf712 | one for stdout ? | 12:26 |
gf712 | ok | 12:26 |
wiking | yeah but i'll add a multisink | 12:26 |
@HeikoS | wiking: ok installed | 12:26 |
gf712 | and then can use it also to write to log file? | 12:26 |
wiking | gf712: yeah if u add_sunk | 12:26 |
wiking | *sink | 12:26 |
wiking | then you can log anywhere | 12:26 |
gf712 | would be cool to expose that somehow to swig? | 12:26 |
wiking | but i thought that the default logger | 12:26 |
wiking | is a multisink | 12:27 |
wiking | although | 12:27 |
wiking | now that i'm thinking | 12:27 |
wiking | i guess SG_ERROR should write to stderr | 12:27 |
wiking | but others to stdout | 12:27 |
wiking | ok first i do the shit | 12:27 |
wiking | and then think a bit | 12:27 |
gf712 | haha ok | 12:27 |
wiking | how to do it properly | 12:27 |
@HeikoS | which matlab | 12:27 |
@HeikoS | /opt/matlab-R2017a/bin/matlab | 12:27 |
gf712 | btw in notebook does stderr get displayed? | 12:27 |
wiking | currently just wanna chuck out shit | 12:28 |
@HeikoS | < M A T L A B (R) > | 12:28 |
@HeikoS | Copyright 1984-2017 The MathWorks, Inc. | 12:28 |
@HeikoS | R2017a (9.2.0.556344) 64-bit (glnxa64) | 12:28 |
wiking | gf712: mmm we are actually having a trick | 12:28 |
wiking | in swig | 12:28 |
wiking | sg_global_print_error | 12:28 |
wiking | and then for python | 12:28 |
wiking | we use python stuff | 12:28 |
wiking | so actually u get the errors to your interpreter | 12:29 |
wiking | not to stdout | 12:29 |
wiking | or stderr | 12:29 |
wiking | but this can be done with sinks easily | 12:29 |
gf712 | ah ok cool! | 12:31 |
@HeikoS | gf712: ah ok just saw scotts email | 12:33 |
gf712 | HeikoS: yea, im reading through it | 12:34 |
@HeikoS | the question is what to pick from those things | 12:34 |
gf712 | You mean from the repo he sent? | 12:36 |
@HeikoS | no the email | 12:36 |
@HeikoS | man it is hailing heavily here | 12:36 |
gf712 | you mean the one with 4 bullet points? | 12:36 |
@HeikoS | yes | 12:36 |
wiking | HeikoS: will get u the lines | 12:36 |
gf712 | in spain? | 12:36 |
wiking | but wanna get lunch | 12:36 |
@HeikoS | yes madrid | 12:36 |
wiking | HeikoS: GO TO THE MARKET! | 12:37 |
wiking | bestestest | 12:37 |
wiking | :) | 12:37 |
wiking | i think either sunday or saturday | 12:37 |
@HeikoS | well close to madrid | 12:37 |
@HeikoS | la pedriza | 12:37 |
wiking | in rastro | 12:37 |
@HeikoS | cool Ill check it | 12:38 |
wiking | http://www.madridtourist.info/rastro_market.html | 12:38 |
wiking | its nice | 12:38 |
wiking | with coffee and churros :P | 12:38 |
wiking | u know churros is from madrid actually | 12:38 |
@HeikoS | mjam | 12:39 |
@HeikoS | gf712: I think the sea ice thingi might be something | 12:40 |
@HeikoS | multinomial logistic regression | 12:41 |
gf712 | HeikoS: ok! so that needs to be implemented in shogun? | 12:42 |
@HeikoS | nono | 12:43 |
@HeikoS | it is just an example | 12:43 |
@HeikoS | of what they do | 12:43 |
@HeikoS | what I am after is | 12:43 |
@HeikoS | "what things could we add that would be useful for them" | 12:43 |
@HeikoS | the dask thing is interesting obviously | 12:43 |
@HeikoS | or: can we offer something that solves really large-scale problems | 12:43 |
@HeikoS | saw we added the actor stuff | 12:44 |
@HeikoS | then maybe we can also connect it to a few selected algorithms | 12:44 |
@HeikoS | like logistic regression | 12:44 |
@HeikoS | I think what I will do is to rewrite the second work package | 12:45 |
@HeikoS | to add algorithms of interest for them | 12:45 |
@HeikoS | without being too specific | 12:45 |
@HeikoS | and then we can discuss this when things kick off | 12:45 |
-!- HeikoS [~heiko@158.pool85-48-187.static.orange.es] has quit [Ping timeout: 245 seconds] | 12:54 | |
-!- HeikoS [~heiko@25.pool85-48-187.static.orange.es] has joined #shogun | 12:56 | |
-!- mode/#shogun [+o HeikoS] by ChanServ | 12:56 | |
gf712 | HeikoS: considering the name of the project the actor stuff is the most important | 13:00 |
gf712 | the most important would be to make sure they know we are willing to extend the library | 13:01 |
gf712 | with more algos | 13:01 |
gf712 | so yea, I guess nothing specific | 13:01 |
@HeikoS | gf712: you think we should re-phrase in terms of actor specific algorithms? | 13:01 |
gf712 | the title? | 13:01 |
@HeikoS | and the initial pitch | 13:02 |
@HeikoS | i.e. rather than saying this is about model selection | 13:02 |
@HeikoS | we can say it is about actor implementation | 13:02 |
@HeikoS | and modelselection is one example | 13:02 |
@HeikoS | but there could be others | 13:02 |
@HeikoS | idk | 13:02 |
gf712 | ah right | 13:02 |
gf712 | hmmm, I think modelselection is good | 13:02 |
gf712 | because it is an application | 13:02 |
gf712 | otherwise it becomes to cs | 13:02 |
gf712 | too | 13:02 |
@HeikoS | ok also keep in mind that the reviewers wont know actors :) | 13:02 |
@HeikoS | kk | 13:02 |
@HeikoS | with a particular focus on integrating algorithms used by the environmental sciences community. | 13:03 |
@HeikoS | this kinda makes it clear I guess? | 13:03 |
gf712 | basically the take away should be twofold: multi parallel software for modelselection and extension of shogun to help our collaborators? | 13:04 |
@HeikoS | ah wait | 13:04 |
gf712 | yup | 13:04 |
@HeikoS | are you looking at the abstract? | 13:04 |
@HeikoS | because I am editing that atm | 13:04 |
@HeikoS | the other docs might be a bit outdated atm | 13:04 |
gf712 | the one you shared? | 13:04 |
@HeikoS | "ATI abstract" | 13:04 |
gf712 | basically we need to ensure that shogun doesn't become too niche for this one group | 13:06 |
-!- HeikoS [~heiko@25.pool85-48-187.static.orange.es] has quit [Ping timeout: 258 seconds] | 13:52 | |
-!- HeikoS [~heiko@4.pool85-48-187.static.orange.es] has joined #shogun | 14:14 | |
-!- mode/#shogun [+o HeikoS] by ChanServ | 14:14 | |
-!- geektoni [5d22ef24@gateway/web/freenode/ip.93.34.239.36] has joined #shogun | 14:27 | |
-!- HeikoS [~heiko@4.pool85-48-187.static.orange.es] has quit [Ping timeout: 245 seconds] | 14:27 | |
-!- HeikoS [~heiko@237.pool85-48-187.static.orange.es] has joined #shogun | 15:28 | |
-!- mode/#shogun [+o HeikoS] by ChanServ | 15:28 | |
geektoni | ping HeikoS | 15:51 |
@HeikoS | geektoni: hi | 15:51 |
geektoni | quick question about labels | 15:52 |
geektoni | I have this file here https://github.com/geektoni/geektoni.github.io/pull/1/files#diff-b006ee2ca678823a1306ba5bfd8abd7d | 15:52 |
geektoni | which I'm using for the blog post | 15:52 |
geektoni | however | 15:52 |
geektoni | if I do labels(that_file), it sees it as multiclass instead of binary | 15:52 |
geektoni | why is that? :/ | 15:53 |
geektoni | the meta examples use the same kind of methods | 15:53 |
geektoni | and they work pretty fine | 15:53 |
geektoni | HeikoS: that's the error I'm getting https://pastebin.com/J1rzMwfi | 15:54 |
@HeikoS | sure | 15:55 |
@HeikoS | the factory tries to load | 15:55 |
@HeikoS | from specific to general | 15:55 |
@HeikoS | tries first as binary, if it doesnt work it tries as multiclass | 15:56 |
@HeikoS | but you could just add a conversion call in the perceptron | 15:56 |
@HeikoS | labels = binary_labels(m_labels) | 15:56 |
@HeikoS | at the beginning of train | 15:56 |
geektoni | HeikoS: there is already a conversion call inside the perceptron | 15:59 |
geektoni | here https://github.com/shogun-toolbox/shogun/blob/develop/src/shogun/classifier/Perceptron.cpp#L63 | 15:59 |
@HeikoS | geektoni: just checking the error | 16:00 |
@HeikoS | ah ok | 16:00 |
@HeikoS | i see | 16:00 |
@HeikoS | the conversion call causes the error | 16:00 |
@HeikoS | gf712: did you end up implementing the conversion from multiclass labels to binary? | 16:00 |
@HeikoS | or was that another way around? | 16:00 |
@HeikoS | geektoni: not sure why it loads the labels as multiclass | 16:00 |
@HeikoS | I think it doesnt in the other meta examples or? | 16:00 |
geektoni | HeikoS: The other meta examples which use toy data work fine | 16:01 |
@HeikoS | as in load as binary? | 16:01 |
geektoni | yes yes | 16:01 |
@HeikoS | so the same file is loaded as binary? | 16:01 |
@HeikoS | check the files and what is different then I guess :) | 16:01 |
geektoni | yep, let's start the debugging session then | 16:02 |
@HeikoS | hehe | 16:02 |
@HeikoS | it is probably some file formatting stuff | 16:02 |
@HeikoS | geektoni: let me know! | 16:11 |
geektoni | HeikoS: sure! :) | 16:12 |
geektoni | HeikoS: okay, I've found the problem | 16:15 |
geektoni | basically | 16:15 |
geektoni | even if you have a binary data file with let's say two classes -1 and 1 | 16:15 |
geektoni | that file has to have those values written as float, like -1.00 or 1.00 | 16:16 |
geektoni | otherwise it will be considered as multiclass | 16:16 |
@HeikoS | i see | 16:16 |
@HeikoS | boooooo! | 16:16 |
@HeikoS | can you file an issue for that? | 16:16 |
@HeikoS | sucks | 16:16 |
geektoni | I need to figure out where this happens inside the code though | 16:16 |
@HeikoS | dont fix it | 16:16 |
@HeikoS | issue it :) | 16:16 |
@HeikoS | entrance task | 16:16 |
@HeikoS | just change the file | 16:17 |
@HeikoS | (for now) | 16:17 |
@HeikoS | unless you want to fix it, then feel free :) | 16:17 |
geektoni | sure! | 16:17 |
geektoni | let's see how complicated it is | 16:17 |
geektoni | HeikoS: lol found the problem https://github.com/shogun-toolbox/shogun/blob/develop/src/shogun/labels/BinaryLabels.cpp#L73 | 16:22 |
@HeikoS | how is that causing the issue? | 16:23 |
geektoni | basically | 16:23 |
@HeikoS | is 1 != 1.0? | 16:23 |
geektoni | I guess it is | 16:23 |
@HeikoS | but m_labels is float64_t | 16:23 |
geektoni | the main issue is that the binary labels are somehow hardcoded | 16:23 |
geektoni | if you have a dataset made by 0 and 1 | 16:24 |
@HeikoS | not sure I follow :) | 16:24 |
geektoni | okay so | 16:24 |
geektoni | Shogun considers binary labels only -1 and 1 | 16:25 |
geektoni | so, if you have a file which contains 0 and 1 | 16:25 |
geektoni | it won't be considered binary, but multiclass | 16:25 |
@HeikoS | ah yes | 16:25 |
@HeikoS | i see now | 16:25 |
@HeikoS | yep that is true | 16:25 |
geektoni | it can be seen as a design choice | 16:25 |
@HeikoS | it is | 16:25 |
@HeikoS | it is not the best one though | 16:26 |
@HeikoS | ah yes of course | 16:26 |
@HeikoS | sorry I knew that | 16:26 |
@HeikoS | you have to do +1 -1 | 16:26 |
@HeikoS | some algos are based on that | 16:26 |
@HeikoS | their mathematical formulation | 16:26 |
@HeikoS | svm | 16:26 |
@HeikoS | perceptron | 16:26 |
@HeikoS | etc | 16:26 |
geektoni | I see I see | 16:26 |
geektoni | but then | 16:26 |
geektoni | I guess that somehow it sees 1 != 1.0 | 16:26 |
geektoni | not sure how it is even possible | 16:27 |
@HeikoS | what? | 16:28 |
@HeikoS | nono | 16:28 |
@HeikoS | it loads it as floats | 16:28 |
@HeikoS | i think you can even write -1 and +1 | 16:28 |
@HeikoS | no need for -1.0 + 1.0 | 16:29 |
@HeikoS | (I think) | 16:29 |
@HeikoS | it is the -1, +1 vs 0,1 | 16:29 |
@HeikoS | that distinguished atm | 16:29 |
@HeikoS | s | 16:29 |
geektoni | but I've tried with a dataset with only -1 and +1 and it was not working either :/ | 16:29 |
@HeikoS | ok scrap | 16:30 |
@HeikoS | but then -1.0 +1.0 works? | 16:30 |
geektoni | lemme try to do it again | 16:30 |
geektoni | yes, -1.0 and +1.0 works | 16:30 |
@HeikoS | ok | 16:30 |
@HeikoS | that is a bug | 16:30 |
gf712 | HeikoS: the label conversion works for my case, but it isnt merged yet because we ended up not being sure if it was the right way to go | 16:31 |
@HeikoS | gf712: ah ok | 16:32 |
@HeikoS | sure | 16:32 |
@HeikoS | we found the issue here | 16:32 |
@HeikoS | geektoni: for now just use the floats in the file | 16:32 |
@HeikoS | this is something to be fixed later | 16:32 |
@HeikoS | not your problem right now I would say | 16:32 |
geektoni | HeikoS: I've tried again and with just -1 and 1 it works | 16:34 |
@HeikoS | aha! | 16:35 |
@HeikoS | sg.rocks | 16:35 |
geektoni | it is just a problem of labels different from -1 and 1 | 16:35 |
@HeikoS | I knew it :D | 16:35 |
geektoni | I probably copied over the wrong files :/ | 16:35 |
geektoni | so, all good now | 16:35 |
@HeikoS | hehe | 16:39 |
@HeikoS | great | 16:39 |
@HeikoS | btw I really like the blog post | 16:39 |
@HeikoS | very nice | 16:39 |
@HeikoS | Let me know once it is published and I will tweet it | 16:39 |
@HeikoS | gf712: so re the labels | 16:49 |
@HeikoS | I think we need to hand the conversion to the user | 16:50 |
@HeikoS | or do it in a base class | 16:50 |
@HeikoS | but doing it inside the algorithms is not good | 16:50 |
gf712 | OK, I agree | 16:52 |
gf712 | need to rethink a bit how to make it easier for users though | 16:52 |
gf712 | for a python user it might not make sense | 16:52 |
@HeikoS | yeah so the first q is whether the user does it or the base class | 16:52 |
@HeikoS | exactly | 16:52 |
gf712 | because you'd expect it to be done for you | 16:52 |
gf712 | yeah | 16:52 |
gf712 | I think base class | 16:52 |
@HeikoS | and then how we deal with meta algorithms | 16:53 |
@HeikoS | cross-validation | 16:53 |
@HeikoS | it doesnt matter for that right? | 16:53 |
@HeikoS | because users just wants the number | 16:53 |
@HeikoS | but e.g. what if the user looks at performance in the folds (using geektoni observable stuff) | 16:54 |
@HeikoS | then the labels might be different than those provided by the user | 16:54 |
@HeikoS | so we need to inform the user of the conversion | 16:55 |
@HeikoS | but then if it happens in ::train | 16:55 |
@HeikoS | we get lots of conversion messages if we do model selection | 16:56 |
@HeikoS | so it is always the "highest level" class that should do it | 16:56 |
@HeikoS | idk | 16:57 |
@HeikoS | maybe we should give it to the user :D | 16:57 |
@HeikoS | or we step back further and say | 16:57 |
@HeikoS | DiscreteLabels | 16:57 |
@HeikoS | which just contain unique values | 16:57 |
@HeikoS | rather than multiclass/binary | 16:57 |
gf712 | I think that might be best | 16:57 |
gf712 | if it is feasible | 16:58 |
@HeikoS | API-wise definitely | 16:58 |
@HeikoS | problem is | 16:58 |
@HeikoS | e.g. svm | 16:58 |
@HeikoS | the optimization problem is based on +1 -1 | 16:58 |
@HeikoS | so which will be assigned to which | 16:58 |
gf712 | but internally you could cast it | 16:58 |
@HeikoS | DiscreteLabels("A", "B") | 16:58 |
gf712 | and keep track | 16:58 |
@HeikoS | I see | 16:58 |
gf712 | but just in svm | 16:58 |
@HeikoS | so have a map inside | 16:58 |
gf712 | yup | 16:58 |
@HeikoS | to_binary | 16:58 |
@HeikoS | to_multiclass | 16:58 |
gf712 | and it should be cheap | 16:59 |
gf712 | at runtime | 16:59 |
@HeikoS | and those map the label type/value to the [-1,+1] or [0,1,2,3,4] | 16:59 |
@HeikoS | yeah I mean the map can be built lazily | 16:59 |
@HeikoS | if it doesnt exist yet, populate it | 16:59 |
@HeikoS | in some way | 16:59 |
@HeikoS | and for predictions | 16:59 |
@HeikoS | it is mapped backwards? | 16:59 |
gf712 | I guess so? | 17:00 |
gf712 | so you have an internal prediction | 17:00 |
@HeikoS | ok and then we would just get rid of BinaryLabels MulticlassLabels | 17:00 |
@HeikoS | but would just have DenseLabels | 17:00 |
gf712 | that is translated to the labels | 17:00 |
gf712 | yea | 17:00 |
@HeikoS | and make it templated even? | 17:00 |
gf712 | DenseLabels? | 17:00 |
@HeikoS | that is the base class | 17:00 |
@HeikoS | which stores the actual data | 17:00 |
gf712 | yea, it would have to be | 17:00 |
gf712 | well wouldn't have to | 17:01 |
gf712 | but it would make it more efficient | 17:01 |
@HeikoS | I am not sure whether we ever want to actually have labels as "F" | 17:01 |
@HeikoS | rather than just ints? | 17:01 |
@HeikoS | ah | 17:01 |
@HeikoS | what about regression | 17:01 |
@HeikoS | that is real valued | 17:01 |
@HeikoS | currently, the base class holds a float vector | 17:01 |
gf712 | well you need it to be templated | 17:01 |
@HeikoS | and just stores as int | 17:02 |
@HeikoS | that makes some stuff easier, other stuff more complicated | 17:02 |
gf712 | but at the end of the day it makes it easier for the user right? | 17:02 |
@HeikoS | yeah agree | 17:03 |
gf712 | it means that for classification we could maybe even convert something that is a string | 17:03 |
gf712 | to numeric labels | 17:03 |
@HeikoS | well | 17:03 |
@HeikoS | I wonder | 17:03 |
@HeikoS | is that something needed | 17:03 |
@HeikoS | or can we expect that a user converts them to ints | 17:03 |
gf712 | i.e. labels {"Red", "Blue"} | 17:03 |
@HeikoS | or we can offer a method to do that | 17:03 |
gf712 | and then is converted | 17:03 |
@HeikoS | because the factory could accept other types | 17:04 |
@HeikoS | but internally | 17:04 |
@HeikoS | what do we do there? | 17:04 |
@HeikoS | see what I mean? | 17:04 |
gf712 | we would need a check in classifiers | 17:04 |
@HeikoS | labels(["r", "b"]) -> is it converted to ints internally? | 17:04 |
gf712 | just thinking that that is something you can do in sklearn | 17:04 |
gf712 | I think | 17:04 |
gf712 | yes | 17:04 |
gf712 | it just makes life a bit easier for a user | 17:05 |
gf712 | im just thinking when this would go wrong | 17:05 |
@HeikoS | do they convert or actually store the strings? | 17:05 |
gf712 | you would need the string to convert back? | 17:05 |
@HeikoS | I tend to think int values are fine | 17:05 |
gf712 | yea, might complicate things a bit too much | 17:06 |
@HeikoS | what definitely sucks is that our labels need to be contiguous | 17:06 |
@HeikoS | but again, some algos depend on that | 17:06 |
@HeikoS | so the idea of maintaining an internal map might be good | 17:06 |
gf712 | but it would be nice to have just one labels class exposed to the user | 17:06 |
gf712 | and the rest is internal | 17:06 |
@HeikoS | yep | 17:07 |
@HeikoS | but ok | 17:07 |
@HeikoS | that is easy | 17:07 |
@HeikoS | CLabels | 17:07 |
gf712 | but then errors could become more mysterious | 17:07 |
@HeikoS | sure | 17:07 |
@HeikoS | currently, the factory loader for labels decides what subclass to instantiate | 17:07 |
@HeikoS | from specific to general | 17:07 |
@HeikoS | ie first binary, then bla, then bla | 17:07 |
@HeikoS | until one works | 17:07 |
@HeikoS | and then the algos call this conversion call for what they need | 17:07 |
@HeikoS | so let's summarize | 17:09 |
@HeikoS | we want | 17:09 |
@HeikoS | users dont care about label type, they just provide something that is "label-able" | 17:09 |
@HeikoS | internal algos need certain representations -1,+1 or [0,1,2,3] | 17:09 |
@HeikoS | question is where we convert | 17:09 |
@HeikoS | in a factory, so that inside shogun all is stored in the usual format | 17:10 |
gf712 | HeikoS: quick q, is it possible to get a binary label directly from sg.labels factory? | 17:10 |
@HeikoS | if the file is +1, -1 then yes | 17:10 |
gf712 | ah from file ok | 17:10 |
@HeikoS | but there is no conversion | 17:10 |
gf712 | but not from array | 17:10 |
@HeikoS | ah idk | 17:11 |
@HeikoS | I think there might be | 17:11 |
gf712 | a=np.array([1,0]) | 17:11 |
gf712 | sg.labels(a) | 17:11 |
gf712 | gives an error | 17:11 |
gf712 | can only use shogun::labels< float64_t >(shogun::SGVector< double >) | 17:11 |
@HeikoS | template <class T> | 17:11 |
@HeikoS | CLabels* labels(SGVector<T> labels) | 17:11 |
gf712 | anyway, sorry was just checking it | 17:11 |
@HeikoS | we have | 17:11 |
-!- wiking [~wiking@2001:67c:10ec:5784:8000::3ff] has quit [Remote host closed the connection] | 17:11 | |
@HeikoS | could do | 17:12 |
-!- wiking [~wiking@2001:67c:10ec:5784:8000::3ff] has joined #shogun | 17:12 | |
@HeikoS | sg.labels(my_file_or_array).to_binary() | 17:12 |
@HeikoS | but that sucks | 17:12 |
@HeikoS | gf712: what would the user see if doing this | 17:12 |
@HeikoS | my_model.apply(data).get("labels") | 17:13 |
@HeikoS | if trained with sg.labels(["r", "b"]) | 17:13 |
@HeikoS | and then also, what would be sg.labels(["r", "b"]).get("labels") | 17:13 |
@HeikoS | I would say in both cases we would need to see ["r", "b"], or? | 17:14 |
@HeikoS | so then inside the svm we would do | 17:14 |
gf712 | I would imagine so | 17:14 |
gf712 | but the strings can come later, no? | 17:15 |
gf712 | I was just thinking out loud | 17:15 |
@HeikoS | what do you mean by later? | 17:15 |
gf712 | well, first you want to make it label agnostic right? | 17:15 |
@HeikoS | yeah sure | 17:15 |
@HeikoS | just example | 17:15 |
@HeikoS | could be all ints | 17:15 |
@HeikoS | just what the user sees | 17:15 |
@HeikoS | the user never sees the internal representation | 17:16 |
@HeikoS | that is your point right? | 17:16 |
gf712 | no, I don't think the user would | 17:16 |
gf712 | yup | 17:16 |
@HeikoS | so then inside my ::train call I could do | 17:16 |
@HeikoS | m_labels.as_binary() | 17:16 |
gf712 | exactly | 17:16 |
gf712 | which creates a map | 17:16 |
gf712 | and casts to the right label type | 17:16 |
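[Editor's sketch] The idea of an `as_binary()` call that "creates a map and casts to the right label type" might be illustrated as below. This is plain Python under stated assumptions: the `UserLabels` class, its `as_binary` method, and the `decode` helper are hypothetical stand-ins, and assigning -1/+1 by sorted order is just one possible convention.

```python
class UserLabels:
    """Hypothetical container: stores labels exactly as the user gave them."""

    def __init__(self, values):
        self.values = list(values)

    def as_binary(self):
        """Return (encoded, mapping) with the two classes mapped to -1 and +1."""
        classes = sorted(set(self.values))
        if len(classes) != 2:
            raise ValueError(f"expected 2 classes, got {len(classes)}")
        # Assumption: the class that sorts first becomes -1, the other +1.
        mapping = {classes[0]: -1, classes[1]: +1}
        return [mapping[v] for v in self.values], mapping


def decode(encoded, mapping):
    """Translate internal -1/+1 codes back to the user's own labels."""
    inverse = {code: label for label, code in mapping.items()}
    return [inverse[c] for c in encoded]
```

With this split, the algorithm only ever touches the encoded vector, and anything returned to the user goes through `decode` first, so the internal representation stays hidden.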
-!- wiking [~wiking@2001:67c:10ec:5784:8000::3ff] has quit [Ping timeout: 264 seconds] | 17:16 | |
-!- HeikoS [~heiko@237.pool85-48-187.static.orange.es] has quit [Ping timeout: 255 seconds] | 17:21 | |
-!- HeikoS [~heiko@237.pool85-48-187.static.orange.es] has joined #shogun | 17:22 | |
-!- mode/#shogun [+o HeikoS] by ChanServ | 17:22 | |
@HeikoS | gf712: sorry connection failed | 17:22 |
@HeikoS | whats the last thing you saw from me? | 17:22 |
gf712 | [16:16] <@HeikoS> m_labels.as_binary() | 17:23 |
gf712 | and then I said | 17:23 |
gf712 | [16:16] <gf712> exactly | 17:24 |
@HeikoS | ah ok | 17:24 |
@HeikoS | man | 17:24 |
gf712 | [16:16] <gf712> which creates a map | 17:24 |
@HeikoS | stupid connection | 17:24 |
gf712 | [16:16] <gf712> and casts to the right label type | 17:24 |
@HeikoS | ok so the issue is the time when the map is created | 17:24 |
gf712 | are you using 4G? | 17:24 |
@HeikoS | e.g. xvalidation | 17:24 |
@HeikoS | messing it up | 17:24 |
@HeikoS | yes 4g | 17:24 |
@HeikoS | so what we would need is | 17:24 |
@HeikoS | CMachine::train calls the map-creation invoke | 17:24 |
@HeikoS | but XVal::eval also calls it | 17:25 |
@HeikoS | and since it is lazy | 17:25 |
@HeikoS | the subsequent call from CMachine::train is a no-op | 17:25 |
gf712 | mmh not sure I follow | 17:25 |
gf712 | so the map is created fine with train | 17:26 |
gf712 | {"R":1, "B": 0} | 17:26 |
gf712 | and then the eval accesses that map when it does predictions? | 17:26 |
@HeikoS | but think xval | 17:26 |
@HeikoS | the first ::train call | 17:26 |
@HeikoS | might only see a subset of data | 17:27 |
@HeikoS | which e.g. misses one class | 17:27 |
gf712 | oh right | 17:27 |
@HeikoS | so the mapping is messed up | 17:27 |
gf712 | I guess in that case it would need to insert new keys | 17:27 |
gf712 | hmmm | 17:27 |
@HeikoS | I think you need all information | 17:28 |
@HeikoS | in order to build the mapping | 17:28 |
@HeikoS | how to decide otherwise | 17:28 |
@HeikoS | so the highest level caller needs to do it | 17:28 |
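[Editor's sketch] The failure mode discussed here can be reproduced in a few lines of plain Python: a map built from a training fold that misses a class disagrees with the map built from the full label set. `build_mapping` is a hypothetical helper, and indexing classes by sorted order is an assumption made for the illustration.

```python
def build_mapping(values):
    """Map each distinct label to a contiguous index, in sorted order."""
    return {label: index for index, label in enumerate(sorted(set(values)))}


all_labels = ["r", "g", "b", "r", "g", "b"]
fold_labels = ["r", "g", "r", "g"]  # a training fold that happens to miss "b"

full_map = build_mapping(all_labels)   # built by the top-level caller
fold_map = build_mapping(fold_labels)  # built lazily inside one train call

# The same user label gets a different internal code in the two maps.
shifted = {label for label in fold_map if fold_map[label] != full_map[label]}
```

Here "g" and "r" receive different codes in the two maps, which is why the map would have to be built where all labels are visible, or be kept strictly private to each trained machine.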
gf712 | so when you call eval in xval does it do bagging or something to determine the label? | 17:28 |
@HeikoS | it might | 17:29 |
gf712 | but why not have each trained machine do its individual prediction | 17:30 |
gf712 | and then based on that | 17:30 |
gf712 | xval decides the label | 17:30 |
gf712 | having access to the results of all xval machines | 17:30 |
gf712 | or doesn't that work either? | 17:30 |
@HeikoS | not sure I understand it | 17:30 |
@HeikoS | so if xvalidation was a black box | 17:31 |
@HeikoS | then we could just do that | 17:31 |
@HeikoS | each ::train call might have different mappings | 17:31 |
@HeikoS | but it all doesn't matter since only the accuracy matters | 17:31 |
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has joined #shogun | 17:31 | |
@HeikoS | and if a user wants the fold accuracy, that is fine as well right? | 17:32 |
@HeikoS | mmh so that works actually | 17:32 |
@HeikoS | with the only downside being that the mapping is created multiple times | 17:32 |
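[Editor's sketch] The conclusion reached above, that each fold may build its own mapping because only accuracy is compared, could be sketched as follows. `MajorityMachine` is a toy stand-in for a trained machine and `fold_accuracy` is a hypothetical helper; neither reflects Shogun's cross-validation code.

```python
from collections import Counter


class MajorityMachine:
    """Toy stand-in for a machine; builds its own label map in train()."""

    def train(self, labels):
        classes = sorted(set(labels))
        self.mapping = {label: i for i, label in enumerate(classes)}
        self.inverse = {i: label for label, i in self.mapping.items()}
        encoded = [self.mapping[label] for label in labels]
        # "Model": remember the most frequent internal code.
        self.prediction = Counter(encoded).most_common(1)[0][0]
        return self

    def apply(self, n):
        """Predict n labels, decoded back into the user's label space."""
        return [self.inverse[self.prediction]] * n


def fold_accuracy(train_labels, test_labels):
    """Accuracy is computed on user-facing labels, never on internal codes."""
    predicted = MajorityMachine().train(train_labels).apply(len(test_labels))
    hits = sum(p == t for p, t in zip(predicted, test_labels))
    return hits / len(test_labels)
```

Because predictions are decoded before they are compared, two folds can use different internal codes without affecting the per-fold accuracy; the cost is that the map is rebuilt on every train call.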
@HeikoS | gf712: ok I gotta dash....let's continue the discussion in a bit | 17:34 |
gf712 | HeikoS: ok sure! | 17:34 |
gf712 | btw all good for ati? | 17:34 |
gf712 | I can read it a couple more times | 17:34 |
@HeikoS | yes do that :) | 17:34 |
@HeikoS | think what someone might not like | 17:34 |
@HeikoS | I have another call tomorrow with someone who might be added as a collaborator | 17:35 |
gf712 | what are you most worried about? | 17:35 |
gf712 | that someone wouldn't like | 17:35 |
@HeikoS | idk | 17:35 |
@HeikoS | too specific | 17:35 |
@HeikoS | not clear enough | 17:35 |
@HeikoS | I like the proposal | 17:35 |
@HeikoS | but maybe it is good to think about criticisms | 17:36 |
gf712 | ok, let me read it with that kind of mindset :D | 17:36 |
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has quit [Ping timeout: 246 seconds] | 17:36 | |
@HeikoS | yeah think about it as if you had to give your money to it | 17:36 |
@HeikoS | also I have put the criteria for evaluation in the doc | 17:36 |
@HeikoS | saw them? | 17:36 |
gf712 | yea, I need to reread them | 17:36 |
gf712 | ok, ill do that | 17:36 |
@HeikoS | thx | 17:38 |
@HeikoS | see you later! | 17:38 |
gf712 | see you | 17:39 |
-!- HeikoS [~heiko@237.pool85-48-187.static.orange.es] has quit [Ping timeout: 258 seconds] | 17:43 | |
-!- HeikoS [~heiko@34.pool85-48-187.static.orange.es] has joined #shogun | 17:47 | |
-!- mode/#shogun [+o HeikoS] by ChanServ | 17:47 | |
-!- HeikoS [~heiko@34.pool85-48-187.static.orange.es] has quit [Ping timeout: 255 seconds] | 18:03 | |
-!- geektoni [5d22ef24@gateway/web/freenode/ip.93.34.239.36] has quit [Quit: Page closed] | 18:34 | |
-!- gf712 [905208ce@gateway/web/freenode/ip.144.82.8.206] has quit [Ping timeout: 256 seconds] | 18:41 | |
-!- wiking [~wiking@c-185-45-237-122.customer.ggaweb.ch] has joined #shogun | 23:45 | |
--- Log closed Fri Apr 26 00:00:49 2019 |