--- Log opened Mon Mar 12 00:00:37 2018 | ||
-!- witness [uid10044@gateway/web/irccloud.com/x-qoaueyjaqdhccyka] has joined #shogun | 00:06 | |
-!- witness [uid10044@gateway/web/irccloud.com/x-qoaueyjaqdhccyka] has quit [Quit: Connection closed for inactivity] | 02:47 | |
-!- durovo [~durovo@cc.b7.2ea9.ip4.static.sl-reverse.com] has quit [Remote host closed the connection] | 07:35 | |
-!- durovo [~durovo@cc.b7.2ea9.ip4.static.sl-reverse.com] has joined #shogun | 07:35 | |
-!- wiking [~wiking@huwico/staff/wiking] has joined #shogun | 10:21 | |
-!- mode/#shogun [+o wiking] by ChanServ | 10:21 | |
-!- HeikoS [~heiko@host86-132-201-109.range86-132.btcentralplus.com] has joined #shogun | 11:21 | |
-!- mode/#shogun [+o HeikoS] by ChanServ | 11:21 | |
@sukey | [https://github.com/shogun-toolbox/shogun] Issue https://github.com/shogun-toolbox/shogun/issues/3058 closed by karlnapf | 12:04 |
---|---|---|
-!- HeikoS [~heiko@host86-132-201-109.range86-132.btcentralplus.com] has quit [Ping timeout: 240 seconds] | 13:21 | |
-!- HeikoS [~heiko@host86-132-201-109.range86-132.btcentralplus.com] has joined #shogun | 13:34 | |
-!- mode/#shogun [+o HeikoS] by ChanServ | 13:34 | |
-!- HeikoS [~heiko@host86-132-201-109.range86-132.btcentralplus.com] has quit [Ping timeout: 268 seconds] | 13:42 | |
-!- wuwei [~Vincent@202.120.19.103] has joined #shogun | 13:47 | |
wuwei | hi | 13:47 |
-!- HeikoS [~heiko@86.132.201.109] has joined #shogun | 13:55 | |
-!- mode/#shogun [+o HeikoS] by ChanServ | 13:55 | |
@wiking | wuwei, sup? | 13:56 |
wuwei | wiking, I'm considering the preprocessor API | 13:57 |
@wiking | wuwei, in what sense? | 13:58 |
wuwei | we want immutable features, so i think preprocessors need some redesign | 13:59 |
@wiking | hahah | 13:59 |
@wiking | yeah | 13:59 |
@wiking | in that case that is definitely the case | 13:59 |
@wiking | the problem is that preprocessors | 14:00 |
wuwei | i am not sure whether it's good to support only on-the-fly preprocessors | 14:00 |
@wiking | are actually part of features | 14:00 |
@wiking | which is in case you wanna be able to support | 14:00 |
@wiking | pipelines | 14:00 |
@wiking | then you would need to take care | 14:00 |
@wiking | of what operator works on what | 14:00 |
@wiking | for example | 14:00 |
@wiking | preprocessors actually have to use a very different method call etc | 14:01 |
@wiking | than machines | 14:01 |
@wiking | as in case of preprocessors its always | 14:01 |
@wiking | preprocessor.init (on train set) | 14:01 |
@wiking | then train_features.add&apply_preprocessor | 14:01 |
@wiking | etc etc | 14:01 |
@wiking | whereas | 14:01 |
@wiking | if we would have the similar api like scikit learn | 14:01 |
@wiking | then everything is a train/apply | 14:01 |
@wiking | or fit/predict | 14:01 |
@wiking | wuwei, see what i mean? i mean i would stop supporting at all this preprocessor funkyness | 14:02 |
@wiking | in features | 14:02 |
wuwei | yeah i see | 14:02 |
@wiking | of course they are in a way there | 14:02 |
@wiking | because of the fact of subset stack | 14:02 |
wuwei | so preprocessors might have very different interface? | 14:03 |
@wiking | i think it should have a machine like | 14:03 |
@wiking | train/apply interface | 14:03 |
wuwei | init + apply | 14:03 |
@wiking | yeah i mean currently that's the story | 14:03 |
@wiking | yes | 14:03 |
@wiking | i mean the story of preprocessors is actually | 14:04 |
@wiking | that they are stackable | 14:04 |
@wiking | i.e. can have a pipeline | 14:04 |
@wiking | already | 14:04 |
@wiking | if you check Features | 14:04 |
@wiking | it has an add_preprocessor | 14:04 |
@wiking | and you can add as much as you want | 14:04 |
wuwei | yes i found it | 14:04 |
@wiking | and they are actually called in sequence | 14:04 |
@wiking | that's like an initial pipeline support | 14:04 |
@wiking | but only for preprocessors | 14:05 |
@wiking | it would be better to have the whole thing extended to | 14:05 |
@wiking | the whole library | 14:05 |
@wiking | meaning we can have pipeline of machines and preprocessors | 14:05 |
wuwei | i see, so everything will have a similar interface like fit and predict | 14:05 |
@wiking | that always operate over the (features, label) tuple | 14:06 |
@wiking | i mean on the end of the day | 14:06 |
@wiking | no matter what you do | 14:06 |
@wiking | in ML you always operate over your features&labels | 14:06 |
@wiking | once you arrived having features and labels of course | 14:06 |
@wiking | so yeah i would say | 14:07 |
@wiking | that lets refactor for the time being | 14:07 |
@wiking | the whole preprocessor api | 14:07 |
@wiking | to train/apply | 14:07 |
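The train/apply shape discussed above could look roughly like the following. This is a minimal sketch, not Shogun's API: `Features`, `Transformer` and `MeanCenterer` are hypothetical names, and a plain `std::vector<double>` stands in for a real feature container. The point it illustrates is that `apply` returns a new object and leaves its input untouched, which is what immutable features would require.

```cpp
#include <numeric>
#include <vector>

// Hypothetical stand-in for a feature container (not Shogun's CFeatures).
using Features = std::vector<double>;

// Hypothetical common interface: a preprocessor looks like a machine,
// i.e. it is trained once and then applied.
class Transformer {
public:
    virtual ~Transformer() = default;
    // Learn the required state from the training set.
    virtual void train(const Features& train_set) = 0;
    // Return a transformed copy; the input is never modified.
    virtual Features apply(const Features& input) const = 0;
};

// Example preprocessor: subtracts the mean learned during train().
class MeanCenterer : public Transformer {
public:
    void train(const Features& train_set) override {
        m_mean = train_set.empty()
            ? 0.0
            : std::accumulate(train_set.begin(), train_set.end(), 0.0) /
                  static_cast<double>(train_set.size());
    }

    Features apply(const Features& input) const override {
        Features out(input);  // copy, so the caller's data stays intact
        for (double& x : out)
            x -= m_mean;
        return out;
    }

private:
    double m_mean = 0.0;
};
```

With this shape, the same `train` then `apply` sequence works on the training set and on any later test set, without a separate init or add-and-apply step on the features.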
wuwei | yeah, for current features, as preprocessors modify data directly, i would suggest applying preprocessors when one calls get_feature_matrix or vector, and then we can cache the result | 14:08 |
@wiking | cache result? | 14:09 |
wuwei | for example, in CDenseFeatures | 14:10 |
@wiking | mmmm | 14:10 |
@wiking | well | 14:10 |
wuwei | there is already a CCache | 14:10 |
@wiking | if you wanna do immutability | 14:10 |
@wiking | then of course these things are all copied :))) | 14:10 |
@wiking | but yeah having the preprocessed output | 14:10 |
@wiking | cached in one way or another would be good | 14:10 |
@wiking | wuwei, i would refrain from using that cache | 14:11 |
@wiking | :D | 14:11 |
@wiking | i mean it's a cache that could be accessible from SWIG interfaces | 14:11 |
@wiking | but in this case you really dont need this | 14:11 |
@wiking | you just want to have a good cache implementation | 14:12 |
@wiking | that preferably uses standards | 14:12 |
@wiking | as you dont really need to expose the cache itself to the swig | 14:12 |
@wiking | just its content | 14:12 |
wuwei | i see, but what's the point of CCache | 14:13 |
@wiking | well it was used | 14:13 |
@wiking | in the time when everything was actually | 14:13 |
@wiking | swigable | 14:13 |
@wiking | i.e. exposed to swig | 14:13 |
@wiking | and as well there were not really good standards | 14:13 |
@wiking | so many things had to be implemented within shogun | 14:14 |
wuwei | so we should implement a better cache :) | 14:14 |
@wiking | the question is whether we really need to implement one | 14:14 |
@wiking | meaning cant we just take something off from the shelf? | 14:14 |
wuwei | sth like std::map | 14:15 |
wuwei | ? | 14:15 |
@wiking | well | 14:15 |
@wiking | dunno how an std::map would be a good idea | 14:15 |
@wiking | to use for feature cache :) | 14:15 |
wuwei | yeah, i will think about that | 14:18 |
wuwei | but do std classes work with that ref count stuff | 14:18 |
@wiking | you should wrap it with Some | 14:19 |
@wiking | so for example std::vector<Some<CSGObject>> | 14:20 |
@wiking | this will make sure that the ref count is correct | 14:20 |
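To illustrate why wrapping the element type keeps reference counts correct inside a standard container, here is a minimal sketch. `RefCounted` and `Handle` are hypothetical stand-ins for the roles of `CSGObject` and `Some`; they are not Shogun's implementations. The handle increments the count on copy and decrements it on destruction, so `std::vector` does the bookkeeping through ordinary copy and destroy operations.

```cpp
#include <utility>
#include <vector>

// Hypothetical intrusively ref-counted base (plays the role of CSGObject).
class RefCounted {
public:
    virtual ~RefCounted() = default;
    void ref() { ++m_count; }
    // Returns true when the last reference was dropped.
    bool unref() { return --m_count == 0; }
    int ref_count() const { return m_count; }

private:
    int m_count = 0;
};

// Hypothetical RAII handle (plays the role of Some<T>).
template <typename T>
class Handle {
public:
    explicit Handle(T* raw) : m_raw(raw) {
        if (m_raw) m_raw->ref();
    }
    Handle(const Handle& other) : m_raw(other.m_raw) {
        if (m_raw) m_raw->ref();
    }
    Handle(Handle&& other) noexcept : m_raw(other.m_raw) {
        other.m_raw = nullptr;  // ownership moved, count unchanged
    }
    // Copy-and-swap covers both copy and move assignment.
    Handle& operator=(Handle other) noexcept {
        std::swap(m_raw, other.m_raw);
        return *this;
    }
    ~Handle() {
        if (m_raw && m_raw->unref()) delete m_raw;
    }

    T* operator->() const { return m_raw; }
    T* get() const { return m_raw; }

private:
    T* m_raw;
};
```

Storing `Handle<RefCounted>` in a `std::vector` then needs no manual ref or unref calls: pushing a copy raises the count, and destroying the vector lowers it again.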
wuwei | i see | 14:20 |
wuwei | btw, another thing is SGMatrix/ vector | 14:21 |
wuwei | they have shallow copy ctors | 14:21 |
wuwei | so one can always mutate const data by creating shallow copies | 14:22 |
wuwei | my idea is to make copy-ctor deep-copy, and add move ctor that accepts rvalue reference | 14:24 |
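The proposal above, a deep-copying copy constructor plus a move constructor taking an rvalue reference, can be sketched as follows. `Matrix` here is a hypothetical type written for illustration and is not Shogun's `SGMatrix`; the column-major layout is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Hypothetical matrix type illustrating the proposal (not SGMatrix).
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : m_rows(rows), m_cols(cols), m_data(new double[rows * cols]()) {}

    // Copy ctor: deep copy, so a copy of a const matrix never aliases it.
    Matrix(const Matrix& other)
        : m_rows(other.m_rows), m_cols(other.m_cols),
          m_data(new double[other.m_rows * other.m_cols]) {
        std::copy(other.m_data, other.m_data + m_rows * m_cols, m_data);
    }

    // Move ctor: steals the buffer, no element is copied.
    Matrix(Matrix&& other) noexcept
        : m_rows(other.m_rows), m_cols(other.m_cols), m_data(other.m_data) {
        other.m_rows = 0;
        other.m_cols = 0;
        other.m_data = nullptr;
    }

    // Copy-and-swap handles both copy and move assignment.
    Matrix& operator=(Matrix other) noexcept {
        std::swap(m_rows, other.m_rows);
        std::swap(m_cols, other.m_cols);
        std::swap(m_data, other.m_data);
        return *this;
    }

    ~Matrix() { delete[] m_data; }

    // Column-major element access (an assumption of this sketch).
    double& operator()(std::size_t r, std::size_t c) {
        return m_data[c * m_rows + r];
    }
    const double& operator()(std::size_t r, std::size_t c) const {
        return m_data[c * m_rows + r];
    }

    const double* data() const { return m_data; }

private:
    std::size_t m_rows;
    std::size_t m_cols;
    double* m_data;
};
```

The trade-off is that every accidental copy now costs a full allocation, so call sites that only want to hand over ownership would need `std::move`.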
-!- HeikoS [~heiko@86.132.201.109] has quit [Ping timeout: 248 seconds] | 14:40 | |
-!- wuwei [~Vincent@202.120.19.103] has quit [Quit: wuwei] | 15:12 | |
-!- HeikoS [~heiko@host86-132-201-109.range86-132.btcentralplus.com] has joined #shogun | 15:14 | |
-!- mode/#shogun [+o HeikoS] by ChanServ | 15:14 | |
-!- wuwei [~Vincent@59.78.1.153] has joined #shogun | 15:39 | |
-!- wuwei [~Vincent@59.78.1.153] has left #shogun [] | 15:52 | |
@HeikoS | wiking: C:\src\vcpkg> .\vcpkg install shogun | 16:18 |
@HeikoS | that works now? | 16:18 |
@wiking | imo should | 16:19 |
@wiking | no | 16:20 |
@wiking | sorry wrong window | 16:20 |
@HeikoS | wiking: cool, I will add that to the installation page then :) | 16:29 |
@HeikoS | kudos! | 16:29 |
@HeikoS | lisitsyn: jojo | 16:29 |
lisitsyn | HeikoS: hey | 16:34 |
@HeikoS | lisitsyn: nice one with the clone | 16:35 |
@HeikoS | lisitsyn: what about the free issue? | 16:35 |
lisitsyn | SG_FREE is fine | 16:35 |
@HeikoS | lisitsyn: also, any more opinions on the as_binary thingi? | 16:35 |
@HeikoS | like, have it be a constructor instead | 16:36 |
lisitsyn | HeikoS: elaborate plz :) | 16:36 |
lisitsyn | as_binary constructor? | 16:36 |
@HeikoS | new CBinaryLabels(CLabels*) | 16:36 |
lisitsyn | internally? | 16:36 |
@HeikoS | yes | 16:36 |
@HeikoS | and then inside libsvm | 16:36 |
@HeikoS | rather than | 16:36 |
lisitsyn | idk I am fine with copy ctor | 16:36 |
@HeikoS | as_binary, we do new CBinaryLabels(m_labels) | 16:36 |
lisitsyn | what's the motivation? | 16:37 |
@HeikoS | idk | 16:37 |
lisitsyn | yolo | 16:37 |
@HeikoS | viktor suggested | 16:37 |
@HeikoS | but didnt say why | 16:37 |
lisitsyn | what was the initial approach? | 16:37 |
lisitsyn | cast? | 16:37 |
@HeikoS | my suggested approach is in the PR | 16:38 |
@HeikoS | the as_binary method | 16:38 |
@HeikoS | that gives you a NEW object | 16:38 |
@HeikoS | with potentially the same vector underneath | 16:38 |
@HeikoS | or a new one, if it needs to convert | 16:38 |
@HeikoS | and I guess technically, that is not necessary if the passed labels are already binary labels | 16:38 |
@HeikoS | could cast them | 16:38 |
lisitsyn | how's that different from the other one? | 16:39 |
@HeikoS | what is the other one? | 16:39 |
lisitsyn | new BinaryLabels(other) | 16:39 |
@HeikoS | it is the same thing, just different API | 16:39 |
lisitsyn | I am a bit lost what are the variants we choose from | 16:39 |
lisitsyn | :) | 16:39 |
@HeikoS | what is the best way in your opinion | 16:39 |
lisitsyn | between what and what? | 16:40 |
@HeikoS | not between | 16:40 |
@HeikoS | just like | 16:40 |
@HeikoS | I have a CLabels member pointer | 16:41 |
@HeikoS | but I need to treat it as Binary labels | 16:41 |
@HeikoS | and I want to potentially convert | 16:41 |
lisitsyn | ->as_binary() sounds good to me | 16:41 |
@HeikoS | lisitsyn: kk | 16:41 |
lisitsyn | ah | 16:41 |
lisitsyn | ok wait | 16:41 |
lisitsyn | I think I got the idea | 16:41 |
lisitsyn | as_binary() violates open-closed principle | 16:42 |
lisitsyn | :) | 16:42 |
@HeikoS | explain pls | 16:42 |
lisitsyn | every type we should add | 16:42 |
lisitsyn | into the base class | 16:42 |
@HeikoS | it would | 16:42 |
lisitsyn | that's why new BinaryLabels(labels) sound better | 16:42 |
lisitsyn | for wiking | 16:42 |
lisitsyn | (I guess) | 16:42 |
@HeikoS | I mean labels are pretty finite | 16:43 |
@HeikoS | but ok | 16:43 |
lisitsyn | yes | 16:43 |
@HeikoS | it is a point | 16:43 |
lisitsyn | that's my opinion as well | 16:43 |
lisitsyn | but good practice tells to do this way | 16:43 |
@HeikoS | yep | 16:43 |
lisitsyn | the only thing I'd change | 16:43 |
@HeikoS | K_GAUSSIAN | 16:43 |
@HeikoS | :D | 16:43 |
lisitsyn | is not new BinaryLabels | 16:43 |
lisitsyn | but binary_labels(labels) | 16:43 |
@wiking | imo it's stupid to define | 16:44 |
@HeikoS | inside the C++ code? | 16:44 |
@wiking | a new method | 16:44 |
@wiking | for something that you have | 16:44 |
@wiking | and as well some of those copy-ctors | 16:44 |
@wiking | should just be | 16:44 |
@wiking | moves | 16:44 |
@wiking | as they dont need to copy the underlying | 16:44 |
@wiking | data | 16:44 |
@wiking | just interpret it differently | 16:44 |
@HeikoS | so just to be clear | 16:45 |
@HeikoS | we have CBinaryLabels(CLabels *) | 16:45 |
@HeikoS | and that checks the type of the passed labels and converts appropriately | 16:45 |
@HeikoS | ? | 16:45 |
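The converting-constructor variant can be sketched like this. The classes are hypothetical simplifications rather than Shogun's `CLabels` and `CBinaryLabels`, and the validation rule (values must be exactly +1 or -1) is an assumption made for the example. The base class carries no `as_binary()` method, so adding another label type later does not require touching it.

```cpp
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical, simplified label hierarchy (not Shogun's classes).
class Labels {
public:
    explicit Labels(std::vector<double> values) : m_values(std::move(values)) {}
    virtual ~Labels() = default;
    const std::vector<double>& values() const { return m_values; }

private:
    std::vector<double> m_values;
};

class BinaryLabels : public Labels {
public:
    // Converting ctor: all conversion logic lives in the derived class.
    explicit BinaryLabels(const Labels& other)
        : Labels(validated(other.values())) {}

private:
    // Assumed rule for this sketch: only +1 and -1 are valid binary labels.
    static std::vector<double> validated(std::vector<double> values) {
        for (double v : values) {
            if (v != 1.0 && v != -1.0)
                throw std::invalid_argument("labels must be +1 or -1");
        }
        return values;
    }
};
```

This version copies the label vector; where the data only needs to be reinterpreted, an rvalue overload that moves the vector would avoid that copy.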
lisitsyn | it is not really stupid but nevermind :P | 16:46 |
@wiking | we have already many of these | 16:47 |
@wiking | as_* | 16:47 |
@wiking | i. mean there's already | 16:47 |
@wiking | as<WhateverSGObj> | 16:47 |
@HeikoS | but that method doesnt help here | 16:47 |
@wiking | of course that's a bit different | 16:47 |
@wiking | but still | 16:47 |
@wiking | it just too noisy as well | 16:47 |
@HeikoS | i will put it into the ctor then | 16:47 |
@wiking | HeikoS, really? :) | 16:47 |
@wiking | i thought we could just do .as<> | 16:48 |
@wiking | and all good | 16:48 |
@wiking | ;D | 16:48 |
@wiking | but the problem is | 16:48 |
@HeikoS | if I could overload it | 16:48 |
@wiking | that we have much more | 16:48 |
@wiking | pressing matters | 16:48 |
@wiking | imo | 16:48 |
@wiking | than these nitpickings | 16:48 |
@wiking | :D | 16:48 |
@HeikoS | well let me put it into the ctor | 16:48 |
@HeikoS | yeah sure | 16:48 |
@HeikoS | just wanna get the thing done | 16:48 |
@wiking | as this is bullshit | 16:48 |
@wiking | bingo | 16:48 |
@wiking | :D | 16:48 |
@wiking | our problem of not having a stable working cv | 16:49 |
@wiking | or a pipeline | 16:49 |
@wiking | is actually a more pressing issue | 16:49 |
@wiking | than any of our last prs in these months | 16:49 |
@wiking | unfortunately | 16:49 |
@wiking | :( | 16:49 |
@HeikoS | such a motivation booster you are sometimes! | 16:49 |
lisitsyn | :) | 16:49 |
@wiking | yeah | 16:49 |
@wiking | i mean this is bs bingo | 16:50 |
@wiking | really | 16:50 |
@wiking | i tried 2 kernels | 16:50 |
@wiking | to be written | 16:50 |
@wiking | for some simple kaggle competitions | 16:50 |
@wiking | you know for a fact | 16:50 |
@wiking | that atm it's almost impossible | 16:50 |
@wiking | to do the job only | 16:50 |
@wiking | with shogun | 16:50 |
@wiking | (i'm not talking now about ETL of the data in the beginning) | 16:50 |
@wiking | i consider pandas or any other tool for that | 16:51 |
lisitsyn | hey can you assign an issue to me? :) | 16:51 |
lisitsyn | I have seen some | 16:51 |
@wiking | there's a shitty one | 16:51 |
@wiking | about dynamic matrix mapping | 16:51 |
@wiking | lemme get it | 16:51 |
lisitsyn | I am in regime of doing one issue at a time, slowly, but I can resolve some if you point | 16:51 |
@wiking | lisitsyn, https://stackoverflow.com/questions/45291071/passing-dynamic-eigen-matrices-to-shogun-cdensefeatures | 16:52 |
@wiking | HeikoS, it's not meant to be a booster it was rather meant to be getting our priorities right | 16:53 |
@wiking | imo atm that is a 'bit' off | 16:54 |
@HeikoS | I think I can set my priorities myself, thanks | 16:54 |
@wiking | lol | 16:54 |
@wiking | i wasnt talking what your priorities should be | 16:54 |
@wiking | i was talking about what shogun's priorities should be | 16:54 |
@wiking | that might not be the same thing | 16:55 |
@HeikoS | sure | 16:55 |
@wiking | so you can do whatever | 16:55 |
@HeikoS | I will check into some of those things | 16:55 |
@wiking | i really dont care | 16:55 |
@HeikoS | but I want to do one thing after another | 16:55 |
@wiking | i'm just saying | 16:55 |
@wiking | that if anybody start using this tool | 16:55 |
@wiking | some very very basic stuff | 16:55 |
@HeikoS | to finish stuff that I have spent time with before | 16:55 |
@wiking | is super not working | 16:55 |
@wiking | that probably will just block | 16:56 |
@wiking | further advanced use of the tool | 16:56 |
@wiking | but hey | 16:57 |
@wiking | it's a free world | 16:57 |
@wiking | i aint want to fuck with your shit | 16:57 |
@wiking | so do what you feel is the best | 16:57 |
@wiking | but imo that's not how a group should work regarding how to go forward :) | 16:57 |
lisitsyn | it is not really that we don't care about the experience :) | 16:57 |
@wiking | strictly imo | 16:57 |
@wiking | lisitsyn, but we drag those stuff around | 16:57 |
lisitsyn | the thing is that we wanted to change a lot of things, that's why we discuss some of them | 16:58 |
lisitsyn | yeah I totally agree we should do first things first | 16:58 |
@wiking | i mean maybe | 16:58 |
@wiking | its time to throw things out | 16:58 |
@wiking | because those r blocking a lot of progress | 16:58 |
@wiking | as u keep trying to be 'backward compatible' | 16:58 |
lisitsyn | oh :) | 16:59 |
@wiking | meaning keep it working | 16:59 |
@wiking | not you in particular | 16:59 |
@wiking | but all of us | 16:59 |
lisitsyn | it is a bit different from my pov | 16:59 |
lisitsyn | when I try to keep the same behaviour | 16:59 |
lisitsyn | it is rather to not get away from something working at least somehow | 16:59 |
@wiking | yeah | 17:00 |
lisitsyn | I mean getting to something compileable from broken would be exponentially harder | 17:00 |
@wiking | although that presumption | 17:00 |
@wiking | is not always true | 17:00 |
@wiking | compiles, some tests pass | 17:00 |
@wiking | doesn't necessarily mean that it works | 17:00 |
@wiking | i can give a handful of those examples | 17:00 |
@wiking | of last week | 17:00 |
@wiking | that i faced :) | 17:01 |
lisitsyn | that's true but in what we've changed recently | 17:01 |
lisitsyn | it was rather essential to keep it compileable | 17:01 |
@wiking | yea | 17:01 |
@wiking | but maybe not having it compatible | 17:01 |
@wiking | would have allowed faster moving forward | 17:02 |
lisitsyn | I agree | 17:02 |
@wiking | (i'm talking about all those prs of typedo | 17:02 |
@wiking | that Heiko did | 17:02 |
@wiking | he has spent literally days | 17:02 |
lisitsyn | I am a bit out of the process | 17:02 |
@wiking | to get some of the interfaces in place | 17:02 |
lisitsyn | but I think it is just the same thing, trying to keep it at least executing somehow | 17:02 |
lisitsyn | :) | 17:02 |
lisitsyn | wiking: since we've refactored some things already I think I can focus on fixing something | 17:03 |
lisitsyn | just let me know what's the most important/shameful not-working-thing | 17:03 |
@wiking | yeah but thats the thing | 17:03 |
@wiking | i would rather just break whole cv | 17:03 |
@wiking | and start it from scratch | 17:03 |
@wiking | and the same is for features | 17:03 |
@wiking | as we know where are the 'bad' decisions | 17:04 |
lisitsyn | I am a fan of implementing the same thing twice in a codebase and then do the switch | 17:04 |
lisitsyn | so actually implementing the other cv could be a great idea | 17:05 |
@wiking | same goes for features imo | 17:05 |
@wiking | and the same io thing | 17:05 |
lisitsyn | I am a bit lost :) | 17:06 |
lisitsyn | I don't mind reworking something but you talked about the crashing/non-working scenarios | 17:07 |
@wiking | we have couple of subsystems | 17:07 |
@wiking | that are known to be error-prone | 17:08 |
@wiking | right? | 17:08 |
lisitsyn | yeah | 17:08 |
@wiking | or just straight bad design | 17:08 |
@wiking | that blocks some higher level stuff | 17:08 |
lisitsyn | some of them are | 17:08 |
@wiking | (see preprocessors) | 17:08 |
lisitsyn | true | 17:08 |
@wiking | and their usecase in pipeline | 17:08 |
@wiking | if we would have a pipeline | 17:08 |
@wiking | we all agree that we should follow | 17:08 |
@wiking | operator(Features,Labels) | 17:09 |
@wiking | right? | 17:09 |
@wiking | i.e. train/apply or fit/predict | 17:09 |
@wiking | kind of interface | 17:09 |
@wiking | for all | 17:09 |
@wiking | this is for example | 17:09 |
@wiking | about features/preprocessor | 17:09 |
@wiking | the whole CV is error-prone | 17:10 |
lisitsyn | yeah true true but what exactly you want to do? | 17:10 |
@wiking | same true for the IO functionality | 17:10 |
@wiking | so a) these should be our highest prio to be a decent baseline ML library - IO in this case is not sooooo important | 17:11 |
@wiking | b) maybe it would be better just to have these rewritten | 17:11 |
@wiking | instead of trying to patch them constantly | 17:11 |
@wiking | to keep it working | 17:11 |
@wiking | or to amend them to our new requirements | 17:11 |
lisitsyn | oh but I haven't spent any time working on IO | 17:12 |
lisitsyn | most of the time was spent on refactoring parameters | 17:12 |
@wiking | yes | 17:12 |
@wiking | and thats what i tried to say | 17:12 |
@wiking | that although in the very big picture | 17:12 |
@wiking | because of various reasons having refactored | 17:12 |
@wiking | params is a great thing | 17:12 |
@wiking | but until basic stuff | 17:13 |
lisitsyn | but don't parameters matter a lot? I think it is important | 17:13 |
@wiking | like CV | 17:13 |
@wiking | or being able to have a pipeline | 17:13 |
@wiking | not available | 17:13 |
@wiking | params imo are rather 2nd class citizen | 17:13 |
@wiking | in this aspect | 17:13 |
lisitsyn | not sure, I see them really important for model selection | 17:13 |
lisitsyn | CV I agree | 17:14 |
@wiking | but model selection | 17:14 |
lisitsyn | we haven't spent enough time working on CV | 17:14 |
@wiking | without CV? :) | 17:14 |
@wiking | i mean we know that CV just need a redesign | 17:14 |
@wiking | as thats currently a patchpatchpatch | 17:14 |
@wiking | hack | 17:14 |
@wiking | not the most optimal | 17:14 |
@wiking | and sometimes just fails :D | 17:14 |
@wiking | sometimes = when you start using it with some random kaggle dataset :) | 17:15 |
lisitsyn | since new parameters are more or less settled | 17:15 |
lisitsyn | we can fix the CV then | 17:15 |
@wiking | but | 17:15 |
@wiking | for cv | 17:15 |
@wiking | we should first | 17:16 |
@wiking | get the features right, right? | 17:16 |
@wiking | i mean apart from these basic thing not working | 17:16 |
lisitsyn | depends | 17:16 |
@wiking | (lemme paste 2 issues) | 17:16 |
@wiking | https://github.com/shogun-toolbox/shogun/issues/4175 | 17:16 |
@wiking | https://github.com/shogun-toolbox/shogun/issues/4169 | 17:17 |
lisitsyn | ok that's thing to be fixed indeed | 17:19 |
@wiking | and those are feature-like problems | 17:20 |
@wiking | same for stacks | 17:20 |
@wiking | i.e. views of features | 17:20 |
@wiking | or same about preprocessor | 17:20 |
@wiking | wuwei was interested in checking into preprocessors | 17:20 |
lisitsyn | I am not sure this is easy fixed by redesigning everything | 17:20 |
lisitsyn | I mean this means rewriting quite a lot of code | 17:20 |
@wiking | yes | 17:20 |
@wiking | unfortunately that is the case | 17:20 |
lisitsyn | I am out of time to do this | 17:21 |
@wiking | yes | 17:21 |
@wiking | i know | 17:21 |
@wiking | maybe we are better to have u | 17:21 |
@wiking | as arch in this case | 17:21 |
@wiking | ? | 17:21 |
@wiking | *architect | 17:21 |
lisitsyn | I don't know | 17:21 |
lisitsyn | I can give some advice | 17:21 |
@wiking | and then others are filling your interface | 17:21 |
lisitsyn | but not sure the best ones | 17:21 |
lisitsyn | :) | 17:21 |
lisitsyn | we have to try a few things I believe | 17:21 |
@wiking | i mean we know already most of the shortcomings | 17:22 |
@wiking | of the current features design | 17:22 |
lisitsyn | wiking: https://github.com/shogun-toolbox/shogun/pull/4203#discussion-diff-173634612R354 | 17:24 |
lisitsyn | can you comment on that issue? | 17:24 |
lisitsyn | I put delete[] here | 17:24 |
lisitsyn | but it might be wrong | 17:24 |
@wiking | yes | 17:24 |
@wiking | well | 17:24 |
lisitsyn | HeikoS suggested SG_FREE | 17:24 |
@wiking | we override EVERYTHING | 17:24 |
@wiking | ;) | 17:24 |
lisitsyn | which sound good | 17:24 |
@wiking | i mean u know in memory.h | 17:25 |
@wiking | we override new/delete | 17:25 |
@wiking | etc | 17:25 |
@wiking | as well as new[] and delete[] | 17:25 |
@wiking | so using delete[] if its allocated | 17:25 |
@wiking | within shogun | 17:25 |
@wiking | should be using the right libraries right delete-or :) | 17:26 |
@wiking | see for example tcmalloc, jemalloc | 17:26 |
@wiking | etc | 17:26 |
@wiking | so essentially | 17:27 |
@wiking | delete[] is fine | 17:27 |
@wiking | lisitsyn, see what i mean | 17:28 |
@wiking | ? | 17:28 |
@wiking | once i've tried all 3 mallocs | 17:30 |
@wiking | they worked :) | 17:31 |
@wiking | (4-5 years ago) | 17:31 |
@wiking | i.e. the allocators were not mixed up | 17:31 |
@wiking | allocator-free-er | 17:31 |
lisitsyn | wiking: yeah but it has to be new[] delete[] or malloc/free | 17:40 |
lisitsyn | that's why I doubt what to put here | 17:40 |
@wiking | lisitsyn, yeah | 18:01 |
@wiking | but they are all using the same free :) | 18:01 |
@wiking | lisitsyn, have you checked memory.cpp | 18:02 |
@wiking | ? | 18:02 |
-!- ayush_IITH [~androirc@106.223.108.10] has joined #shogun | 19:04 | |
-!- ayush_IITH [~androirc@106.223.108.10] has quit [Ping timeout: 264 seconds] | 19:31 | |
@sukey | [https://github.com/shogun-toolbox/shogun] Pull Request https://github.com/shogun-toolbox/shogun/pull/4198 synchronized by karlnapf | 19:57 |
@sukey | [https://github.com/shogun-toolbox/shogun] Pull Request https://github.com/shogun-toolbox/shogun/pull/4203 synchronized by lisitsyn | 20:06 |
@HeikoS | lisitsyn: nice | 20:07 |
lisitsyn | HeikoS: what is? :) | 20:07 |
@HeikoS | the updated thing | 20:07 |
lisitsyn | ah well SG_FREE should be better | 20:07 |
@wiking | lisitsyn,SG_FREE -> sg_generic_free -> delete[] | 20:08 |
@wiking | :) | 20:08 |
@wiking | -> free | 20:08 |
lisitsyn | wiking: that's fine | 20:08 |
@wiking | or any other lib*alloc free | 20:08 |
lisitsyn | wiking: the problem is that it is going to shoot one day | 20:08 |
lisitsyn | I mean if somebody changes that to free | 20:08 |
@wiking | ? | 20:08 |
lisitsyn | malloc -> free | 20:08 |
lisitsyn | new[] -> delete[] | 20:09 |
lisitsyn | this thing | 20:09 |
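The pairing concern (memory from `malloc` must go to `free`, memory from `new[]` must go to `delete[]`) is usually handled by routing both allocation and release through one matching pair of functions. The sketch below shows that idea with hypothetical names `lib_alloc`, `lib_free` and `LibBuffer`; it is not the contents of Shogun's memory.h or memory.cpp, and it assumes trivially constructible element types since no constructors are run.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

// Hypothetical allocation wrapper: every buffer in the library comes from here.
template <typename T>
T* lib_alloc(std::size_t n) {
    void* p = std::malloc(n * sizeof(T));
    if (!p) throw std::bad_alloc();
    return static_cast<T*>(p);
}

// Hypothetical release wrapper: the only function that frees such buffers.
// Swapping the allocator means changing this pair and nothing else.
inline void lib_free(void* p) { std::free(p); }

// Deleter that ties a smart pointer to the matching release function.
struct LibFree {
    void operator()(void* p) const { lib_free(p); }
};

template <typename T>
using LibBuffer = std::unique_ptr<T[], LibFree>;

// Allocate a buffer whose release path cannot be mismatched by the caller.
template <typename T>
LibBuffer<T> make_buffer(std::size_t n) {
    return LibBuffer<T>(lib_alloc<T>(n));
}
```

Because the deleter is part of the pointer's type, a later change of allocator cannot leave a stray `delete[]` or `free` behind at a call site.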
@wiking | yeah but atm | 20:09 |
@wiking | it's the very same | 20:09 |
lisitsyn | who knows what happens | 20:09 |
lisitsyn | :P | 20:09 |
@wiking | yeah | 20:10 |
@wiking | but the idea is that any malloc | 20:10 |
@wiking | or allocation | 20:10 |
@wiking | is controlled by shogun | 20:10 |
@wiking | i mean whatever you use | 20:10 |
@wiking | that should be the case | 20:10 |
@wiking | and you should be able to count on that | 20:10 |
lisitsyn | very good idea the best idea. should use that! sad! | 20:11 |
@wiking | ? | 20:11 |
* lisitsyn speaks trumpian | 20:11 | |
@wiking | :D | 20:11 |
@wiking | anyhow you should rely on the fact | 20:11 |
@wiking | that that thing wont be broken | 20:11 |
@wiking | even if anybody switches to magic allocator | 20:11 |
lisitsyn | yeah with SG_FREE I am fine | 20:11 |
@wiking | only thing is that they update the memory.cpp | 20:11 |
@wiking | to handle that | 20:12 |
lisitsyn | so we use the best allocator | 20:12 |
lisitsyn | :P | 20:12 |
lisitsyn | very good allocator | 20:12 |
@HeikoS | :D | 20:18 |
@HeikoS | https://twitter.com/deepdrumpf?lang=en | 20:19 |
@HeikoS | lisitsyn, wiking | 20:19 |
@wiking | covfefe | 20:19 |
lisitsyn | wiking: do they have fantastic covfefe in bg? | 20:20 |
lisitsyn | we need the best one | 20:20 |
lisitsyn | :P | 20:20 |
@wiking | ? | 20:21 |
@wiking | :) | 20:21 |
-!- Farouk [81617d04@gateway/web/freenode/ip.129.97.125.4] has joined #shogun | 20:50 | |
Farouk | Hi Everyone. I am interested in the GSOC project_data_application and have an idea in mind which I am not sure is okay. I have some experience in Reinforcement learning and especially related to gaming. For example, I've coded Neural networks to play games such as Pong, and Space Invaders from scratch in Tensorflow and Keras. I would be interested in doing something similar with Shogun, perhaps build a full jupyter notebook on how to b | 21:20 |
Farouk | Does that sound like a reasonable project? | 21:20 |
@HeikoS | Farouk: sadly, there is no RL present in Shogun atm | 21:21 |
@HeikoS | Neither are there serious neural networks | 21:21 |
@wiking | Farouk, anything is possible but that's a lot of work as HeikoS mentioned we dont have RL but more importantly: make sure that you read about requirements for gsoc participations | 21:21 |
@HeikoS | however, we have plans to add a wrapper for say Keras | 21:21 |
@wiking | HeikoS, how ? :) | 21:22 |
@wiking | Farouk, meaning an important part: you need to send in PRs to the library prior to your gsoc application | 21:22 |
@wiking | or the latest before we would start marking your applications | 21:22 |
@HeikoS | so I can imagine a project where the first half is focussing on building an interface Keras<-> Shogun, and the second half would be to re implement some well known algorithm/project using that | 21:22 |
@wiking | *application | 21:22 |
@HeikoS | so doing something in these lines would be a good task to get going with the library, as wiking just mentioned | 21:23 |
@wiking | HeikoS, keras is python only... so i guess in this case you are rather talking of keras generated model? | 21:23 |
@HeikoS | I am referring to the discussion we had in the Budapest hackathon | 21:23 |
@wiking | yeah | 21:24 |
lisitsyn | HeikoS: do we use it for SGObject arrays? | 21:24 |
@wiking | which one? :) | 21:24 |
@wiking | i mean we can only do shogun wise | 21:25 |
@wiking | something like tf wrapping | 21:25 |
@HeikoS | lisitsyn: sadly, yes | 21:25 |
@wiking | because as i said keras is strictly a python lib | 21:25 |
Farouk | HeikoS: I looked at the API Shogun offers and I think I can build it. When I coded it in Tensorflow and Keras, I didn't use any RL tools. I only used the Dense as well as the dropout layers with a simple 2 hidden layers Feed Forward Network and it did fairly well. Looking at Shogun API, it does offer both of these no? | 21:25 |
lisitsyn | HeikoS: ok I'll specialize for SGObject | 21:25 |
@HeikoS | Farouk: it does | 21:25 |
@HeikoS | Farouk: we could also imagine adding a few minor things to the NN parts of shogun, if that is enough for the project | 21:26 |
@HeikoS | Farouk: I guess then key would be to send some patches to that, and to send a PoC for what you wanna do | 21:26 |
@HeikoS | Farouk: data projects can be more independent and the generated code doesnt necessarily need to go back into Shogun, however, it would be better :) | 21:27 |
@HeikoS | but the main goal is something cool to look at, a demo, something interactive, etc | 21:27 |
Farouk | HeikoS: Great, I wouldn't mind if the project is two parts, implementing that neural network to play several Atari games, and perhaps adding some code into the Neural Network component of Shogun | 21:28 |
@HeikoS | lisitsyn: I think the case array of SG* we can ignore | 21:28 |
@wiking | Farouk, note that that requires some c++ coding on your side | 21:28 |
@wiking | Farouk, (for extending some parts of the NN subgroup of shogun) | 21:29 |
Farouk | HeikoS: That's fine. I am fairly confident in my C++ and Python. | 21:29 |
@wiking | hence i'd suggest to pick up some of these tasks | 21:29 |
@HeikoS | Farouk: make sure to send some pull requests with samples then :) | 21:29 |
@HeikoS | i gotta grab dinner, cu | 21:29 |
@wiking | https://github.com/shogun-toolbox/shogun/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22 | 21:29 |
Farouk | HeikoS: So just as a few action points on my part, I need to 1) Fix at least one patch in the issues on Shogun before the App deadline and 2) Pull request with my project | 21:30 |
Farouk | HeikoS: Sure, have a nice dinner :) | 21:30 |
@wiking | Farouk, 1) more than one preferably | 21:30 |
@wiking | (check that link i've pasted) | 21:30 |
Farouk | Sure (y). Thanks for the help everyone! | 21:30 |
@wiking | 2) no need for a PR for your project | 21:30 |
@wiking | for your project it'd be better to have a very detailed plan | 21:31 |
@wiking | what are your goals | 21:31 |
@wiking | and a schedule | 21:31 |
@wiking | over those weeks | 21:31 |
@wiking | of gsoc | 21:31 |
Farouk | I see, and then apply with that plan right away on the GSOC website? | 21:31 |
@wiking | Farouk, check our wiki it has all the information how to submit a successful application | 21:31 |
@wiking | Farouk, yes | 21:31 |
Farouk | Sure, will do. Got lots of things to do before the deadline. Thanks! :) | 21:32 |
@wiking | we usually give feedback on possible good applications before the deadline | 21:32 |
@wiking | and there's always room to improve in an application | 21:32 |
@wiking | so i would suggest you to have it at least couple of days before the deadline | 21:32 |
@wiking | (the application) so we could suggest you some improvements | 21:32 |
Farouk | Sure thing. I will get started on it now :) | 21:33 |
@wiking | great | 21:33 |
@wiking | good luck | 21:33 |
@wiking | and looking forward to your PRs | 21:33 |
Farouk | Thanks a lot for all the help! | 21:33 |
@sukey | [https://github.com/shogun-toolbox/shogun] Issue https://github.com/shogun-toolbox/shogun/issues/4200 | 21:46 |
-!- Farouk [81617d04@gateway/web/freenode/ip.129.97.125.4] has quit [Ping timeout: 260 seconds] | 21:47 | |
@sukey | [https://github.com/shogun-toolbox/shogun] Pull Request https://github.com/shogun-toolbox/shogun/pull/4203 synchronized by lisitsyn | 21:50 |
lisitsyn | HeikoS: addressed the thing | 21:51 |
lisitsyn | objects are free'd now | 21:51 |
lisitsyn | unref'ed I mean :) | 21:52 |
-!- iglesias [~iglesias@f119189.upc-f.chello.nl] has joined #shogun | 22:16 | |
@wiking | iglesias, hello hello | 22:21 |
@wiking | but yeah lisitsyn all in all the problem is that doing Pipeline::train should not only support a DAG imo but as well we should think how that would be done when you support DYNLIB | 22:29 |
@wiking | meaning when the elements in the pipeline are some stuff that are actually implemented as a plugin | 22:29 |
@wiking | or we just say that factory will take care of it? | 22:29 |
@wiking | so say | 22:30 |
@wiking | auto pipeline = some<Pipeline>(); | 22:30 |
@wiking | pipeline.add_step(preprocessor("Standardizer")) | 22:30 |
@wiking | pipeline.add_step(kernel("Gaussian")) | 22:30 |
@wiking | pipeline.add_step(machine("LibSVM")) | 22:31 |
@wiking | ? | 22:31 |
lisitsyn | yeah sounds good | 22:31 |
lisitsyn | no? | 22:31 |
@wiking | and those factories will take care of dll load? | 22:31 |
lisitsyn | yes | 22:31 |
@wiking | mmm | 22:31 |
@wiking | so when it's sequential | 22:32 |
@wiking | it is obvious that somehow those thing should be 'wrappable' | 22:32 |
@wiking | but say in case of a kernel | 22:32 |
@wiking | i guess you wouldn't do this | 22:32 |
@wiking | but rather | 22:32 |
@wiking | auto m =machine("LibSVM") | 22:32 |
lisitsyn | what's sequential? the pipeline? | 22:32 |
@wiking | m.set_kernel(kernel("Gaussian")) | 22:32 |
@wiking | and then do | 22:32 |
@wiking | pipeline.add_step(m) | 22:32 |
@wiking | or? | 22:32 |
@wiking | because you know | 22:33 |
lisitsyn | yeah kernel is rather a parameter of machine | 22:33 |
@wiking | what would be the best | 22:33 |
@wiking | is that pipeline elements | 22:33 |
@wiking | are operators over (labels, features) tuple | 22:33 |
@wiking | right? | 22:33 |
@wiking | so that you dont have to actually write a super crazy logic | 22:33 |
@wiking | of how to connect different steps | 22:33 |
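A minimal C++ sketch of the idea discussed above: every pipeline element is an operator over the (labels, features) tuple, and the kernel is a parameter of the machine rather than a step of its own. All names here (`LabeledData`, `Step`, `MeanCenterer`, `NearestByKernel`, `gaussian`) are hypothetical stand-ins for illustration, not Shogun's actual classes or factories, and plugin loading is left out entirely.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

using Vec = std::vector<double>;

// Hypothetical stand-in for the (labels, features) tuple; not Shogun's API.
struct LabeledData {
    std::vector<Vec> features;  // one row per example
    Vec labels;                 // may be empty, e.g. at prediction time
};

// Every pipeline element is an operator over the whole tuple.
struct Step {
    virtual ~Step() = default;
    virtual void train(const LabeledData& data) = 0;
    virtual LabeledData apply(const LabeledData& data) const = 0;
};

// Preprocessor: subtracts the per-column training mean and ignores labels.
struct MeanCenterer : Step {
    Vec mean;
    void train(const LabeledData& data) override {
        mean.assign(data.features.empty() ? 0 : data.features[0].size(), 0.0);
        for (const Vec& row : data.features)
            for (std::size_t j = 0; j < mean.size(); ++j)
                mean[j] += row[j] / data.features.size();
    }
    LabeledData apply(const LabeledData& data) const override {
        LabeledData out = data;
        for (Vec& row : out.features)
            for (std::size_t j = 0; j < mean.size(); ++j)
                row[j] -= mean[j];
        return out;
    }
};

using Kernel = std::function<double(const Vec&, const Vec&)>;

// Gaussian kernel exp(-||a - b||^2 / width).
inline Kernel gaussian(double width) {
    return [width](const Vec& a, const Vec& b) {
        double d2 = 0.0;
        for (std::size_t j = 0; j < a.size(); ++j)
            d2 += (a[j] - b[j]) * (a[j] - b[j]);
        return std::exp(-d2 / width);
    };
}

// Toy machine: the kernel is set on the machine, it is not a pipeline step.
// Predicts the label of the most similar training example (train first).
struct NearestByKernel : Step {
    Kernel kernel;
    LabeledData train_data;
    void set_kernel(Kernel k) { kernel = std::move(k); }
    void train(const LabeledData& data) override { train_data = data; }
    LabeledData apply(const LabeledData& data) const override {
        LabeledData out{data.features, {}};
        for (const Vec& x : data.features) {
            std::size_t best = 0;
            for (std::size_t i = 1; i < train_data.features.size(); ++i)
                if (kernel(x, train_data.features[i]) >
                    kernel(x, train_data.features[best]))
                    best = i;
            out.labels.push_back(train_data.labels[best]);
        }
        return out;
    }
};

// Sequential pipeline: each step trains on the output of the previous ones.
class Pipeline {
public:
    Pipeline& add_step(std::shared_ptr<Step> step) {
        steps_.push_back(std::move(step));
        return *this;
    }
    void train(LabeledData data) {
        for (auto& s : steps_) {
            s->train(data);
            data = s->apply(data);
        }
    }
    LabeledData apply(LabeledData data) const {
        for (const auto& s : steps_) data = s->apply(data);
        return data;
    }

private:
    std::vector<std::shared_ptr<Step>> steps_;
};
```

Because every step has the same `train`/`apply` shape, the pipeline needs no special logic to connect a preprocessor to a machine; a DAG-shaped pipeline would need more than this sequential loop.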
lisitsyn | sometimes I wonder if they should be the whole entity | 22:33 |
@wiking | ? | 22:34 |
iglesias | wiking: hola hola! | 22:34 |
lisitsyn | features + labels | 22:34 |
@wiking | ah you mean as something like | 22:34 |
@wiking | DataSet | 22:34 |
lisitsyn | wiking: yeah list<Example> where Example = Features + Label | 22:34 |
@wiking | and then some elements | 22:35 |
lisitsyn | because it is sort of fragile what people do now | 22:35 |
@wiking | just ignore some parts of the dataset | 22:35 |
lisitsyn | they split the features and labels and rely on indices | 22:35 |
@wiking | say preprocessors mostly work over features | 22:35 |
lisitsyn | yeah but they can also use labels | 22:35 |
@wiking | but then machines always take the tuple etc | 22:36 |
@wiking | yeah i mean that could be an option | 22:36 |
@wiking | to have a dataset object | 22:36 |
lisitsyn | Dataset = Collection<Example> | 22:36 |
@wiking | yeah | 22:37 |
@wiking | example = datapoint | 22:37 |
@wiking | :) | 22:37 |
lisitsyn | + Optional<Label> | 22:37 |
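The `Dataset = Collection<Example>` idea with an optional label could look roughly like the C++17 sketch below. The type and function names are hypothetical, chosen only to illustrate the point that whole examples move together, so features and labels cannot get out of sync through index bookkeeping.

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical types illustrating the idea; not part of Shogun.
struct Example {
    std::vector<double> features;
    std::optional<double> label;  // absent for unlabeled (e.g. test) data
};

using Dataset = std::vector<Example>;

// Splitting moves whole examples, so features and labels stay paired.
inline std::pair<Dataset, Dataset> split_at(const Dataset& data,
                                            std::size_t n_first) {
    if (n_first > data.size()) n_first = data.size();
    const auto mid = data.begin() + static_cast<std::ptrdiff_t>(n_first);
    return {Dataset(data.begin(), mid), Dataset(mid, data.end())};
}

// Keeps only the examples that carry a label (what a metric would need).
inline Dataset labeled_only(const Dataset& data) {
    Dataset out;
    for (const Example& e : data)
        if (e.label.has_value()) out.push_back(e);
    return out;
}
```

A step that only cares about features simply ignores `label`, and a metric only reads `label`, which matches the trade-off raised in the discussion.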
@wiking | but yeah these are just semantics | 22:37 |
@wiking | and i mean | 22:37 |
@wiking | even having it for test set | 22:37 |
@wiking | is good | 22:37 |
@wiking | because then in all the metrics shit | 22:37 |
@wiking | you have it there | 22:38 |
@wiking | not only the features but the labels | 22:38 |
@wiking | on the other hand you shouldnt care about the features | 22:38 |
@wiking | when you do metrics | 22:38 |
@wiking | :P | 22:38 |
lisitsyn | wiking: I like the terminology from http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf | 22:38 |
lisitsyn | I think good design would push these notions | 22:38 |
@wiking | yeah sure | 22:38 |
@wiking | i mean i really dont care about the semantics | 22:39 |
@wiking | :) | 22:39 |
@wiking | it could be even KQW and MLQ | 22:39 |
@wiking | :D | 22:39 |
lisitsyn | yeah but it helps to make some decisions :) | 22:39 |
lisitsyn | KGB? | 22:39 |
@wiking | yeah | 22:39 |
@wiking | fsb | 22:39 |
@wiking | :) | 22:39 |
@wiking | whatever | 22:39 |
@wiking | i am a bit ambivalent regarding the collection<example> | 22:40 |
@wiking | but my objection is not so strong | 22:40 |
@wiking | but yeah we need operators over that | 22:40 |
@wiking | instead of this weird | 22:40 |
@wiking | mix that we have atm | 22:40 |
@wiking | ah and my favourite on top | 22:41 |
@wiking | is CDistance | 22:41 |
@wiking | :D | 22:41 |
@wiking | no | 22:41 |
@wiking | CDistribution | 22:41 |
@wiking | :D | 22:41 |
lisitsyn | why? | 22:42 |
@wiking | it doesnt follow anything | 22:42 |
@wiking | :D | 22:42 |
lisitsyn | it could be a part of graphical model | 22:42 |
@wiking | yea anything | 22:43 |
@wiking | just to have it covered by a standard api | 22:43 |
@wiking | :) | 22:43 |
@wiking | meaning train/apply | 22:44 |
@wiking | :) | 22:44 |
lisitsyn | we had some discussion about that | 22:45 |
lisitsyn | I don't remember exactly | 22:45 |
lisitsyn | but we had some good idea | 22:45 |
lisitsyn | ah | 22:45 |
lisitsyn | Distribution is something that is both machine and something | 22:46 |
@wiking | yeah | 22:54 |
@wiking | anyhow | 22:54 |
@wiking | it could train | 22:54 |
@wiking | :D | 22:54 |
@wiking | and it can apply | 22:54 |
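As a rough illustration of a distribution covered by the same train/apply API, here is a sketch in C++. `Gaussian1D` is a made-up class, not Shogun's `CDistribution`. It assumes `train` means a maximum-likelihood fit and `apply` means evaluating the log-density, which is only one possible reading of the discussion.

```cpp
#include <cmath>
#include <vector>

// Hypothetical sketch; this is not Shogun's CDistribution interface.
class Gaussian1D {
public:
    // train: maximum-likelihood mean and variance (samples must be non-empty).
    void train(const std::vector<double>& samples) {
        const double n = static_cast<double>(samples.size());
        mean_ = 0.0;
        for (double x : samples) mean_ += x / n;
        variance_ = 0.0;
        for (double x : samples) variance_ += (x - mean_) * (x - mean_) / n;
    }

    // apply: log-density of each point under the fitted Gaussian.
    std::vector<double> apply(const std::vector<double>& points) const {
        const double pi = std::acos(-1.0);
        std::vector<double> out;
        out.reserve(points.size());
        for (double x : points)
            out.push_back(-0.5 * std::log(2.0 * pi * variance_) -
                          (x - mean_) * (x - mean_) / (2.0 * variance_));
        return out;
    }

    double mean() const { return mean_; }
    double variance() const { return variance_; }

private:
    double mean_ = 0.0;
    double variance_ = 1.0;
};
```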
@sukey | [https://github.com/shogun-toolbox/shogun] Pull Request https://github.com/shogun-toolbox/shogun/pull/4196 synchronized by syashakash | 22:59 |
--- Log closed Tue Mar 13 00:00:39 2018 |