--- Log opened Wed Jul 26 00:00:11 2017 | ||
-!- olinguyen [81615ad9@gateway/web/freenode/ip.129.97.90.217] has quit [Quit: Page closed] | 01:30 | |
-!- nikhilweee [~nikhilwee@128.199.66.195] has joined #shogun | 02:35 | |
-!- zoq_ [~marcus_zo@urgs.org] has joined #shogun | 04:21 | |
-!- zoq [~marcus_zo@urgs.org] has quit [Read error: Connection reset by peer] | 04:22 | |
-!- mikeling [uid89706@gateway/web/irccloud.com/x-zdtbswgrvzbvzstn] has joined #shogun | 05:41 | |
mikeling | Hi wiking , for this https://github.com/shogun-toolbox/shogun/pull/3906#issuecomment-317627264, I mean sample function been defined as a pure virtual function and I can't add a template parameter into it. | 05:45 |
---|---|---|
mikeling | https://stackoverflow.com/a/2354671 | 05:45 |
mikeling | so I keep the m_rng as a variable in those classes which has sample function | 05:46 |
mikeling | maybe it's better to define them as private rather than protected ;) | 05:46 |
@wiking | hi | 05:50 |
@wiking | just a sec | 05:50 |
@wiking | mikeling, i do not understand | 06:20 |
@wiking | on which line do you mean this? | 06:20 |
@wiking | mikeling, could u give me an example | 06:21 |
@wiking | where this is a problem? | 06:21 |
@wiking | mikeling, pin | 06:49 |
@wiking | ping | 06:49 |
mikeling | wiking: pong I'm having dinner , I will send you a gist or share it on pastbin later | 06:50 |
mikeling | Sorry | 06:50 |
@wiking | k | 06:50 |
mikeling | :) | 06:50 |
-!- geektoni [~geektoni@93-34-128-40.ip49.fastwebnet.it] has joined #shogun | 09:13 | |
@sukey | [https://github.com/shogun-toolbox/shogun] vigsterkr pushed 24 commits: | 10:07 |
@sukey | https://github.com/shogun-toolbox/shogun/commit/763a0de7a4a38ffd21f984d098395a1cba049060 | 10:07 |
@sukey | https://github.com/shogun-toolbox/shogun/commit/c86a26899b665e14bcb9cfb71db882d069e0b0dc | 10:08 |
@sukey | https://github.com/shogun-toolbox/shogun/commit/d02d5376b7801e2553610b58ef636e09f5a51d8c | 10:08 |
@sukey | https://github.com/shogun-toolbox/shogun/commit/1ebe93c4c1eb97f94022ba2b57dec16cafa43d9a | 10:08 |
@sukey | https://github.com/shogun-toolbox/shogun/commit/9f700b3b03336797f2231535998520f0008264cb | 10:08 |
@sukey | https://github.com/shogun-toolbox/shogun/commit/9b76d222850714fc9a0a6ccbbfd5fd8ee992237e | 10:08 |
@sukey | https://github.com/shogun-toolbox/shogun/commit/d551ebb92484bddb9464ea71c7afe63d881fe90e | 10:08 |
@sukey | https://github.com/shogun-toolbox/shogun/commit/e61736435b39f53d8c8cea9ecaf0f758e583534d | 10:08 |
@sukey | https://github.com/shogun-toolbox/shogun/commit/818fcbfe05402a574aff584d8885d50fdf1a49c0 | 10:08 |
@sukey | https://github.com/shogun-toolbox/shogun/commit/aa86c2f1f4f7b7f34778aefd2de6491d4e1e329c | 10:08 |
@sukey | https://github.com/shogun-toolbox/shogun/commit/df4fe0fb945b6fabfd02ad8066e957c516d88763 | 10:08 |
@sukey | https://github.com/shogun-toolbox/shogun/commit/e87cf4c6d27e01c7f5c161c96c143313bc799600 | 10:08 |
@sukey | https://github.com/shogun-toolbox/shogun/commit/5dec463e107ed03236f24705bdd4ea9618a02b1a | 10:08 |
@sukey | https://github.com/shogun-toolbox/shogun/commit/3fcdaae66dc51c433003efed11e41f466a6f7b6f | 10:08 |
@sukey | https://github.com/shogun-toolbox/shogun/commit/6794f4b9ecfc6ba95993dec94aa10082c694b0a8 | 10:08 |
@sukey | https://github.com/shogun-toolbox/shogun/commit/0c69cc243f4af0741b68ca99789e733326de0b72 | 10:08 |
@sukey | https://github.com/shogun-toolbox/shogun/commit/e6ede2eb835959c8e1d6e2371180216a6563b5df | 10:08 |
@sukey | https://github.com/shogun-toolbox/shogun/commit/4612086993e3a94eafa7c42e7cfb17d57fbdbc81 | 10:08 |
@sukey | https://github.com/shogun-toolbox/shogun/commit/8422a1ff068a89549c3a873c214d84f3f7a1789c | 10:08 |
-!- sukey [~nodebot@ks312251.kimsufi.com] has quit [Remote host closed the connection] | 10:08 | |
-!- HeikoS [~heiko@host-92-0-169-11.as43234.net] has joined #shogun | 10:29 | |
-!- mode/#shogun [+o HeikoS] by ChanServ | 10:29 | |
-!- HeikoS [~heiko@host-92-0-169-11.as43234.net] has quit [Ping timeout: 240 seconds] | 10:43 | |
-!- HeikoS [~heiko@host-92-0-169-11.as43234.net] has joined #shogun | 11:04 | |
-!- mode/#shogun [+o HeikoS] by ChanServ | 11:04 | |
geektoni | HeikoS: the new CrossValidation is now complete https://github.com/shogun-toolbox/shogun/pull/3953 ;) | 11:15 |
@wiking | mikeling, are you here? | 12:08 |
mikeling | wiking: yes | 12:09 |
@wiking | i dont understand this story | 12:09 |
@wiking | with the template parameter | 12:09 |
@wiking | really | 12:09 |
mikeling | mmm, which part? | 12:10 |
@wiking | any of this | 12:10 |
mikeling | I just add a comment on it | 12:10 |
@wiking | i dont understand what is the problem | 12:10 |
@wiking | with having | 12:10 |
@wiking | get_prng<std::mt...> | 12:10 |
@wiking | ? | 12:10 |
mikeling | wiking: in https://github.com/shogun-toolbox/shogun/pull/3906#issuecomment-318008396 | 12:10 |
@wiking | what template parameter u wanna add to CTraceSampler/ | 12:11 |
@wiking | ? | 12:11 |
mikeling | wiking: like you mention before https://pastebin.mozilla.org/9027999 | 12:12 |
@wiking | when did i mentioned this? :D | 12:13 |
mikeling | wiking: I just reply you in the thread we talk about how to add template parameter into sample function | 12:15 |
mikeling | :) | 12:15 |
mikeling | on the email | 12:15 |
@wiking | ok cool | 12:15 |
@wiking | lemme see the patched version | 12:15 |
@wiking | as i'm a bit out of context now | 12:15 |
@wiking | the problem with this is | 12:17 |
@wiking | https://github.com/MikeLing/shogun/blob/817eeeb3a40e3358b6cf7b52c44ca7fedf8311c2/src/shogun/mathematics/linalg/ratapprox/tracesampler/TraceSampler.h#L103-L109 | 12:17 |
@wiking | that one would expect serialization would serialize the state of the prng :P | 12:17 |
@wiking | anyhow apart from that lemme see what one could do | 12:17 |
mikeling | wiking: mmm, sorry I miss the serialization part, I will move it to ctor function | 12:19 |
@wiking | nono | 12:19 |
@wiking | it has nothing to do with that | 12:19 |
@wiking | gimme one more sec | 12:20 |
@wiking | i guess there are no type traits like stuff for concepts, rihgt? | 12:22 |
mikeling | mmmm, what's your mean concepts in type traits ? About template and virtual function part? | 12:25 |
@wiking | http://en.cppreference.com/w/cpp/concept/UniformRandomBitGenerator | 12:25 |
-!- zoq_ is now known as zoq | 12:32 | |
-!- HeikoS [~heiko@host-92-0-169-11.as43234.net] has quit [Quit: Leaving.] | 13:11 | |
-!- HeikoS [~heiko@host-92-0-169-11.as43234.net] has joined #shogun | 15:44 | |
-!- mode/#shogun [+o HeikoS] by ChanServ | 15:44 | |
-!- dexeger [~kj@2601:409:8500:5da6:3b94:d2a4:a0c6:d756] has joined #shogun | 16:50 | |
-!- olinguyen [81615ad9@gateway/web/freenode/ip.129.97.90.217] has joined #shogun | 17:00 | |
olinguyen | HeikoS: hey, do you have a moment? | 17:03 |
-!- iglesiasg [~iglesiasg@217.119.234.214] has joined #shogun | 17:08 | |
-!- mode/#shogun [+o iglesiasg] by ChanServ | 17:08 | |
@HeikoS | olinguyen: hi | 17:08 |
@HeikoS | yes sure | 17:08 |
@HeikoS | whats up? | 17:08 |
olinguyen | I started work on obtaining the probability scores for RandomForest (https://github.com/olinguyen/shogun/commit/6c578d584b27df5f315b56ca2791d9ef84e76491). Could you have a quick look? | 17:09 |
olinguyen | A few things that I'm uncertain about. I'm keeping the output matrix of all the trees, then computing the combination rule (depending on what it is), and always using mean rule for the probability scores. Does that make sense? I'm also only using it for BinaryLabels, since I'm not sure how to derive the probability scores for all the other classes. | 17:09 |
olinguyen | The multiclass probabilities should return a matrix of size [num_samples, num_classes] if i'm not mistaken | 17:10 |
@HeikoS | yes multiclass is a matrix | 17:16 |
@HeikoS | for each class, the probabilities would sum to 1 | 17:16 |
@HeikoS | what does the apply_get_outputs do? | 17:16 |
olinguyen | it returns the combined outputs of all trees | 17:17 |
olinguyen | if it's MajorityVote it would return the label voted upon, if it's MeanRule it returns the probability | 17:18 |
@HeikoS | so you are not using the apply_get_outputs anymore | 17:18 |
@HeikoS | ah I see | 17:18 |
@HeikoS | you refactored it | 17:18 |
olinguyen | right, because the combination rule was majority vote, i needed a way to still compute the probabilites | 17:18 |
olinguyen | yea | 17:18 |
olinguyen | if the combination rule was majority vote* | 17:18 |
@HeikoS | ok i see that is only in the apply_binary | 17:19 |
@HeikoS | so let me see if I get this right | 17:20 |
@HeikoS | I am confused why you need this new method apply_outputs | 17:20 |
olinguyen | apply_get_outputs returns a SGVector<float64_t> so it either returns labels or probabilities | 17:21 |
olinguyen | i created another function since i wanted to keep both | 17:22 |
@HeikoS | ah I see | 17:22 |
@HeikoS | return type is different | 17:22 |
@HeikoS | ok I get it now | 17:22 |
@HeikoS | so you change the binary classification case | 17:22 |
@HeikoS | there you call your new method so that you keep all votes | 17:22 |
@HeikoS | and then you combine them manually in the apply method | 17:23 |
olinguyen | right | 17:23 |
@HeikoS | conceptually that is the right thing to do | 17:23 |
@HeikoS | code wise, this needs a bit restructuring | 17:23 |
@HeikoS | a lot of redundant code there | 17:23 |
@HeikoS | since you basically copy the other metho | 17:23 |
@HeikoS | d | 17:24 |
olinguyen | true | 17:24 |
@HeikoS | I suggest you make the apply_get_ouput method call you new method | 17:24 |
@HeikoS | and then do the combination | 17:24 |
@HeikoS | and in apply_binary you just call the new one | 17:24 |
@HeikoS | then you re-use the code | 17:24 |
@HeikoS | also, could you maybe make the naming a bit more clear | 17:24 |
@HeikoS | apply_get_output was named before, so let's leave that | 17:25 |
@HeikoS | apply_without_combination | 17:25 |
@HeikoS | or something | 17:25 |
@HeikoS | olinguyen: see what I mean? | 17:25 |
@HeikoS | olinguyen: oh another thing is the unit test | 17:25 |
@HeikoS | I guess you copied the data from another test case? | 17:25 |
olinguyen | yea | 17:25 |
@HeikoS | olinguyen: this is also a lot of redundant code | 17:26 |
@HeikoS | you could put the data generation into a function, or even better, a fixture | 17:26 |
@HeikoS | i.e. code that is re-used across tests | 17:26 |
olinguyen | yea, i'll fix that | 17:26 |
@HeikoS | another question | 17:26 |
@HeikoS | EXPECT_DOUBLE_EQ(1.0,values_vector[0]); | 17:26 |
@HeikoS | where do those numbers come from? | 17:26 |
olinguyen | i reversed engineering the expected probabilities temporarily. I'll have to come up with actual test cases | 17:27 |
@HeikoS | olinguyen: fixture you can use to set up the RF, labels, etc and you can use a function to generate the data. This might require touching a few of the existing tests, but helps us cleaning them up big time | 17:28 |
@HeikoS | olinguyen: ok, so it is not that you just copied the output of the run in there, and then assert against that? :D | 17:28 |
@HeikoS | olinguyen: if you write python code to generate numbers for test cases, you can always put a link to a gist into a comment in the test | 17:28 |
olinguyen | no, that is what I did :P. So i will have to fix it | 17:29 |
@HeikoS | olinguyen: ok, yes that is quite important | 17:29 |
@HeikoS | you should do two types of tests | 17:29 |
@HeikoS | a) you manually calculate what the result should be for fixed (minimal!) data | 17:29 |
@HeikoS | b) you create a synthetic problem and make sure that the results "make sense", i.e. probabilities >0.5 have class 1 and <0.5 have class 0 in binary (same for multiclass) | 17:30 |
@HeikoS | and you could also find datasets and compare against sklearns probabilities (roughly) | 17:30 |
@HeikoS | you should also add a test that just isolates your new function in a super minimal case | 17:30 |
@HeikoS | olinguyen: but conceptually, I think you are there | 17:30 |
olinguyen | cool, thanks! | 17:30 |
olinguyen | but for the mutliclass case | 17:31 |
@HeikoS | so nice work, I guess next step is to check whether results make sense | 17:31 |
@HeikoS | and then clean up tests /refactor | 17:31 |
@HeikoS | and then PR | 17:31 |
@HeikoS | ah yes multiclass | 17:31 |
olinguyen | cause the individual trees don't provide a probabilty themselves | 17:31 |
@HeikoS | yes indeed | 17:31 |
olinguyen | should this be another issue | 17:31 |
@HeikoS | did you check how sklearn does this? | 17:31 |
olinguyen | briefly i'll have to go look again | 17:32 |
@HeikoS | lets check now if you dont mind? | 17:33 |
olinguyen | sure | 17:33 |
@HeikoS | wiking: btw https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/voting_classifier.py#L12 | 17:33 |
@HeikoS | sklearn is quite minimal with lincensing | 17:34 |
@HeikoS | olinguyen: where is RandomForestClassifier initially defined? | 17:35 |
olinguyen | https://github.com/scikit-learn/scikit-learn/blob/4d9a12d175a38f2bcb720389ad2213f71a3d7697/sklearn/ensemble/forest.py | 17:35 |
-!- iglesiasg [~iglesiasg@217.119.234.214] has quit [Quit: leaving] | 17:37 | |
@HeikoS | https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/forest.py#L741 | 17:38 |
@HeikoS | this is latest master | 17:38 |
@HeikoS | "The predicted class of an input sample is a vote by the trees in the forest, weighted by their probability estimates. That is, the predicted class is the one with highest mean probability estimate across the trees." | 17:39 |
@HeikoS | ok so their trees have probability output I guess? | 17:39 |
olinguyen | i believe so | 17:40 |
@HeikoS | olinguyen: and what trees are they using? | 17:41 |
olinguyen | I think its CART | 17:41 |
olinguyen | lemme double check | 17:41 |
@HeikoS | RandomTreesEmbedding | 17:42 |
@HeikoS | sry | 17:42 |
@HeikoS | olinguyen: https://stats.stackexchange.com/questions/193424/is-decision-tree-output-a-prediction-or-class-probabilities | 17:45 |
@HeikoS | "This could be done simply by running any standard decision tree algorithm, and running a bunch of data through it and counting what portion of the time the predicted label was correct in each leaf; this is what sklearn does" | 17:46 |
@HeikoS | funny enough dougal (the guy who wrote the answer) is one of my colleagues ;) | 17:46 |
olinguyen | haha cool :) | 17:47 |
olinguyen | should this be a separate issue? | 17:47 |
@HeikoS | olinguyen: I mean | 17:47 |
@HeikoS | so sklearn chooses the class where the largest average class probability | 17:47 |
@HeikoS | and what you are doing is that, but all the probabilities for other classes are 0 | 17:47 |
@HeikoS | see what I mean? | 17:47 |
olinguyen | yea | 17:48 |
@HeikoS | so if you make the "probability estimation trees" work, then you get the binary case automatically | 17:48 |
@HeikoS | but I think your patch still applies then | 17:48 |
@HeikoS | just the input will be changed | 17:48 |
@HeikoS | so let's first fix this one for binary | 17:48 |
@HeikoS | and then do the one for multiclass | 17:48 |
olinguyen | got it, thanks | 17:49 |
@HeikoS | I think you need to change something in the way you collect results in the individual trees for the multiclass case | 17:49 |
@HeikoS | olinguyen: you have a good idea on how to do that? | 17:49 |
@HeikoS | or shall we dig a bit more=? | 17:49 |
olinguyen | i think i have an idea | 17:50 |
olinguyen | i'll give it a try | 17:50 |
olinguyen | and let you know | 17:50 |
@HeikoS | ok cool | 17:50 |
@HeikoS | I will check back tonight | 17:50 |
olinguyen | cool | 17:50 |
@HeikoS | can you send me an email with updates in a couple of hours? | 17:50 |
olinguyen | sure | 17:50 |
@HeikoS | in 5hrs good? | 17:50 |
olinguyen | yea | 17:50 |
@HeikoS | Ill check my mails then when coming back home | 17:50 |
@HeikoS | cool thx | 17:50 |
olinguyen | thanks! | 17:51 |
@HeikoS | olinguyen: nice to see this going now, will be very useful | 17:51 |
-!- HeikoS [~heiko@host-92-0-169-11.as43234.net] has quit [Ping timeout: 240 seconds] | 17:57 | |
-!- geektoni [~geektoni@93-34-128-40.ip49.fastwebnet.it] has quit [Quit: Leaving.] | 19:37 | |
-!- geektoni [~geektoni@93-34-128-40.ip49.fastwebnet.it] has joined #shogun | 19:37 | |
-!- geektoni [~geektoni@93-34-128-40.ip49.fastwebnet.it] has quit [Ping timeout: 246 seconds] | 19:44 | |
-!- dexeger [~kj@2601:409:8500:5da6:3b94:d2a4:a0c6:d756] has quit [Quit: Leaving] | 20:32 | |
-!- geektoni [~geektoni@93-34-128-40.ip49.fastwebnet.it] has joined #shogun | 21:27 | |
-!- geektoni [~geektoni@93-34-128-40.ip49.fastwebnet.it] has quit [Quit: Leaving.] | 23:29 | |
--- Log closed Thu Jul 27 00:00:12 2017 |
Generated by irclog2html.py 2.10.0 by Marius Gedminas - find it at mg.pov.lt!