--- Log opened Wed Jul 26 00:00:11 2017
-!- olinguyen [81615ad9@gateway/web/freenode/ip.] has quit [Quit: Page closed]01:30
-!- nikhilweee [~nikhilwee@] has joined #shogun02:35
-!- zoq_ [] has joined #shogun04:21
-!- zoq [] has quit [Read error: Connection reset by peer]04:22
-!- mikeling [uid89706@gateway/web/] has joined #shogun05:41
mikelingHi wiking, what I mean is that the sample function is defined as a pure virtual function, so I can't add a template parameter to it.05:45
mikelingso I keep m_rng as a member variable in the classes that have a sample function05:46
mikelingmaybe it's better to define them as private rather than protected ;)05:46
@wikingjust a sec05:50
@wikingmikeling, i do not understand06:20
@wikingon which line do you mean this?06:20
@wikingmikeling, could u give me an example06:21
@wikingwhere this is a problem?06:21
@wikingmikeling, ping06:49
mikelingwiking: pong, I'm having dinner, I will send you a gist or share it on pastebin later06:50
-!- geektoni [] has joined #shogun09:13
@sukey[] vigsterkr pushed 24 commits:10:07
-!- sukey [] has quit [Remote host closed the connection]10:08
-!- HeikoS [] has joined #shogun10:29
-!- mode/#shogun [+o HeikoS] by ChanServ10:29
-!- HeikoS [] has quit [Ping timeout: 240 seconds]10:43
-!- HeikoS [] has joined #shogun11:04
-!- mode/#shogun [+o HeikoS] by ChanServ11:04
geektoniHeikoS: the new CrossValidation is now complete ;)11:15
@wikingmikeling, are you here?12:08
mikelingwiking: yes12:09
@wikingi dont understand this story12:09
@wikingwith the template parameter12:09
mikelingmmm, which part?12:10
@wikingany of this12:10
mikelingI just add a comment on it12:10
@wikingi dont understand what is the problem12:10
@wikingwith having12:10
mikelingwiking: in
@wikingwhat template parameter u wanna add to CTraceSampler?12:11
mikelingwiking: like you mention before
@wikingwhen did i mention this? :D12:13
mikelingwiking: I just replied to you in the thread where we talked about how to add a template parameter to the sample function12:15
mikelingon the email12:15
@wikingok cool12:15
@wikinglemme see the patched version12:15
@wikingas i'm a bit out of context now12:15
@wikingthe problem with this is12:17
@wikingthat one would expect serialization would serialize the state of the prng :P12:17
@wikinganyhow apart from that lemme see what one could do12:17
mikelingwiking: mmm, sorry, I missed the serialization part, I will move it to the constructor12:19
@wikingit has nothing to do with that12:19
@wikinggimme one more sec12:20
@wikingi guess there is no type-traits-like stuff for concepts, right?12:22
mikelingmmmm, what do you mean by concepts in type traits? The template and virtual function part?12:25
-!- zoq_ is now known as zoq12:32
-!- HeikoS [] has quit [Quit: Leaving.]13:11
-!- HeikoS [] has joined #shogun15:44
-!- mode/#shogun [+o HeikoS] by ChanServ15:44
-!- dexeger [~kj@2601:409:8500:5da6:3b94:d2a4:a0c6:d756] has joined #shogun16:50
-!- olinguyen [81615ad9@gateway/web/freenode/ip.] has joined #shogun17:00
olinguyenHeikoS: hey, do you have a moment?17:03
-!- iglesiasg [~iglesiasg@] has joined #shogun17:08
-!- mode/#shogun [+o iglesiasg] by ChanServ17:08
@HeikoSolinguyen: hi17:08
@HeikoSyes sure17:08
@HeikoSwhats up?17:08
olinguyenI started work on obtaining the probability scores for RandomForest ( Could you have a quick look?17:09
olinguyenA few things that I'm uncertain about. I'm keeping the output matrix of all the trees, then computing the combination rule (depending on what it is), and always using mean rule for the probability scores. Does that make sense? I'm also only using it for BinaryLabels, since I'm not sure how to derive the probability scores for all the other classes.17:09
olinguyenThe multiclass probabilities should return a matrix of size [num_samples, num_classes] if i'm not mistaken17:10
@HeikoSyes multiclass is a matrix17:16
@HeikoSfor each sample, the probabilities across classes would sum to 117:16
@HeikoSwhat does the apply_get_outputs do?17:16
olinguyenit returns the combined outputs of all trees17:17
olinguyenif it's MajorityVote it would return the label voted upon, if it's MeanRule it returns the probability17:18
@HeikoSso you are not using the apply_get_outputs anymore17:18
@HeikoSah I see17:18
@HeikoSyou refactored it17:18
olinguyenright, because the combination rule was majority vote, i needed a way to still compute the probabilities17:18
olinguyenif the combination rule was majority vote*17:18
@HeikoSok i see that is only in the apply_binary17:19
@HeikoSso let me see if I get this right17:20
@HeikoSI am confused why you need this new method apply_outputs17:20
olinguyenapply_get_outputs returns a SGVector<float64_t> so it either returns labels or probabilities17:21
olinguyeni created another function since i wanted to keep both17:22
@HeikoSah I see17:22
@HeikoSreturn type is different17:22
@HeikoSok I get it now17:22
@HeikoSso you change the binary classification case17:22
@HeikoSthere you call your new method so that you keep all votes17:22
@HeikoSand then you combine them manually in the apply method17:23
@HeikoSconceptually that is the right thing to do17:23
@HeikoScode wise, this needs a bit of restructuring17:23
@HeikoSa lot of redundant code there17:23
@HeikoSsince you basically copy the other method17:23
@HeikoSI suggest you make the apply_get_outputs method call your new method17:24
@HeikoSand then do the combination17:24
@HeikoSand in apply_binary you just call the new one17:24
@HeikoSthen you re-use the code17:24
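The refactoring suggested above can be sketched roughly as follows. This is a Python illustration, not Shogun's C++ code: `per_tree_outputs` is a hypothetical stand-in for the new method, the toy threshold "trees" are invented for the example, and labels are assumed to be 0/1.

```python
from collections import Counter

def per_tree_outputs(trees, samples):
    """Hypothetical new method: one row per sample holding every tree's raw 0/1 vote."""
    return [[tree(x) for tree in trees] for x in samples]

def majority_vote(votes):
    """Most common label in one row of votes."""
    return Counter(votes).most_common(1)[0][0]

def mean_rule(votes):
    """Average of the votes; for 0/1 labels this is the fraction of trees voting 1."""
    return sum(votes) / len(votes)

def apply_get_outputs(trees, samples, rule):
    """Combined output per sample: reuses per_tree_outputs, then applies the rule."""
    return [rule(row) for row in per_tree_outputs(trees, samples)]

def apply_binary(trees, samples, rule=majority_vote):
    """Labels come from the configured rule, probabilities always from the mean rule."""
    outputs = per_tree_outputs(trees, samples)  # computed once, used twice
    labels = [1 if rule(row) > 0.5 else 0 for row in outputs]
    probabilities = [mean_rule(row) for row in outputs]
    return labels, probabilities

# Toy forest of three threshold stumps (an odd count avoids ties).
trees = [lambda x: int(x > 0.2), lambda x: int(x > 0.5), lambda x: int(x > 0.8)]
samples = [0.1, 0.6, 0.9]
labels, probabilities = apply_binary(trees, samples)
```

Both public functions go through the same per-tree matrix, so the vote-collection loop exists in only one place.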
@HeikoSalso, could you maybe make the naming a bit more clear17:24
@HeikoSapply_get_outputs was named before, so let's leave that17:25
@HeikoSor something17:25
@HeikoSolinguyen: see what I mean?17:25
@HeikoSolinguyen: oh another thing is the unit test17:25
@HeikoSI guess you copied the data from another test case?17:25
@HeikoSolinguyen: this is also a lot of redundant code17:26
@HeikoSyou could put the data generation into a function, or even better, a fixture17:26
@HeikoSi.e. code that is re-used across tests17:26
olinguyenyea, i'll fix that17:26
@HeikoSanother question17:26
@HeikoSwhere do those numbers come from?17:26
olinguyeni reverse-engineered the expected probabilities temporarily. I'll have to come up with actual test cases17:27
@HeikoSolinguyen: fixture you can use to set up the RF, labels, etc and you can use a function to generate the data. This might require touching a few of the existing tests, but helps us clean them up big time17:28
@HeikoSolinguyen: ok, so it is not that you just copied the output of the run in there, and then assert against that? :D17:28
@HeikoSolinguyen: if you write python code to generate numbers for test cases, you can always put a link to a gist into a comment in the test17:28
olinguyenno, that is what I did :P. So i will have to fix it17:29
@HeikoSolinguyen: ok, yes that is quite important17:29
@HeikoSyou should do two types of tests17:29
@HeikoSa) you manually calculate what the result should be for fixed (minimal!) data17:29
@HeikoSb) you create a synthetic problem and make sure that the results "make sense", i.e. probabilities >0.5 have class 1 and <0.5 have class 0 in binary (same for multiclass)17:30
@HeikoSand you could also find datasets and compare against sklearn's probabilities (roughly)17:30
@HeikoSyou should also add a test that just isolates your new function in a super minimal case17:30
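The two kinds of tests described above, (a) and (b), might look roughly like this. It is a Python sketch under stated assumptions: `forest_probability` and `forest_label` are hypothetical toy functions standing in for the random forest, and the thresholds and data are made up for illustration.

```python
import random

def forest_probability(thresholds, x):
    """Toy 'forest' of stumps (hypothetical): fraction of stumps voting for class 1."""
    votes = [1 if x > t else 0 for t in thresholds]
    return sum(votes) / len(votes)

def forest_label(thresholds, x):
    """Majority vote of the same stumps."""
    votes = [1 if x > t else 0 for t in thresholds]
    return 1 if 2 * sum(votes) > len(votes) else 0

thresholds = [0.2, 0.5, 0.8]

# (a) Minimal data with a hand-computed expectation:
# x = 0.6 passes thresholds 0.2 and 0.5 but not 0.8, so 2 of 3 stumps vote 1.
hand_computed = 2 / 3
assert abs(forest_probability(thresholds, 0.6) - hand_computed) < 1e-12

# (b) Synthetic problem: probabilities and labels must agree on every sample.
rng = random.Random(0)  # fixed seed keeps the check reproducible
synthetic = [rng.uniform(0.0, 1.0) for _ in range(200)]
for x in synthetic:
    p = forest_probability(thresholds, x)
    assert 0.0 <= p <= 1.0
    assert (p > 0.5) == (forest_label(thresholds, x) == 1)
```

The expected value in (a) is derived by hand in the comment rather than copied from a run, which is the point made in the discussion.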
@HeikoSolinguyen: but conceptually, I think you are there17:30
olinguyencool, thanks!17:30
olinguyenbut for the multiclass case17:31
@HeikoSso nice work, I guess next step is to check whether results make sense17:31
@HeikoSand then clean up tests /refactor17:31
@HeikoSand then PR17:31
@HeikoSah yes multiclass17:31
olinguyencause the individual trees don't provide a probability themselves17:31
@HeikoSyes indeed17:31
olinguyenshould this be another issue17:31
@HeikoSdid you check how sklearn does this?17:31
olinguyenbriefly i'll have to go look again17:32
@HeikoSlets check now if you dont mind?17:33
@HeikoSwiking: btw
@HeikoSsklearn is quite minimal with licensing17:34
@HeikoSolinguyen: where is RandomForestClassifier initially defined?17:35
-!- iglesiasg [~iglesiasg@] has quit [Quit: leaving]17:37
@HeikoSthis is latest master17:38
@HeikoS"The predicted class of an input sample is a vote by the trees in the forest, weighted by their probability estimates. That is, the predicted class is the one with highest mean probability estimate across the trees."17:39
@HeikoSok so their trees have probability output I guess?17:39
olinguyeni believe so17:40
@HeikoSolinguyen: and what trees are they using?17:41
olinguyenI think it's CART17:41
olinguyenlemme double check17:41
@HeikoS"This could be done simply by running any standard decision tree algorithm, and running a bunch of data through it and counting what portion of the time the predicted label was correct in each leaf; this is what sklearn does"17:46
@HeikoSfunny enough dougal (the guy who wrote the answer) is one of my colleagues ;)17:46
olinguyenhaha cool :)17:47
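One common way to get per-leaf probabilities, assumed here as the reading of the quoted answer, is to record which leaf each training sample lands in and use the class fractions inside that leaf. A minimal Python sketch; the function name, leaf ids and labels are all hypothetical:

```python
from collections import Counter, defaultdict

def leaf_class_probabilities(leaf_ids, labels):
    """Per leaf, the fraction of training samples of each class that landed there."""
    counts = defaultdict(Counter)
    for leaf, label in zip(leaf_ids, labels):
        counts[leaf][label] += 1
    return {
        leaf: {label: n / sum(c.values()) for label, n in c.items()}
        for leaf, c in counts.items()
    }

# Five training samples routed to two leaves "A" and "B".
leaf_ids = ["A", "A", "A", "B", "B"]
labels = [0, 0, 1, 1, 1]
leaf_probs = leaf_class_probabilities(leaf_ids, labels)
```

A new sample that ends up in leaf "A" would then get probability 2/3 for class 0 and 1/3 for class 1, instead of a hard label.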
olinguyenshould this be a separate issue?17:47
@HeikoSolinguyen: I mean17:47
@HeikoSso sklearn chooses the class with the largest average class probability17:47
@HeikoSand what you are doing is that, but all the probabilities for other classes are 017:47
@HeikoSsee what I mean?17:47
@HeikoSso if you make the "probability estimation trees" work, then you get the binary case automatically17:48
@HeikoSbut I think your patch still applies then17:48
@HeikoSjust the input will be changed17:48
@HeikoSso let's first fix this one for binary17:48
@HeikoSand then do the one for multiclass17:48
olinguyengot it, thanks17:49
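The point about averaging can be illustrated with a small Python sketch. The numbers are invented and the helper names are hypothetical; it compares trees that output a class distribution with trees that only output a label, which is the same as a distribution with probability 1 for the voted class and 0 for the others.

```python
def mean_class_probabilities(per_tree_probs):
    """Average the trees' class distributions for one sample."""
    num_trees = len(per_tree_probs)
    num_classes = len(per_tree_probs[0])
    return [sum(tree[c] for tree in per_tree_probs) / num_trees
            for c in range(num_classes)]

def one_hot(label, num_classes):
    """Distribution of a tree that only reports a label."""
    return [1.0 if c == label else 0.0 for c in range(num_classes)]

def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=values.__getitem__)

# Three trees, three classes, one sample (made-up numbers).
soft = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.6, 0.3]]
hard = [one_hot(argmax(p), 3) for p in soft]  # label-only trees

soft_mean = mean_class_probabilities(soft)  # one row of the [num_samples, num_classes] matrix
hard_mean = mean_class_probabilities(hard)  # vote fractions per class
predicted = argmax(soft_mean)
```

With label-only trees the averaged row is just the vote fraction per class, so its argmax is the majority vote; with probability-estimating trees the same averaging code applies and only the input changes.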
@HeikoSI think you need to change something in the way you collect results in the individual trees for the multiclass case17:49
@HeikoSolinguyen: you have a good idea on how to do that?17:49
@HeikoSor shall we dig a bit more?17:49
olinguyeni think i have an idea17:50
olinguyeni'll give it a try17:50
olinguyenand let you know17:50
@HeikoSok cool17:50
@HeikoSI will check back tonight17:50
@HeikoScan you send me an email with updates in a couple of hours?17:50
@HeikoSin 5hrs good?17:50
@HeikoSIll check my mails then when coming back home17:50
@HeikoScool thx17:50
@HeikoSolinguyen: nice to see this going now, will be very useful17:51
-!- HeikoS [] has quit [Ping timeout: 240 seconds]17:57
-!- geektoni [] has quit [Quit: Leaving.]19:37
-!- geektoni [] has joined #shogun19:37
-!- geektoni [] has quit [Ping timeout: 246 seconds]19:44
-!- dexeger [~kj@2601:409:8500:5da6:3b94:d2a4:a0c6:d756] has quit [Quit: Leaving]20:32
-!- geektoni [] has joined #shogun21:27
-!- geektoni [] has quit [Quit: Leaving.]23:29
--- Log closed Thu Jul 27 00:00:12 2017