--- Log opened Fri Aug 04 00:00:24 2017
@wikingTrixis, the current version of libsvm does not03:19
@wikingour kernels are multithreaded03:20
@wikingmikeling, sorry yesterday i've crashed03:20
@wikingi'll have time today03:20
@wikingto check onto things03:20
@wikingin 1-1.5 i'll have some fixes for you03:20
-!- http_GK1wmSU [~deep-book@] has joined #shogun05:49
-!- http_GK1wmSU [~deep-book@] has left #shogun []05:52
-shogun-buildbot:#shogun- Build nightly_all #23 is complete: Success [build successful] -
mikelingwiking: thanks a lot!07:27
@wikingHeikoS, ping?08:34
@wikingHeikoS, i wonder why some base interfaces got into gpl08:34
@wikingi know that it uses SLEP08:35
@wikingbut basically what we should do is to make MulticlassLogisticRegression.h an interface/abstract class08:35
@wikingand have an implementation like SlepMulticlassLogisticRegression.h08:35
@wikingthat is using slep to do it08:35
@wikingHeikoS, ping?09:39
@wikingmikeling, i've just rebased again and fixed some of the errors09:50
@wikingcan you check now?09:50
@wikingor i mean when you have time09:50
-!- travis-ci [] has joined #shogun10:00
travis-ciit's Viktor Gal's turn to pay the next round of drinks for the massacre he caused in shogun-toolbox/shogun:
-!- travis-ci [] has left #shogun []10:00
-!- HeikoS [] has quit [Quit: Leaving.]10:10
Trixiswiking: i was mostly wondering at what point should my own multithreading kick in10:15
Trixis(cant have it kick in from the start, otherwise i end up with libgomp errors all over the place)10:33
mikelingwiking: hi, sorry for the late reply. I think those errors have been fixed now :)12:24
mikelingthank you12:24
-!- HeikoS [] has joined #shogun12:58
-!- mode/#shogun [+o HeikoS] by ChanServ12:58
-!- HeikoS [] has quit [Ping timeout: 240 seconds]13:33
-!- travis-ci [] has joined #shogun14:18
travis-ciit's Olivier's turn to pay the next round of drinks for the massacre he caused in shogun-toolbox/shogun:
-!- travis-ci [] has left #shogun []14:18
-shogun-buildbot:#shogun- Build deb3 - interfaces #78 is complete: Success [build successful] -
Trixiswiking: btw i was wondering, whats the rationale behind features/labels not supporting multiple independent subsets at once?15:01
@wikingjust a sec15:01
@wikingTrixis, well it's not rationale16:11
@wikingit's a bottleneck atm16:12
Trixisright, makes sense16:12
@wikingcurrently micmn is working on fixing that in Features16:12
Trixiswiking: yeah i found i had to create a wrapper to make that operation stateless (via copying the whole object ofc) so that i could use it in map reduce16:27
Trixiswiking: also the shogun_num_threads limit is global for the entire program, right?16:38
Trixiswiking: i.e. suppose i set it to 8, then dispatch 32 threads, each training a single classifier, then only at most 8 classifiers will be trained in parallel?16:39
-!- StarmanDeluxe [~StarmanDe@unaffiliated/starmandeluxe] has joined #shogun16:40
-!- StarmanDeluxe [~StarmanDe@unaffiliated/starmandeluxe] has left #shogun ["WeeChat 1.9-dev"]16:41
@wikingshogun_num_threads is global16:49
@wikingwe dont have yet the concept of openmp teams16:50
Trixiswell, fuck cant set customkernel matrix because i cant cast the kernel returned by combinedkernel to customkernel :| apparently this is correct behavior as per swig
Trixisi guess a dirty hack to get around it atm is to insert a new custom kernel at the position and delete the old one?17:08
-!- HeikoS [] has joined #shogun17:18
-!- mode/#shogun [+o HeikoS] by ChanServ17:18
-!- HeikoS [] has quit [Ping timeout: 240 seconds]17:22
-!- HeikoS [] has joined #shogun17:37
-!- mode/#shogun [+o HeikoS] by ChanServ17:37
olinguyenHeikoS: sorry for the spam commits, I had a little trouble with the style checker17:39
@HeikoSolinguyen: no worries17:39
@HeikoSgithub recently added the "squash option"17:39
@HeikoSso I can just turn them all into a single one17:40
@HeikoSand rewrite the message ;)17:40
@HeikoSbtw dont use git commit -a17:40
@HeikoSas this always adds data17:40
@HeikoScan you explain me this test:
@HeikoSolinguyen: just sent a review for the PR as well17:46
olinguyensure & thanks17:46
@HeikoSI am here for another hour or so17:46
@HeikoSso let's discuss17:46
olinguyeni generated toy data where the output is 1 when features 1 and 2 are < 517:46
olinguyenso and tested on data where the probability was 1 or 017:46
@HeikoSI get the toy data generation17:47
@HeikoSI dont get the second17:47
@HeikoS"tested on data where the prob was 1 or 0"17:47
@HeikoSyou mean the RF assigns 100% confidence in its prediction17:47
@HeikoSI.e. all trees give the same result17:47
olinguyencorrect and that is when features 1 and 2 are < 517:47
@HeikoSI see17:48
@HeikoSok so then17:48
@HeikoSno seed necessary, and no single thread17:48
@HeikoSsince that doesnt change the fact that all trees will agree17:48
olinguyenyea, you're right17:48
@HeikoStest name should also reflect that17:49
@HeikoSor something nicer17:49
@HeikoSbut something that kind of explains what happens17:49
@HeikoSand what the rationale is17:49
@HeikoSthen next thing17:49
@HeikoScan we have a test where the trees don't all agree?17:49
@HeikoSlike a dataset where the class labels are random17:50
olinguyensure, i'd use like EXPECT_NEQ?17:50
@HeikoSI am more thinking17:50
@HeikoSsay you have a dataset where you just assign random class labels17:50
@HeikoSsay all features are gaussian17:50
@HeikoSand then you just randomly give them +1, -1 labels17:50
@HeikoSor rather 0,1,217:51
@HeikoSthen on prediction17:51
@HeikoSthe confidences should be spread more or less evenly17:51
@HeikoSsee what I mean?17:51
@HeikoSso you can add a rough check for that in the test, calibrate it so that it passes most of the time17:51
olinguyenyea, i'm just unsure what value assertion i'd make in that case?17:51
olinguyenlike the probability outputs are likely fluctuating17:52
@HeikoSEXPECT_NEAR(score, 0.3, 0.1)17:52
@HeikoSsomething like this17:52
olinguyenok got it17:52
@HeikoSrun it a few times, observe17:52
@HeikoSand then give it some headroom17:52
@HeikoSso that it doesnt fail17:52
@HeikoSbut it catches some problems if somebody would screw up the scores17:52
@HeikoSand then name this like "scores_random_labels"17:53
@HeikoSor so17:53
@HeikoS(method name should be in there somehow)17:53
@HeikoSI think we can merge once that is done17:53
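A minimal sketch of the tolerance check discussed above, written in Python rather than as a Shogun/gtest unit test. It does not train a random forest: `simulated_scores` is a hypothetical stand-in in which every tree votes for a uniformly random class, which is roughly what one would hope to see when the labels carry no signal. The helper `expect_near` mimics gtest's `EXPECT_NEAR`, and the expected value is taken as 1/3 for three classes (the chat suggests 0.3 with 0.1 headroom). All names, sizes, and the seed are assumptions for illustration.

```python
import random


def simulated_scores(n_samples, n_trees, n_classes, seed=0):
    """Stand-in for forest scores: each tree votes for a uniformly random class."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_samples):
        votes = [0] * n_classes
        for _ in range(n_trees):
            votes[rng.randrange(n_classes)] += 1
        # Score of a class = fraction of trees that voted for it.
        scores.append([v / n_trees for v in votes])
    return scores


def mean_score_per_class(scores):
    """Average the per-sample scores of each class over all samples."""
    n = len(scores)
    return [sum(row[c] for row in scores) / n for c in range(len(scores[0]))]


def expect_near(value, expected, tolerance):
    """Python analogue of gtest's EXPECT_NEAR: true if within the tolerance."""
    return abs(value - expected) <= tolerance


scores = simulated_scores(n_samples=200, n_trees=100, n_classes=3)
means = mean_score_per_class(scores)
# With uninformative labels the confidences should be spread roughly evenly.
all_near = all(expect_near(m, 1.0 / 3.0, 0.1) for m in means)
```

The headroom (0.1 here) is the part to calibrate by running the real test a few times: wide enough that random fluctuation does not fail it, narrow enough that broken scores would.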
olinguyenIn that case17:54
olinguyendo you think it's a good idea to follow sklearn's test here:
olinguyeni believe the probabilities are quite similar so with headroom the test will be similar17:54
@HeikoSwhat exactly does it do?17:54
@HeikoScompare against logistic regression scores?17:54
olinguyenno, the randomforestclassifier17:54
olinguyenSo with the same input X, i'd compare with np.array([[0.8, 0.2], [0.8, 0.2], [0.2, 0.8], [0.3, 0.7]])17:55
olinguyenon shogun's RF17:55
@HeikoSI see17:55
@HeikoSso you are saying17:55
@HeikoSthat if we have random number comparison with headroom17:56
@HeikoSwhy not compare against sklearn test directly?17:56
@HeikoSif the algorithm implementation is the same, then that is a very sensible thing to do17:56
@HeikoSmaybe add two more tests then17:56
@HeikoSthe one with the random labels17:56
@HeikoSand the one for sklearn17:56
olinguyenok, sure17:56
@HeikoSthe more different ones the better17:57
olinguyencan we chat a little bit about the data project?17:57
olinguyenunless you had a few things to add17:57
olinguyenon the RF stuff17:57
@HeikoSno thats all18:01
@HeikoSI am curious how the RF behaves18:01
@HeikoSespecially when we add the lag features18:01
olinguyenSo I'm a little uncertain about the incorporation of the lagged features18:01
olinguyenWhat i'm trying to do right now is to extract time series data (first 24 hours) for each feature of dying patients and normal patients (that's a lot of data!)18:02
olinguyenFrom my understanding, using the time-lagged features would help me predict how a time series would look like, given past data. But that doesn't help predict the mortality outcome. How did you see the use of time-lagged features in this case?18:02
olinguyene.g. i have points (t-2, t-1) and i'm trying to predict (t+1). that's what adding lagged features is, or am i seeing it wrong?18:03
@HeikoSthats exactly it18:03
@HeikoSbut it depends all on the target, i.e. what you are trying to predict18:03
@HeikoScurrently, you are predicting mortality at a fixed time point t right?18:04
@HeikoSand you use a snapshot of data at an earlier time point for that18:04
olinguyenright now: using the first 24 hours aggregated, i'm seeing if the patient will die in the hospital18:04
@HeikoSyes exactly18:04
@HeikoSthat is "ever"18:04
@HeikoSbut that is not the most interesting question (and also hard)18:05
@HeikoSbut what about the question "will the patient die tomorrow/next week /  next month18:05
@HeikoSthat is a bit more interesting for the hospital18:06
@HeikoSas they can react18:06
@HeikoSif you just tell them a patient is going to die while here, that is not going to help immensely18:06
@HeikoSbut if you tell them; I am sure he won't die next week .... and then suddenly : I am certain she dies next week18:06
@HeikoSthat is more useful18:07
olinguyenyea i see your point18:07
olinguyeni'm having trouble visualizing it in a time series binary classification setting18:07
olinguyene.g. i have a N time-series of patients heartrate18:07
olinguyenand i have binary outcomes at different points in time (next day, next week, next month, next year)18:07
@HeikoSyou have to think that you will have a pair of (X,y) for every point in time that you predict18:08
@HeikoSwhere X is patient data18:08
@HeikoSand y is "dies next ____"18:08
@HeikoSand then you have to generate those (x,y) pairs for all time points in the time series18:08
@HeikoSfor example for every week18:08
@HeikoSor every day18:08
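A rough sketch of the (X, y) generation described above, assuming a daily time resolution and a single patient. The function `make_pairs`, its arguments, and the toy heart-rate values are hypothetical and not taken from the actual project. Here X is simply the raw history up to day t, and y answers "dies within the next `horizon` days".

```python
def make_pairs(series, death_day, horizon):
    """Build one (X, y) pair per day t of a daily series.

    series:    list of daily measurements; the index is the day.
    death_day: day index on which the patient died, or None if they survived.
    horizon:   look-ahead in days (the blank in "dies next ____").
    """
    pairs = []
    for t in range(len(series)):
        # Only data up to and including day t may be used for the prediction.
        history = series[: t + 1]
        died_in_window = death_day is not None and t < death_day <= t + horizon
        pairs.append((history, 1 if died_in_window else 0))
    return pairs


heart_rate = [72, 75, 80, 90, 110]  # toy daily values for days 0..4
pairs = make_pairs(heart_rate, death_day=5, horizon=2)
labels = [y for _, y in pairs]  # [0, 0, 0, 1, 1]
```

Running the same function with a different `horizon` gives a separate training set for each prediction question, from the same underlying series.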
@HeikoSolinguyen: if it helps, I can write you a little example notebook18:10
@HeikoSusing toy data in 1d18:10
olinguyenyea i think that would be helpful18:10
olinguyenwould the X be a time series up till outcome y?18:10
olinguyenLike if i have 20 measurements of heartrate in the first 24 hours for a patient (at different time intervals), and i have the death outcome at 1 day, 1 week, 1 month18:13
@HeikoSolinguyen: well the X is the features you use from the time series up till outcome y18:14
@HeikoSso you can decide what to use there18:14
@HeikoS-raw value18:14
@HeikoS-lagged average18:14
@HeikoS-linear fit slope18:14
@HeikoSall this adds auto-regressive structure18:15
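The three feature types listed above can be sketched as follows. This is only an illustration under assumptions: the function name `lagged_features`, the window length, and the choice of an ordinary least-squares slope over the window are not from the chat, which leaves those details open.

```python
def lagged_features(history, window=3):
    """Summarise the last `window` values of a series as [raw, average, slope]."""
    recent = history[-window:]
    n = len(recent)

    raw = recent[-1]               # most recent raw value
    lagged_avg = sum(recent) / n   # average over the lag window

    # Least-squares slope of value against time index within the window.
    if n < 2:
        slope = 0.0                # a single point has no trend
    else:
        t_mean = (n - 1) / 2
        num = sum((i - t_mean) * (y - lagged_avg) for i, y in enumerate(recent))
        den = sum((i - t_mean) ** 2 for i in range(n))
        slope = num / den

    return [raw, lagged_avg, slope]


features = lagged_features([70, 72, 74, 76], window=3)  # [76, 74.0, 2.0]
```

Each call uses only values up to the current time point, so the resulting vector can serve as X for a prediction made at that point.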
@HeikoSyour RNN will be just another set of features18:15
@HeikoSolinguyen: a good way to think about this is if you were to use this as a system in real life18:15
@HeikoSolinguyen: and imagine you are doing the decision yourself18:16
@HeikoSso you are confronted with the patient record up to time t18:16
@HeikoSand then you are asked the question whether the patient will die next week18:16
@HeikoSso you can use all information you have of the patient up to time t18:16
@HeikoSand you can extract certain summaries from that18:16
@HeikoSthen you predict a chance of dying18:16
@HeikoSthen, the next day (t+1), you are asked again whether the patient will die next week18:17
@HeikoSso you give another answer18:17
@HeikoSand so on18:17
@HeikoSnow replacing your manual answer with what actually happened will be your training data18:17
olinguyenyea thats a nice way to view it18:17
olinguyeni see it better18:18
@HeikoSI suggest a daily time resolution for the training data18:19
@HeikoSand I suggest 1day ahead, 1week ahead, 1month ahead18:19
@HeikoSin terms of predicting mortality18:19
@wikingmikeling, do the unit tests pass in that PR?18:20
@wikingi.e. in
olinguyenok, i'll give that a shot18:20
@wikingor it's a WIP18:20
olinguyenHeikoS: i'll finish the RF PR but i'll tackle that next18:20
@HeikoSolinguyen: yeah one thing at a time18:20
@HeikoSolinguyen: for the time series stuff18:21
@HeikoScan you prototype a very quick and dirty example of what we just discussed18:21
@HeikoSusing only heart rate or so18:21
@HeikoSso that we make sure we are on the same page18:21
olinguyensure, will do!18:21
mikelingwiking: no, but I think we do it commit by commit18:21
@wikingmikeling, okey!18:22
mikelingOtherwise you need review more than 2000lines of code at once18:22
@wikingpip install18:22
@wikingsometimes i'm like wtf is happening in this world18:22
@wiking'radically efficient'18:22
@HeikoSolinguyen: cool! something really quick18:23
@HeikoSbut where you show that you get the concept of generating the training data, and the lagged features18:23
Trixisi should learn to do debugging and testing on small datasets :|18:28
olinguyenHeikoS: I'll send a draft by tonight18:29
@HeikoSolinguyen: cool! doesn't need to be perfect18:29
Trixiswiking: what exactly is that thing. kinda reads like a startup pitch, lol18:34
-!- HeikoS [] has quit [Ping timeout: 240 seconds]18:57
Trixiswiking: this is probably a completely dumb question19:07
Trixisbut when im setting a custom kernel matrix for classification19:07
Trixishow do i get around the dimension check19:07
Trixisi mean its obvious its not going to be a square matrix like the matrix the classifier was trained on? or do i create a square matrix, and keep all entries but the ones in the lhs x rhs block 0?19:08
Trixiswiking: right its probably because im deleting / inserting the kernel, unfortunately cant get around that b/c cant cast19:27
Trixiswiking: i guess the only alternative is to retain a java reference to all customkernels19:35
Trixisinstead of accessing it through combinedkernel19:35
Trixisyep works im an idiot, should've thought of it earlier19:40
-!- mikeling [uid89706@gateway/web/] has quit [Quit: Connection closed for inactivity]21:37
-!- HeikoS [] has joined #shogun23:35
-!- mode/#shogun [+o HeikoS] by ChanServ23:35
@HeikoSolinguyen: hi23:35
@HeikoSIll go to bed soon23:35
@HeikoSjust saying, Ill be away over the next 2 days, back on Monday23:35
@HeikoSI can still review things a bit23:35
@HeikoSbut have to use my phone ;)23:35
olinguyenok, np23:36
olinguyenenjoy your weekend :)23:36
@HeikoSyou too!23:36
-!- HeikoS [] has quit [Remote host closed the connection]23:42
--- Log closed Sat Aug 05 00:00:25 2017