--- Log opened Thu Jun 15 00:00:14 2017
-!- olinguyen [81615ad9@gateway/web/freenode/ip.] has quit [Ping timeout: 260 seconds]00:04
-!- olinguyen [81615ad9@gateway/web/freenode/ip.] has joined #shogun00:37
-!- olinguyen [81615ad9@gateway/web/freenode/ip.] has quit [Ping timeout: 260 seconds]02:10
-!- johklu [c1abba08@gateway/web/freenode/ip.] has joined #shogun09:02
-!- johklu [c1abba08@gateway/web/freenode/ip.] has quit [Client Quit]09:02
-!- geektoni [] has joined #shogun09:09
-!- shogun-buildbot [] has quit [Ping timeout: 240 seconds]09:15
-!- shogun-buildbot [] has joined #shogun09:15
micmnwiking: I'm doing the linalg splitting thing09:16
-!- HeikoS [] has quit [Quit: Leaving.]09:43
@sukeyPull Request #3846 "Split Eigen3's linalg backend into header and implementation."  opened by micmn -
@sukeyPull Request #3846 "Split Eigen3's linalg backend into header and implementation."  synchronized by micmn -
-!- HeikoS [] has joined #shogun12:42
-!- mode/#shogun [+o HeikoS] by ChanServ12:42
-!- HeikoS [] has quit [Remote host closed the connection]12:43
-!- HeikoS [] has joined #shogun12:45
-!- mode/#shogun [+o HeikoS] by ChanServ12:45
@HeikoSmicmn: jo12:58
@HeikoSgeektoni: hi12:58
@HeikoSmicmn: any thoughts about the non-templated linalg interface?12:58
micmnnot yet, this morning I've worked on the splitting thing and the blog post12:59
micmnit is a nice idea13:00
geektonihi HeikoS13:00
@HeikoSmicmn: ok no worries, just asking13:00
@HeikoSmicmn: so what would need discussion is:13:01
micmnand I thought about it a little but for now I've no clue on how to do that13:01
@HeikoSwhere does the switch from run-time types to compile time type calls happen, and how?13:01
@HeikoSthere could be something like13:01
@HeikoSswitch (matrix.dtype) { case REAL: linalg_impl::method<float64_t>(SGMatrix<float64_t>(matrix) ....13:02
@HeikoSgenerated with macros13:02
@HeikoS(which is kind of similar to what's going on in the LARS/LDA atm)13:02
@HeikoSbut it is not that nice and bloats code13:02
@HeikoSthe other way would be to register function pointers based on the type in some way ... discussed that with wiking a while ago13:03
@HeikoSmicmn: ok, I mean the stuff you are doing right now is more important anyways, just wanted to point you to that a bit, in case you happen to have an idea13:03
@HeikoSgeektoni: re the progress bar, I think there are some algos that could benefit from it :)13:03
micmnsure it would be very nice to push the data type inside linalg13:04
micmnbut that requires a big sgmatrix/vector refactoring right?13:05
micmnhow do you plan to do the transition?13:05
-!- TingMiao [uid229534@gateway/web/] has joined #shogun13:18
@HeikoSmicmn: sorry for the delay13:28
@HeikoSmicmn: so I dont think we would re-factor SGMatrix<T>13:29
@HeikoSwe would have a new type13:29
@HeikoSthat has the possibility to query the type of its underlying data at runtime13:29
@HeikoSand then we add new methods to the linalg API that accept this type13:29
@HeikoSmicmn: I would just start a feature branch on that, and play it through for a single method that only accepts Matrix13:30
@HeikoSand then iterate there a bit13:30
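The design HeikoS outlines above (a container that can report its element type at runtime, plus a lookup from that type to a type-specific implementation) can be sketched roughly as follows. This is only an illustration of the "register function pointers based on the type" idea, written in Python for brevity; in Shogun the typed implementations would be C++ template instantiations. Every name here (`Matrix`, `register`, `linalg_sum`, the dtype strings) is hypothetical and not part of the Shogun API.

```python
# Hypothetical sketch of run-time to typed dispatch; none of these names are Shogun API.
from typing import Callable, Dict, Sequence

# Registry mapping a runtime dtype tag to a type-specific implementation.
_SUM_IMPLS: Dict[str, Callable[[Sequence], object]] = {}


def register(dtype: str):
    """Decorator that registers a typed implementation under a dtype tag."""
    def decorator(func):
        _SUM_IMPLS[dtype] = func
        return func
    return decorator


@register("float64")
def _sum_float64(data):
    # Stand-in for something like linalg_impl::method<float64_t>(...)
    return float(sum(float(x) for x in data))


@register("int32")
def _sum_int32(data):
    # Stand-in for the int32_t instantiation
    return int(sum(int(x) for x in data))


class Matrix:
    """Untyped container that can be asked for its element type at runtime."""

    def __init__(self, data, dtype):
        self.data = list(data)
        self.dtype = dtype


def linalg_sum(matrix):
    """Non-templated entry point: look up the typed implementation and call it."""
    try:
        impl = _SUM_IMPLS[matrix.dtype]
    except KeyError:
        raise TypeError(f"no implementation registered for dtype {matrix.dtype!r}")
    return impl(matrix.data)
```

Compared with a macro-generated `switch`, a registry like this keeps the dispatch in one place, at the cost of an extra lookup per call and a runtime error instead of a compile-time one when a type is missing.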
micmngot it13:30
geektoniHeikoS: sorry for the delay.  Yeah, there are surely algorithms which can be "upgraded", but I'll need to check which of them can benefit from a progress bar.13:32
@HeikoSgeektoni: should be easy right? :)13:34
@HeikoSLet me find out13:34
geektoniyes, it should be easy to do13:35
@HeikoScross-validation probably as well13:36
geektoniHeikoS: btw, I haven't tried with iPython yet, but the progress bar should behave correctly.13:37
@HeikoSgeektoni: I just wondered about
@HeikoSthis does it nicely in ipython13:38
@HeikoS(including the notebook)13:38
@HeikoSlol saw this, geektoni?13:39
geektonihe... nope. ;)13:40
geektoniI hope I did everything right. LOL13:41
geektoniHeikoS: btw, what is your concern with tqdm?13:42
@HeikoSgeektoni: in what sense?13:42
geektoniI mean13:43
geektoniit is just another way to do progress bar13:43
geektoniin Python13:43
geektoniWere you planning to do something with it?13:44
geektonior it is just to highlight another competitor of our progress bar ;)13:45
@HeikoSno it is just another one13:45
@HeikoSnot using it13:45
@HeikoSbut it looks cool in the ipython notebook13:45
@HeikoSand it would be cool if ours looked cool there as well13:45
@HeikoSthats why I ask13:45
@HeikoSbtw I have another question: what happens if you nest the progress bars?13:46
lisitsyntricky questions you have here hhuh13:46
@HeikoSlisitsyn: hows the iterator refactoring goin? :D13:48
geektonilisitsyn: yes, indeed13:48
@HeikoSgeektoni: ill be back later with more tricky questions13:50
geektoniHeikoS: I think it won't work well.13:50
lisitsynHeikoS: I tried to find a funny answer13:50
@HeikoSgeektoni: but think about this: you added progress bar to xvalidation and to svm13:50
@HeikoSand then you start13:50
@HeikoSwhat happens?13:50
@HeikoSand what should happen?13:50
lisitsynfinishing in one hour13:51
lisitsyntwo days13:51
lisitsynone minute13:51
@HeikoSi gotta go now, but will be back soon13:51
@HeikoSlisitsyn: ah cool let me know when youre done :D13:51
lisitsyn110% done13:51
geektoniHeikoS: kk, I'll be prepared ;)13:51
-!- HeikoS [] has quit [Quit: Leaving.]13:51
-!- geektoni [] has quit [Quit: Leaving.]13:57
-!- olinguyen [81615ad9@gateway/web/freenode/ip.] has joined #shogun15:19
-!- HeikoS [] has joined #shogun15:34
-!- mode/#shogun [+o HeikoS] by ChanServ15:34
@HeikoSlisitsyn: done?15:35
lisitsynHeikoS: sure sure15:35
@HeikoSolinguyen: hey15:56
@HeikoSTingMiao: hi!15:56
@HeikoSolinguyen: so we have someone from gunnars group now15:56
@HeikoSlets see what he says15:56
olinguyenyea, thanks for that email!15:56
@HeikoSolinguyen: saw the email I wrote a few mins ago?15:57
olinguyenyea :)!15:57
@HeikoSI was thinking of you there -- as you have already used Shogun's messy API quite a bit15:57
@HeikoSolinguyen: so really curious on your input there15:57
@HeikoSolinguyen: API design is something we often dont focus on very much15:57
@HeikoSolinguyen: or more like: usability15:58
olinguyenyea, it's a good idea to have the doc. I'd forget about the messy stuff otherwise15:58
@HeikoSolinguyen: yeah, just keep on filling it, and then ping us15:58
@HeikoSolinguyen: hey I have another question related to the analysis you are doin15:58
@HeikoSolinguyen: so you are predicting mortality within the next X years right?15:59
olinguyenas of right now i did 30-days and 1-year15:59
@HeikoSok cool15:59
@HeikoSso your data is pairs of (X,y)15:59
@HeikoSwhere X is all the data of the patient at time t, and y is whether the patient dies at t+1?15:59
olinguyenright! and time t is the first 24 hours16:00
@HeikoSok cool16:00
@HeikoSso we need to discuss this a bit :)16:00
@HeikoSso first question: in some sense, one should also use information before t (not just at t) to predict?16:01
@HeikoSi.e. heart rate at t, but also at t-1, t-2, etc16:01
olinguyenyea. to be more specific i'm taking the max/min/mean values of the first 24 hours16:02
TingMiaohi I've got the email. I'll think about it:)16:02
@HeikoSTingMiao: cool thx! :)16:02
@HeikoSolinguyen: cool16:02
@HeikoSolinguyen: so in general, this is some kind of "rolling" feature extraction16:02
@HeikoSwhere you compute some function over a time series in a certain interval16:03
@HeikoSolinguyen: offtopic: here is a potential feature that Shogun would benefit from16:03
@HeikoSsomething like this16:03
@HeikoSto generalise this16:03
@HeikoSok but anyways, I am after something else atm16:04
@HeikoSolinguyen: so there might be a problem16:04
olinguyencool, i didn't know about that from pandas :P16:04
@HeikoSolinguyen: yeah it might save you some headaches! ;)16:04
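As a rough illustration of the feature extraction described above (max/min/mean over the first 24 hours, i.e. some function computed over a time series in a fixed interval), here is a minimal pure-Python sketch. The function name `window_features`, the `(hours, value)` sample layout, and the heart-rate numbers are all made up for the example; the pandas functionality HeikoS refers to would do this more conveniently on real data.

```python
from statistics import mean


def window_features(samples, start, end):
    """Summarise one measurement over the interval start <= t < end.

    samples: list of (hours_since_admission, value) pairs (assumed layout).
    Returns a dict with max/min/mean, or None if the window is empty.
    """
    values = [value for t, value in samples if start <= t < end]
    if not values:
        return None
    return {"max": max(values), "min": min(values), "mean": mean(values)}


# Made-up heart-rate readings; the last one falls outside the first 24 hours.
heart_rate = [(0.5, 82.0), (6.0, 90.0), (23.0, 74.0), (30.0, 110.0)]
features = window_features(heart_rate, 0.0, 24.0)
```

Sliding `start` and `end` forward in fixed steps turns this into the "rolling" variant.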
@HeikoSso your (X,y) pairs are not iid in fact16:04
@HeikoSand this means16:04
@HeikoSthat you cannot do plain cross-validation16:05
@HeikoSbecause you leak future information into the past16:05
@HeikoSi.e. when you predict at time t, from your x-validation, it might be that you have seen observations from the future16:05
@HeikoSolinguyen: you see what I mean?16:05
olinguyenyea i get it16:05
@HeikoSolinguyen: ok, so the way to solve this is to do something like "cross-validation in time"16:06
@HeikoShere is some inspiration16:06
@HeikoS(dont download the course, this just describes the problem=)16:07
olinguyencool, thanks!16:07
@HeikoSbetter look here
@HeikoSso what the idea here basically is:16:07
@HeikoSgo through time in a certain step size (this corresponds to "folds" in x-validation)16:07
-!- geektoni [~geektoni@] has joined #shogun16:07
@HeikoSand then train using all the information up to that time point, and then test the prediction16:08
@HeikoS(being really careful about no future information leaking into the current model)16:08
@HeikoSand then repeat for all time "folds"16:08
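The procedure described above can be sketched as an index generator: step through time, train on everything before the cut-off, and test on the block that follows. This is a minimal sketch assuming the samples are already sorted by time; the function name `time_series_splits` and the way the remainder is assigned to the last test block are choices made for this example, not something taken from Shogun or from the links in the log.

```python
def time_series_splits(n_samples, n_folds):
    """Yield (train_indices, test_indices) pairs for expanding-window validation.

    Assumes sample i was observed no later than sample i + 1, so every
    training index precedes every test index and no future data leaks in.
    """
    fold_size = n_samples // (n_folds + 1)
    if fold_size == 0:
        raise ValueError("not enough samples for the requested number of folds")
    for k in range(1, n_folds + 1):
        train_end = k * fold_size
        # The last fold absorbs any leftover samples.
        test_end = train_end + fold_size if k < n_folds else n_samples
        yield list(range(train_end)), list(range(train_end, test_end))


splits = list(time_series_splits(n_samples=10, n_folds=3))
```

With 10 samples and 3 folds the training sets grow as `[0, 1]`, `[0..3]`, `[0..5]`, each tested on the block right after it. scikit-learn ships a `TimeSeriesSplit` class built on the same idea, which may be the easier starting point in Python.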
olinguyeni see16:08
olinguyenare my current models likely overfitting?16:08
@HeikoS^this is another nice explanation16:09
@HeikoSolinguyen: the numbers you are getting are definitely questionable16:09
@HeikoSolinguyen: and it is likely that if you would run the model in practice on entirely new data, that you would get worse results16:10
@HeikoSsince you are using python16:10
@HeikoSyou could start with generating the splits like this16:10
@HeikoSolinguyen: thats another feature for the list btw16:11
@HeikoSolinguyen: let me know if you have any questions on that16:12
@HeikoSolinguyen: I have a few more things, but they are less important for now16:12
@HeikoSolinguyen: one feature that usually works quite well for time series is to fit a linear regression to the time series in a certain past window (like you do the mean at the moment), and then extrapolate that linear model into the future. This can be then used as part of the X in your (X,y) pairs16:13
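A minimal sketch of the trend feature suggested above: fit an ordinary least-squares line to the readings in a past window and evaluate it at a future time. The function name `linear_trend_forecast` and the sample numbers are invented for illustration; the window must only contain observations from before the prediction time, otherwise the feature itself leaks future information.

```python
def linear_trend_forecast(times, values, t_future):
    """Fit value = intercept + slope * t by least squares and extrapolate.

    times, values: readings inside the past window (same length, >= 2 points).
    Returns (slope, predicted value at t_future).
    """
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (time, value) pairs")
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all time points are identical")
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = v_mean - slope * t_mean
    return slope, intercept + slope * t_future


# Made-up readings that rise by 2 units per hour.
slope, forecast = linear_trend_forecast([0, 1, 2, 3], [1.0, 3.0, 5.0, 7.0], t_future=5)
```

Both the slope and the extrapolated value can be appended to X alongside the max/min/mean summaries.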
olinguyenThanks, i'll go through this and let you know if i have questions (i feel like I'll have a few :P)16:14
olinguyenHeikoS: A little off topic. Do you have time to have a quick look at something?16:14
@HeikoSolinguyen: sure16:14
olinguyenit's been bugging me16:15
olinguyeni'm not sure why the AUC is changing when i'm doing the conversion of the binary labels and recomputing the AUC16:15
olinguyeni didn't attach data because i'm having errors when i'm pickling the objects or putting it in csv (can't seem to reproduce)16:15
@HeikoSdont get the last point16:16
@HeikoScan you explain?16:16
@HeikoSIll try to reproduce this, give me a sec16:16
olinguyenI was trying to reproduce this by pickling the BinaryLabels object, but when i load them after saving and I try to run roc.evaluate, i get an error16:17
olinguyenSystemError: [ERROR] In file /build/shogun-v9ad6W/shogun-6.0.0+1SNAPSHOT201704270057/src/shogun/labels/Labels.cpp line 67: assertion m_current_values.vector && idx < get_num_labels() failed in virtual float64_t shogun::CLabels::get_value(int32_t) file /build/shogun-v9ad6W/shogun-6.0.0+1SNAPSHOT201704270057/src/shogun/labels/Labels.cpp line 6716:17
olinguyenit's driving me nuts lol16:17
olinguyenthis all started when i wanted to evaluate XGBoost. i was getting poor results, similar to randomforest on shogun. Then i tested with sklearn's SVM, etc. They all produce poor results out of the box.16:18
olinguyenI'm getting doubts about my results now16:18
@HeikoSolinguyen: let's do things one after another :)16:19
@HeikoSso pickling BinaryLabels is a sep. issue16:19
@HeikoScould you open a github issue with stand-alone minimal code to reproduce the error?16:19
@HeikoSthen stand-alone code to reproduce the auc error16:20
@HeikoSand then let's talk about XGBoost16:20
@HeikoSbut before that, lets make random forest work :)16:20
olinguyenyea, i'm a bit jumpy lol16:20
@HeikoSolinguyen: send me the issues once you have opened them16:20
@HeikoSif there is python code I can copy paste that helps immensely16:20
@HeikoSyou can do that for the pickling16:20
@HeikoSfor the auc, ill make that16:20
@HeikoSolinguyen: ill be back in 30 min, replied to your gist16:30
olinguyenHeikoS: Yep, get_values fixed the issue!16:33
@wikingmicmn, kudos for medium :)16:38
olinguyenHeikoS: I have to step out for a bit. I think i figured out the issue with the pickling of objects. I'll open issues for the two problems when i'm back16:52
@sukeyIssue #3847 "Implement a time-series splitting strategy" opened by karlnapf -
@sukeyIssue #3847 "Implement a time-series splitting strategy" karlnapf added label: "development tasks" -
@sukeyIssue #3847 "Implement a time-series splitting strategy" karlnapf added label: "entrance" -
@HeikoSolinguyen:  ^17:09
-!- mikeling [uid89706@gateway/web/] has joined #shogun17:19
@wikingHeikoS, pung17:21
@wikingmikeling, pong17:21
mikelingwiking: ping17:21
@HeikoSwiking: pong17:21
@wikingpong pong17:21
mikelingwiking: hi, what's the m_parameters->add_vector() for?17:21
mikelingand I still don't get how those registered parameters are cloned into another object17:22
@HeikoSmikeling: that registers parameters17:22
@HeikoSto be considered in clone, equals, serialize17:22
@HeikoSmikeling: it is the same as SG_ADD17:23
@wikingHeikoS, buuuuuuuyaaaaaaa
@wikingmikeling, just do it :)17:23
@wikingi mean just do what i told you to do there in the comment17:23
@wikingthat should again fix some of the errors17:23
@wikingyou were seeing17:23
@wikingalthough :>17:23
@wikinglemme just have a quick peep17:24
@wiking1 second17:24
@wikingHeikoS, wtf we have tons of people in this channel :P17:24
@HeikoSwiking: yeah thats good17:33
@HeikoSwiking: what about having some conversion methods for std::vector to SGVector?17:34
@HeikoSmikeling: ^17:34
mikelingwiking: yeah, but I just want to understand how exactly it happens, like we have a variable std::vector<int32_t> m_array, how could we clone it into another object just by add_vector() ?17:35
mikelingit seems like wouldn't call something like std::vector<int32_t>(array_size) to init an vector17:35
mikeling* a vector17:35
@HeikoSmikeling: the clone method iterates through all registered parameters17:36
@HeikoSif it is a vector, it iterates through its elements and copies them one by one17:36
@HeikoSthe code for that is in SGObject::clone17:36
@HeikoSmikeling: why do you need that info?17:36
mikelingHeikoS: I'm just thinking about why we need add_vector() to register DynArray's data and length rather than just register it like other variables17:41
@wikingmikeling, just trying to add you this17:41
@HeikoSDynArray is serializable17:42
@HeikoSso its data should be serialized17:42
@HeikoSthis is why the vector that contains the data needs to be registered for serialization17:42
mikelingHeikoS: so, what about std::vector? It's not a part of Shogun so I'm not sure if it's serializable17:43
@HeikoSmikeling: it is not17:43
@HeikoSbut its elements are contiguous in memory17:43
@HeikoSthis is why you can just get the pointer and the length and pass that to the add_vector method17:43
@HeikoSwhich will do the rest17:43
mikelingwiking: HeikoS ok, I see17:44
mikelingThank you17:44
@HeikoSmikeling: so all is good :)17:44
mikelingHeikoS: How about clone? Is clone() meant to create an identical object with the same variables?17:56
@HeikoSmikeling: not sure what you mean17:56
mikelinglike we have a vector{1,2,3,4}17:56
@HeikoSbut there are no problems when you change to std::vector API17:56
@HeikoSas cloning only relies on the fact that things are contiguous in memory17:56
@wikingHeikoS, it is serializable17:59
@wikingstd::vector has a contiguous memory backing17:59
@HeikoSwiking: yeah thats what I mean17:59
@wikingah yeah17:59
@wikingi misread17:59
@wikingi think there's one problem17:59
@HeikoS"add_vector" accepts pointer and some type info18:00
@wikingjust looking into the code18:00
@HeikoSthats it18:00
@wikingyep yep18:00
@wikingi have one question18:12
@wikingwhat's the difference18:12
@wikingbetween these two properties18:12
@wiking                /** number of elements */18:12
@wiking                int32_t num_elements;18:12
@wiking                /** the number of currently used elements */18:12
@wiking                int32_t array_size;18:12
mikelinglet me have a check18:13
mikelingwiking: no, they are duplicated :(18:15
@wikingso let's try to consolidate those first18:15
mikelingI need to remove the array_size;18:15
@wikingmikeling, but you see that you are setting them differently in18:16
@wiking                CDynamicArray() : CSGObject()18:16
@wikingin case it's DynamicArray<bool>18:17
mikelingyes, I will address them18:17
@wikingbut when? :)18:17
@wikingi mean are we gonna finish this ?18:17
@wikingmikeling, try to start writing again daily updates18:18
@wikingif you are just writing18:18
@wiking'i'm stuck with'18:18
mikelingI will do it right away!18:19
mikeling :)18:19
@wikingHeikoS, so one thing18:21
@wikingif you are here18:22
@sukeyPull Request #3845 "[PrematureStopping] Add CMake support to search or install RxCpp."  merged by vigsterkr -
@sukeyNew Commit "Merge pull request #3845 from geektoni/feature/premature-stopping18:25
@sukey[PrematureStopping] Add CMake support to search or install RxCpp." to shogun-toolbox/shogun by vigsterkr:
micmnwiking: I pushed this morning :P18:29
micmnclang: maybe it should be in another pr, I just kept the old style18:30
@wikinglemme check18:30
@wikingah ok18:30
@wikingidiot me18:31
@wikingor better yet18:31
@wikingsee clang fix18:31
@wikingand then if that passes then merge18:31
@sukeyPull Request #3846 "Split Eigen3's linalg backend into header and implementation."  merged by vigsterkr -
@sukeyNew Commit "Merge pull request #3846 from micmn/feature/linalg-refactor18:31
@sukeySplit Eigen3's linalg backend into header and implementation." to shogun-toolbox/shogun by vigsterkr:
@wikingmicmn, can you then now rebase
@wikingso we can move forward with LDA and kpca18:33
-!- HeikoS [] has quit [Ping timeout: 240 seconds]18:39
@sukeyPull Request #3848 "[PrematureStopping] Refactor CSignal class to use RxCpp utilities (WIP)"  opened by geektoni -
@wikingmikeling, thnx 4 the update18:53
@wikinggo to sleep :)18:53
@wikingbetter to work on things like this fresh18:54
mikelingwiking: np, that's what I should do18:54
mikelingwiking: yeah, I will. ;)18:55
-!- geektoni [~geektoni@] has quit [Remote host closed the connection]19:26
@sukeyPull Request #3843 "Add linalg methods needed by FisherLDA and KernelPCA (CPU-only)"  synchronized by micmn -
@sukeyPull Request #3842 " Port KernelPCA to use linalg and refactor unit test"  synchronized by micmn -
@sukeyPull Request #3826 "Remove duplicate code in LDA/FisherLDA and port the solvers to use linalg (WIP)"  synchronized by micmn -
@sukeyIssue #3849 "Pickle on a BinaryLabels object does not save the values parameter " opened by olinguyen -
@sukeyIssue #3849 "Pickle on a BinaryLabels object does not save the values parameter "-
@sukeyIssue #3849 "Pickle on a BinaryLabels object does not save the values parameter "-
-!- iglesiasg [~iglesiasg@] has quit [Ping timeout: 246 seconds]21:40
-!- iglesiasg [~iglesiasg@] has joined #shogun21:41
-!- mode/#shogun [+o iglesiasg] by ChanServ21:41
-!- mikeling [uid89706@gateway/web/] has quit [Quit: Connection closed for inactivity]22:22
olinguyenis there a way to get probability scores/confidence values for random forest or decision trees?23:02
olinguyenI'm trying to calculate auROC23:02
--- Log closed Fri Jun 16 00:00:15 2017