IRC logs of #shogun for Tuesday, 2017-08-08

--- Log opened Tue Aug 08 00:00:29 2017
-!- HeikoS [~heiko@host-92-0-169-11.as43234.net] has quit [Ping timeout: 248 seconds]00:11
-!- witness [uid10044@gateway/web/irccloud.com/x-vzotvjnefbculkny] has quit [Quit: Connection closed for inactivity]01:30
@wikingTrixis, that could be it... it's a pure ansi c code :) no wonder if it's not production ready :)04:31
Trixiswiking: haha10:52
lisitsynwiking: god bless not pure python :P11:11
Trixislisitsyn: doing math in pure python, lol11:12
-!- HeikoS [~heiko@host-92-0-169-11.as43234.net] has joined #shogun11:57
-!- mode/#shogun [+o HeikoS] by ChanServ11:57
-!- zoq [~marcus_zo@urgs.org] has quit [Ping timeout: 268 seconds]12:19
-!- shogun-toolbox [~shogun@7nn.de] has quit [Ping timeout: 268 seconds]12:20
--- Log closed Tue Aug 08 12:20:23 2017
--- Log opened Tue Aug 08 12:20:31 2017
-!- shogun-t1olbox [~shogun@7nn.de] has joined #shogun12:20
-!- Irssi: #shogun: Total of 18 nicks [4 ops, 0 halfops, 0 voices, 14 normal]12:20
-!- Irssi: Join to #shogun was synced in 8 secs12:20
-!- zoq_ is now known as zoq12:57
-!- HeikoS [~heiko@host-92-0-169-11.as43234.net] has quit [Ping timeout: 260 seconds]13:39
@wikinglisitsyn, :D14:18
Trixiswiking: actually i hear the best thing to do is to do optimisation in /pure/ R14:45
Trixis/s14:46
Trixiswiking: ye my only issue with the SVMLight is it seems to be the only svm implementation that supports interleaved optimisation15:07
-!- HeikoS [~heiko@untrust-out.swc.ucl.ac.uk] has joined #shogun15:52
-!- mode/#shogun [+o HeikoS] by ChanServ15:53
-!- olinguyen [81615ad9@gateway/web/freenode/ip.129.97.90.217] has joined #shogun16:40
olinguyenHeikoS: hey, sorry, i came a little later today16:41
olinguyendo you have some time now?16:41
Trixiswiking: well after nearly 2 months i finally deployed first gridsearch + evaluation on the cluster in full scale. as per my supervisor, any precision better than random guessing is considered success16:44
Trixisso at least the bar is set quite low16:45
@wikingTrixis, is this the #joke channel? :)17:25
Trixiswiking: what do you mean?17:32
@wiking" <Trixis> [14:43:23] wiking: actually i hear the best thing to do is to do optimisation in /pure/ R17:32
@wiking"17:32
Trixiswiking: oh yeah that was a joke17:32
Trixisin reference to the "at least its not python"17:32
Trixis*response17:32
Trixiswiking: do you know if anyone here has solid experience with graph kernels / graph similarity methods?17:39
@wikingwhat do u need? :)17:40
@wikingwe can talk :)17:42
@wikingand then maybe figure out something17:42
@wikingthat actually makes sense17:42
@wikingbtw Trixis have u checked optimizers in tf?17:42
Trixiswiking: nop17:42
Trixiswell at some point it seems ill require a graph kernel / similarity (pseudo) measure to capture large scale features & structure of a graph17:43
@wikingthere's an SDCA optimizer in tf17:43
@wikingwhich is quite good17:43
@wikingand of course a gd17:43
Trixisright now i've got a treelet kernel in play, which however is capable of capturing only features involving a few neighbouring vertices17:44
@wikingah so you wanna do something like field-aware factorization machines?17:45
@wikinghttps://www.csie.ntu.edu.tw/~cjlin/libffm/17:45
@wiking?17:45
@wikingthere you can use fields17:46
@wikingsee the format17:46
Trixismy hypothesis however is that the process we're investigating is guided by overall structure of a molecule and its overall shape / elasticity rather than pharmacophores (small functional groups)17:46
@wiking"<label> <field1>:<feature1>:<value1> <field2>:<feature2>:<value2> ...17:46
@wiking"17:46
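A minimal sketch of how one example could be written in the `<label> <field>:<feature>:<value>` layout quoted above. The helper name `to_ffm_line` and the triple-based input are assumptions made for illustration; they are not part of libffm itself.

```python
def to_ffm_line(label, entries):
    """Format one example as '<label> <field>:<feature>:<value> ...'.

    `entries` is a list of (field, feature, value) triples. The helper and
    its argument layout are hypothetical, chosen only for this sketch.
    """
    parts = [str(label)]
    for field, feature, value in entries:
        parts.append(f"{field}:{feature}:{value}")
    return " ".join(parts)


# One positive example with two (field, feature, value) entries.
line = to_ffm_line(1, [(0, 3, 1), (1, 7, 0.5)])
print(line)
```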
@wikingTrixis, why dont you try structural svm?17:46
@wikingor your labeling is really just binary?17:46
Trixisits binary17:47
@wikingmmm but your features are more17:47
Trixisyes im currently using both physical measures17:47
@wiking'interdependent'17:47
@wiking?17:47
Trixisand transformed graph features17:47
Trixisbased on the graph structure of the molecule17:47
@wikingmmm have u tried a stupid xgboost for this? :)17:47
Trixisi was wondering if perhaps spectral graph theory could help17:47
@wikingtrees can get you some nice interactions17:48
@wikingbetween features17:48
@wikingbut ok can u give me an example17:48
@wikingof your graphical features?17:48
Trixiswiking: the molecule pool is extremely heterogeneous, an example of my graph features would be the molecular structure, with labels corresponding to atomic numbers and bond order. (ofc for this kind of feature treelet kernels are perfect)17:49
Trixishowever i also have features that are the structure of the molecule labeled so as to capture the charge distributions / flexibility17:50
Trixiswiking: this guy's definitely in my dataset https://upload.wikimedia.org/wikipedia/commons/thumb/6/65/Erythromycin_A.svg/624px-Erythromycin_A.svg.png (and its still one of the prettier ones)17:50
@wikingso how do you encode this in a feature?17:51
@wikingsome categorical value for the atom?17:52
Trixisye17:52
Trixisofc all labels are categorical17:52
@wikingmmm17:52
@wikingso what is17:52
@wikingH, O, HO?17:52
@wikingi mean how do you encode this?17:52
Trixissince continuous labels are highly problematic17:52
Trixiswiking: (O)-(H)17:52
@wikingi mean do you emphasise that HO is actually a H and an O17:53
Trixisye17:53
@wikingbut you dont do later an ordinal encoding17:53
@wikingbut you let your kernel17:53
@wikingtake care of these categoricals?17:53
Trixiseverything but implicit hydrogens are excluded17:53
Trixisyes17:53
@wikingk17:53
@wikingand how do you define the edges?17:53
@wikingor the order of graph nodes17:54
Trixisa few variations on the kernel in use currently group atoms based on some common features17:54
@wikingdefines the edges?17:54
Trixiswiking: edges are explicit as well17:54
@wikingok so what you have like17:54
Trixisthe graph is treated in the canonical way17:54
Trixisi.e. set of edges17:54
Trixisset of vertices17:54
@wikingyeah so you first have the vertices17:54
@wikinglike17:54
@wikingh, o, ho, h3o17:55
@wikingetc17:55
Trixiswiking: one vertex per atom17:55
@wikingand then write the adjacency matrix17:55
@wikingas a verctor?17:55
@wiking*vector17:55
@wikingsince the graph would have that right? :)17:55
Trixisyes indeed17:55
@wikingi mean these type of molecular adjacency matrices i suppose17:55
@wikingare very sparse17:55
Trixisyes17:55
Trixishence why you can use treelet kernels at all17:56
Trixisbecause those are NP complete17:56
@wikingk17:56
Trixisor NP hard17:56
Trixisdepending on the kernel17:56
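The representation discussed above (one vertex per atom, an explicit set of labelled edges, and the adjacency matrix derived from them) might look roughly like the sketch below. The ethanol example, the variable names and the use of bond order as the matrix entry are assumptions for illustration, not Trixis's actual encoding.

```python
# Hypothetical heavy-atom graph for ethanol (C-C-O); hydrogens are left implicit.
atoms = ["C", "C", "O"]           # one vertex per atom, labelled by element
bonds = {(0, 1): 1, (1, 2): 1}    # edge (i, j) -> bond order


def adjacency_matrix(n_atoms, bonds):
    """Dense symmetric adjacency matrix with bond orders as entries."""
    adj = [[0] * n_atoms for _ in range(n_atoms)]
    for (i, j), order in bonds.items():
        adj[i][j] = order
        adj[j][i] = order
    return adj


adj = adjacency_matrix(len(atoms), bonds)
# Number of neighbours per atom; small values are what make the matrix sparse.
degrees = [sum(1 for entry in row if entry) for row in adj]
print(adj)
print(degrees)
```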
Trixisone transform which ill possibly try out at some point is to replace rings up to order 8 with a single node --- but that blows up the complexity17:57
@wikingah i see17:57
@wikingso kind of take the benzine and replace it as one node? :)17:58
@wiking*benzene17:58
Trixislike the best way to deal with the ring would be to add a 'ring centre' node, and connect all atoms in the ring to it using specially labeled edges17:58
Trixisyes17:58
Trixisbut again17:58
Trixisthat blows up the complexity17:58
@wikinghaha17:58
Trixisfor anything treelet based17:58
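A rough sketch of the 'ring centre' idea mentioned above, assuming the rings have already been detected by some other means. The function name, the "RING" vertex label and the "ring" edge label are placeholders invented for this example.

```python
def add_ring_centres(atoms, bonds, rings, max_ring_size=8):
    """Return copies of (atoms, bonds) with one extra vertex per ring.

    Every ring with at most `max_ring_size` atoms gets a "RING" vertex that
    is joined to each ring atom by an edge labelled "ring". Both labels are
    hypothetical placeholders.
    """
    atoms = list(atoms)
    bonds = dict(bonds)
    for ring in rings:
        if len(ring) > max_ring_size:
            continue
        centre = len(atoms)          # index of the new ring-centre vertex
        atoms.append("RING")
        for atom_index in ring:
            bonds[(atom_index, centre)] = "ring"
    return atoms, bonds


# Benzene carbon skeleton: six carbons in a cycle (bond orders simplified to 1).
benzene_atoms = ["C"] * 6
benzene_bonds = {(i, (i + 1) % 6): 1 for i in range(6)}
new_atoms, new_bonds = add_ring_centres(benzene_atoms, benzene_bonds, [range(6)])
print(len(new_atoms), len(new_bonds))
```

In this toy case every ring carbon gains one extra neighbour, so the vertex degree rises from 2 to 3, and the new centre vertex has degree 6.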
@wikingman i remembered this from chemistry17:58
@wiking:D17:58
Trixishahahahah17:58
@wikingwhich i learnt like 20 years ago17:58
@wiking:P17:59
@wikingamazing17:59
@wikingmmm17:59
@wikingbut why does it blow up17:59
@wikingi mean you can define a sort of hierarchy as well17:59
@wikinghere17:59
Trixisits a combinatorial algorithm17:59
Trixisso the complexity is something like O(d^d)17:59
Trixisor even worse17:59
Trixisnow because d is low (max 4), something like 2.5 on avg its not too bad18:00
Trixisbut rings are very common18:00
@wikingah your proteins are not so complex?18:00
Trixisits small molecules and macrolides, so no proteins, and yeah, in general, you dont get more than 4 bonds per atom in chem18:01
@wikingk18:01
Trixisand because you ignore hydrogens bound to the carbon backbone, its 2 for most18:01
Trixisproteins usually use kernels based on 3d shape + sequence18:02
@wikingah i see18:02
Trixisyou wouldnt want that as a graph18:02
Trixishttp://www.sciencedirect.com/science/article/pii/S1063520315000214  i really like this paper, however im afraid its of no use for such small graphs18:05
Trixis*small and sparse18:07
@wiking"windowed Fourier analysis"18:09
@wikingon graph?18:09
@wikingwtf?18:09
Trixisyeah18:09
@wikingwtf is this18:09
@wiking:D18:09
Trixisexactly18:09
Trixisofc its on the laplacian matrix18:10
Trixisbut still lol18:10
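For reference, the Laplacian matrix mentioned above can be computed from an adjacency matrix as L = D - A, where D is the diagonal degree matrix. The sketch below uses the unnormalised (combinatorial) form, which is an assumption; the linked paper may work with a different normalisation.

```python
def graph_laplacian(adj):
    """Combinatorial Laplacian L = D - A for a symmetric adjacency matrix."""
    n = len(adj)
    # Off-diagonal entries are the negated adjacency weights.
    lap = [[-adj[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        # Diagonal entry is the (weighted) degree, ignoring any self-loop.
        lap[i][i] = sum(adj[i]) - adj[i][i]
    return lap


# Three-vertex path graph 0 - 1 - 2.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
lap = graph_laplacian(path)
print(lap)
```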
@HeikoSolinguyen:  hi there18:11
@HeikoScould chat a bit now18:12
@wikingHeikoS, ping18:12
@HeikoSwiking: pong18:12
olinguyenok cool18:12
@wikingHeikoS, fyi https://gitter.im/shogun-toolbox/shogun/archives/18:13
@wiking:)18:13
@wikingdidn't know but yeah here's another archive :D18:13
@HeikoScool, half anon18:13
-!- shogitter [~nodebot@ks312251.kimsufi.com] has joined #shogun18:13
@wikingHeikoS, half anon?18:13
-!- sukey [~nodebot@ks312251.kimsufi.com] has joined #shogun18:13
-!- mode/#shogun [+o sukey] by ChanServ18:14
@HeikoSah no18:14
@HeikoSwiking: whats this bazdmeg @bazdmeg18:14
@wikingHeikoS, it's shogitter18:14
@wikingthat's the relaying bot18:14
@wikingsukey, flip18:14
@sukey(/¯◡ ‿ ◡)/¯ ~ ┻━┻18:14
olinguyenHeikoS: am I supposed to use the outcome labels of past data (e.g. t-2) when trying to predict t+1? I kinda understand the problem for regression when i'm trying to predict the next value, but i have trouble grasping it for predicting the binary outcome18:14
@HeikoSwiking: okok :)18:15
@HeikoSolinguyen: whats the difference?18:15
shogitter(vigsterkr) so if you type here in gitter then you'll get the relay from him18:15
shogitter(vigsterkr) Heiko :)18:16
@wikingthat's all18:16
olinguyenwhen constructing the lagged features (shifting the values), doesn't the feature matrix just look like 1's and 0's?18:16
@HeikoSwiking: cool18:16
@HeikoSyou mean lagged labels18:16
@HeikoSolinguyen: as in patient died or not18:16
olinguyenyea18:16
@HeikoSolinguyen: yes sure18:16
@HeikoS(X=avg lagged feats until t, y=patient died at t+1)18:17
@HeikoSand most y's will be 018:17
@HeikoSuntil the patient died18:17
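One possible reading of the "(X=avg lagged feats until t, y=patient died at t+1)" construction, sketched for a single patient. The function name, the plain running mean and the `died_at` convention are assumptions for illustration, not the project's actual pipeline.

```python
def lagged_examples(features, died_at, horizon=1):
    """Build (X, y) pairs for one patient.

    X is the mean of all feature vectors observed up to and including time t;
    y is 1 if death is recorded at time t + horizon, else 0.
    `features` is a list of per-time-step feature vectors and `died_at` is
    the index of the step at which death is recorded (or None).
    """
    examples = []
    running_sum = [0.0] * len(features[0])
    for t, row in enumerate(features):
        running_sum = [s + v for s, v in zip(running_sum, row)]
        target = t + horizon
        if target >= len(features):
            break                      # no label available that far ahead
        if died_at is not None and t >= died_at:
            break                      # nothing to predict after the event
        x = [s / (t + 1) for s in running_sum]
        y = 1 if died_at == target else 0
        examples.append((x, y))
    return examples


# Toy series with one feature per step; death is recorded at step 3.
examples = lagged_examples([[1.0], [3.0], [5.0], [7.0]], died_at=3)
print(examples)
```

As in the discussion, the labels stay 0 until the step just before the event.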
Trixiswiking: hm on second thought something like that kernel could perhaps work... i can weight the graph using charge data, and since it considers frequency of paths it could perhaps capture some large scale structural features (e.g. how good of a 'soap' a compound is), but yeah, the sparseness probably wont help18:17
olinguyenHeikoS: so for predicting different time points, am I just training distinct classifiers e.g. Random Forests independently? Sorry, I feel really slow on this concept18:19
@HeikoSnono18:20
@HeikoSit is more like18:20
@HeikoSevery (X,y) is a data point18:20
@HeikoSthat forms the features and labels that a single RF is trained on18:20
@HeikoSso the RF picks up "patient dies next week"18:20
@HeikoSvia seeing all the examples where this happened (or not)18:21
olinguyenbut some of my (X, y) points are "patient dies in 12 hours" or "patient dies in 24 hours"18:21
@HeikoSolinguyen: ah I see18:22
@HeikoSwell you pick one18:22
@HeikoSand then you have one RF per "time that you predict into the future"18:22
@wikingbtw anybody wanna do this https://www.kaggle.com/c/zillow-prize-118:22
@wiking?18:22
@HeikoSwiking: phd ;)18:23
@wikingHeikoS, :D still we can pick your brain no? :D18:23
@wikingjust for 20 minutes for a week18:23
@HeikoSwiking: sure18:23
@wiking:D18:23
@wikingi mean actually what would be great18:23
@wikingis only use shogun18:23
@wikingand that would bring all the missing features18:23
@wikingfor sure18:23
@wiking:)18:23
@wikingmicmn, ping?18:23
@HeikoSolinguyen: see what I mean?18:26
olinguyenyea, i think i was just over thinking it18:27
olinguyeni thought i would have only 1 classifier18:27
olinguyenusing all the different (X, y) at different times18:27
olinguyenand couldn't picture that18:27
@HeikoSI see18:31
@HeikoSnono, one classifier per horizon18:31
@HeikoSso if we pick three of them18:31
@HeikoSwe will have three models18:31
@HeikoSit will be easier to predict shorter obviously18:32
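The "one classifier per horizon" setup can be sketched as a loop that builds a separate dataset and fits a separate model for each horizon. Everything here is a placeholder: `make_dataset` and `fit_majority` are toy stand-ins invented for the example, and in practice `fit` would wrap a real learner such as a random forest.

```python
def train_per_horizon(series, horizons, make_dataset, fit):
    """Fit one independent model per prediction horizon."""
    models = {}
    for horizon in horizons:
        X, y = make_dataset(series, horizon)
        models[horizon] = fit(X, y)
    return models


def make_dataset(outcomes, horizon):
    # Toy construction: X is the outcome flag at time t,
    # y is the outcome flag at time t + horizon.
    n = len(outcomes) - horizon
    X = [[outcomes[t]] for t in range(n)]
    y = [outcomes[t + horizon] for t in range(n)]
    return X, y


def fit_majority(X, y):
    # Stand-in for a real classifier: records the training-set size
    # and the most common label.
    return {"n": len(y), "majority": max(set(y), key=y.count)}


models = train_per_horizon([0, 0, 0, 0, 0, 1], [1, 2, 3], make_dataset, fit_majority)
print(models)
```

Each horizon gets its own training set, so three horizons yield three separate models.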
@HeikoSolinguyen: you had any luck with the plots I mentioned ?18:33
olinguyennot yet, i'll get on it soon18:34
@wikinghttps://epjdatascience.springeropen.com/articles/10.1140/epjds/s13688-017-0110-z18:47
@wiking:)18:47
@wikinglets have a shogun service18:48
@wikingfor instagrammers18:48
@wiking:D18:48
Trixislol18:52
@wikinghttps://qz.com/262595/why-germans-pay-cash-for-almost-everything/ so HeikoS i guess u are an outlier :)18:53
@HeikoSwiking: cash is naturally anonymous18:54
@HeikoSwhich is quite nice18:54
-!- HeikoS [~heiko@untrust-out.swc.ucl.ac.uk] has quit [Ping timeout: 255 seconds]19:01
@wikinghttps://www.turbulenceforecast.com19:12
@wikingbest site ever!19:12
--- Log closed Wed Aug 09 00:00:31 2017
