--- Log opened Tue Aug 08 00:00:29 2017 | ||
-!- HeikoS [~heiko@host-92-0-169-11.as43234.net] has quit [Ping timeout: 248 seconds] | 00:11 | |
-!- witness [uid10044@gateway/web/irccloud.com/x-vzotvjnefbculkny] has quit [Quit: Connection closed for inactivity] | 01:30 | |
@wiking | Trixis, that could be it... it's a pure ansi c code :) no wonder if it's not production ready :) | 04:31 |
Trixis | wiking: haha | 10:52 |
lisitsyn | wiking: god bless not pure python :P | 11:11 |
Trixis | lisitsyn: doing math in pure python, lol | 11:12 |
-!- HeikoS [~heiko@host-92-0-169-11.as43234.net] has joined #shogun | 11:57 | |
-!- mode/#shogun [+o HeikoS] by ChanServ | 11:57 | |
-!- zoq [~marcus_zo@urgs.org] has quit [Ping timeout: 268 seconds] | 12:19 | |
-!- shogun-toolbox [~shogun@7nn.de] has quit [Ping timeout: 268 seconds] | 12:20 | |
--- Log closed Tue Aug 08 12:20:23 2017 | ||
--- Log opened Tue Aug 08 12:20:31 2017 | ||
-!- shogun-t1olbox [~shogun@7nn.de] has joined #shogun | 12:20 | |
-!- Irssi: #shogun: Total of 18 nicks [4 ops, 0 halfops, 0 voices, 14 normal] | 12:20 | |
-!- Irssi: Join to #shogun was synced in 8 secs | 12:20 | |
-!- zoq_ is now known as zoq | 12:57 | |
-!- HeikoS [~heiko@host-92-0-169-11.as43234.net] has quit [Ping timeout: 260 seconds] | 13:39 | |
@wiking | lisitsyn, :D | 14:18 |
Trixis | wiking: actually i hear the best thing to do is to do optimisation in /pure/ R | 14:45 |
Trixis | /s | 14:46 |
Trixis | wiking: ye my only issue with the SVMLight is it seems to be the only svm implementation that supports interleaved optimisation | 15:07 |
-!- HeikoS [~heiko@untrust-out.swc.ucl.ac.uk] has joined #shogun | 15:52 | |
-!- mode/#shogun [+o HeikoS] by ChanServ | 15:53 | |
-!- olinguyen [81615ad9@gateway/web/freenode/ip.129.97.90.217] has joined #shogun | 16:40 | |
olinguyen | HeikoS: hey, sorry, i came a little later today | 16:41 |
olinguyen | do you have some time now? | 16:41 |
Trixis | wiking: well after nearly 2 months i finally deployed first gridsearch + evaluation on the cluster in full scale. as per my supervisor, any precision better than random guessing is considered success | 16:44 |
Trixis | so at least the bar is set quite low | 16:45 |
@wiking | Trixis, is this the #joke channel? :) | 17:25 |
Trixis | wiking: what do you mean? | 17:32 |
@wiking | " <Trixis> [14:43:23] wiking: actually i hear the best thing to do is to do optimisation in /pure/ R | 17:32 |
@wiking | " | 17:32 |
@wiking | " | 17:32 |
Trixis | wiking: oh yeah that was a joke | 17:32 |
Trixis | in reference to the "at least its not python" | 17:32 |
Trixis | *response | 17:32 |
Trixis | wiking: do you know if anyone here has solid experience with graph kernels / graph similarity methods? | 17:39 |
@wiking | what do u need? :) | 17:40 |
@wiking | we can talk :) | 17:42 |
@wiking | and then maybe figure out something | 17:42 |
@wiking | that actually make sense | 17:42 |
@wiking | btw Trixis have u checked optimizers in tf? | 17:42 |
Trixis | wiking: nop | 17:42 |
Trixis | well at some point it seems I'll require a graph kernel / similarity (pseudo) measure to capture large-scale features & structure of a graph | 17:43 |
@wiking | there's an SDCA optimizer in tf | 17:43 |
@wiking | which is quite good | 17:43 |
@wiking | and of course a gd | 17:43 |
Trixis | right now i've got a treelet kernel in play, which however is capable of capturing only features involving a few neighbouring vertices | 17:44 |
@wiking | ah so you wanna do something like field-aware factorization machines? | 17:45 |
@wiking | https://www.csie.ntu.edu.tw/~cjlin/libffm/ | 17:45 |
@wiking | ? | 17:45 |
@wiking | there you can use fields | 17:46 |
@wiking | see the format | 17:46 |
Trixis | my hypothesis however is that the process we're investigating is guided by the overall structure of a molecule and its overall shape / elasticity rather than pharmacophores (small functional groups) | 17:46 |
@wiking | "<label> <field1>:<feature1>:<value1> <field2>:<feature2>:<value2> ... | 17:46 |
@wiking | " | 17:46 |
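The libffm input format quoted above can be sketched as a tiny serializer. This is a hedged illustration only: `to_ffm_line` and the field/feature indices are made up for the example, not part of libffm itself; libffm consumes this text format from a file.

```python
# Minimal sketch: serialize one training example into the libffm text format
# quoted above: "<label> <field1>:<feature1>:<value1> ...".
# The helper name and the index values are illustrative, not from libffm.

def to_ffm_line(label, triples):
    """triples: iterable of (field, feature, value) for one example."""
    parts = [str(label)]
    for field, feature, value in triples:
        parts.append(f"{field}:{feature}:{value}")
    return " ".join(parts)

line = to_ffm_line(1, [(0, 3, 1.0), (1, 7, 0.5)])
print(line)  # 1 0:3:1.0 1:7:0.5
```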
@wiking | Trixis, why dont you try structural svm? | 17:46 |
@wiking | or your labeling is really just binary? | 17:46 |
Trixis | its binary | 17:47 |
@wiking | mmm but your features are more | 17:47 |
Trixis | yes im currently using both physical measures | 17:47 |
@wiking | 'interdependent' | 17:47 |
@wiking | ? | 17:47 |
Trixis | and transformed graph features | 17:47 |
Trixis | based on the graph structure of the molecule | 17:47 |
@wiking | mmm have u tried a stupid xgboost for this? :) | 17:47 |
Trixis | i was wondering if perhaps spectral graph theory could help | 17:47 |
@wiking | trees can get you some nice interractions | 17:48 |
@wiking | between features | 17:48 |
@wiking | but ok can u give me an example | 17:48 |
@wiking | of your graphical features? | 17:48 |
Trixis | wiking: the molecule pool is extremely heterogeneous, an example of my graph features would be the molecular structure, with labels corresponding to atomic numbers and bond order. (ofc for this kind of feature treelet kernels are perfect) | 17:49 |
Trixis | however i also have features that are the structure of the molecule labeled as in to capture the charge distributions / flexibility | 17:50 |
Trixis | wiking: this guy's definitely in my dataset https://upload.wikimedia.org/wikipedia/commons/thumb/6/65/Erythromycin_A.svg/624px-Erythromycin_A.svg.png (and its still one of the prettier ones) | 17:50 |
@wiking | so how do you encode this in a feature? | 17:51 |
@wiking | some categorical value for the atom? | 17:52 |
Trixis | ye | 17:52 |
Trixis | ofc all labels are categorical | 17:52 |
@wiking | mmm | 17:52 |
@wiking | so what is | 17:52 |
@wiking | H, O, HO? | 17:52 |
@wiking | i mean how do you encode this? | 17:52 |
Trixis | since continuous labels are highly problematic | 17:52 |
Trixis | wiking: (O)-(H) | 17:52 |
@wiking | i mean do you emphasise that HO is actually a H and an O | 17:53 |
Trixis | ye | 17:53 |
@wiking | but you dont do later an ordinal encoding | 17:53 |
@wiking | but you let your kernel | 17:53 |
@wiking | take care of these categoricals? | 17:53 |
Trixis | everything but implicit hydrogens are excluded | 17:53 |
Trixis | yes | 17:53 |
@wiking | k | 17:53 |
@wiking | and how do you define the edges? | 17:53 |
@wiking | or the order of graph nodes | 17:54 |
Trixis | a few variations on the kernel in use currently group atoms based on some common features | 17:54 |
@wiking | defines the edges? | 17:54 |
Trixis | wiking: edges are explicit as well | 17:54 |
@wiking | ok so what you have like | 17:54 |
Trixis | the graph is treated in the canonical way | 17:54 |
Trixis | i.e. set of edges | 17:54 |
Trixis | set of vertices | 17:54 |
@wiking | yeah so you first have the vertices | 17:54 |
@wiking | like | 17:54 |
@wiking | h, o, ho, h3o | 17:55 |
@wiking | etc | 17:55 |
Trixis | wiking: one vertex per atom | 17:55 |
@wiking | and then write the adjacency matrix | 17:55 |
@wiking | as a verctor? | 17:55 |
@wiking | *vector | 17:55 |
@wiking | since the graph would have that right? :) | 17:55 |
Trixis | yes indeed | 17:55 |
@wiking | i mean these type of molecular adjacency matrices i suppose | 17:55 |
@wiking | are very sparse | 17:55 |
Trixis | yes | 17:55 |
Trixis | hence why you can use treelet kernels at all | 17:56 |
Trixis | because those are NP complete | 17:56 |
@wiking | k | 17:56 |
Trixis | or NP hard | 17:56 |
Trixis | depending on the kernel | 17:56 |
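The encoding the two are circling around (one vertex per heavy atom with a categorical element label, explicit edges labelled by bond order, implicit hydrogens, a very sparse adjacency matrix) can be sketched in pure Python. Everything here is a toy illustration assuming that reading of the conversation; the names are made up.

```python
# Sketch of the encoding discussed: one vertex per heavy atom carrying a
# categorical element label, explicit edges labelled by bond order.
# Hydrogens stay implicit, as in the conversation. Toy example: ethanol's
# heavy-atom backbone C-C-O.

vertices = {0: "C", 1: "C", 2: "O"}   # vertex id -> element label
edges = {(0, 1): 1, (1, 2): 1}        # (u, v) -> bond order

def adjacency(vertices, edges):
    """Dense adjacency matrix holding bond orders; zeros elsewhere
    (molecular adjacency matrices are very sparse, as noted above)."""
    n = len(vertices)
    A = [[0] * n for _ in range(n)]
    for (u, v), order in edges.items():
        A[u][v] = A[v][u] = order     # undirected molecular graph
    return A

A = adjacency(vertices, edges)
# Flattened row-major, "the adjacency matrix as a vector" from the chat:
flat = [x for row in A for x in row]
```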
Trixis | one transform which I'll possibly try out at some point is to replace rings up to order 8 with a single node --- but that blows up the complexity | 17:57 |
@wiking | ah i see | 17:57 |
@wiking | so kind of take the benzine and replace it as one node? :) | 17:58 |
@wiking | *benzene | 17:58 |
Trixis | like the best way to deal with the ring would be to add a 'ring centre' node, and connect all atoms in the ring to it using specially labelled edges | 17:58 |
Trixis | yes | 17:58 |
Trixis | but again | 17:58 |
Trixis | that blows up the complexity | 17:58 |
@wiking | haha | 17:58 |
Trixis | for anything treelet based | 17:58 |
@wiking | man i remembered this from chemistry | 17:58 |
@wiking | :D | 17:58 |
Trixis | hahahahah | 17:58 |
@wiking | which i learnt like 20 years ago | 17:58 |
@wiking | :P | 17:59 |
@wiking | amazing | 17:59 |
@wiking | mmm | 17:59 |
@wiking | but why does it blow up | 17:59 |
@wiking | i mean you can define a sort of hierarchy as well | 17:59 |
@wiking | here | 17:59 |
Trixis | its a combinatorial algorithm | 17:59 |
Trixis | so the complexity is something like O(d^d) | 17:59 |
Trixis | or even worse | 17:59 |
Trixis | now because d is low (max 4), something like 2.5 on avg, it's not too bad | 18:00 |
Trixis | but rings are very common | 18:00 |
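The combinatorial blow-up in the maximum degree d that Trixis describes can be illustrated by counting simple paths from one vertex of a complete graph K_{d+1}, where every vertex has degree d: the count of length-k paths is d·(d-1)·(d-2)·…, so it explodes as d grows. This is a toy arithmetic sketch, not the treelet enumeration algorithm itself.

```python
# Rough illustration of why pattern enumeration is combinatorial in the
# maximum degree d: count simple (non-revisiting) paths with k edges
# starting from one vertex of the complete graph K_{d+1}, where every
# vertex has degree d. Toy numbers only, not the treelet algorithm.

def count_simple_paths(adj, start, k):
    """Count simple paths with exactly k edges starting at `start`."""
    def walk(v, visited, steps):
        if steps == 0:
            return 1
        return sum(walk(u, visited | {u}, steps - 1)
                   for u in adj[v] if u not in visited)
    return walk(start, {start}, k)

def complete_graph(n):
    return {v: [u for u in range(n) if u != v] for v in range(n)}

counts = {d: count_simple_paths(complete_graph(d + 1), 0, 2)
          for d in (2, 3, 4)}
print(counts)  # {2: 2, 3: 6, 4: 12} -- growing like d * (d - 1)
```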
@wiking | ah your proteins are not so complex? | 18:00 |
Trixis | its small molecules and macrolides, so no proteins, and yeah, in general, you dont get more than 4 bonds per atom in chem | 18:01 |
@wiking | k | 18:01 |
Trixis | and because you ignore hydrogens bound to the carbon backbone, its 2 for most | 18:01 |
Trixis | proteins usually use kernels based on 3d shape + sequence | 18:02 |
@wiking | ah i see | 18:02 |
Trixis | you wouldnt want that as a graph | 18:02 |
Trixis | http://www.sciencedirect.com/science/article/pii/S1063520315000214 i really like this paper, however im afraid its of no use for such small graphs | 18:05 |
Trixis | *small and sparse | 18:07 |
@wiking | "windowed Fourier analysis" | 18:09 |
@wiking | on graph? | 18:09 |
@wiking | wtf? | 18:09 |
Trixis | yeah | 18:09 |
@wiking | wtf is this | 18:09 |
@wiking | :D | 18:09 |
Trixis | exactly | 18:09 |
Trixis | ofc its on the laplacian matrix | 18:10 |
Trixis | but still lol | 18:10 |
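"Fourier analysis on graphs" rests on the combinatorial Laplacian L = D - A mentioned above: its eigenvectors play the role of Fourier modes, with eigenvalues as frequencies. A pure-Python sketch of just the construction, for a small path graph, assuming an unweighted undirected graph; no eigensolver, only the defining property that every row of L sums to zero (the constant vector is the zero-frequency mode).

```python
# The Laplacian matrix underlying graph Fourier analysis: L = D - A,
# degree on the diagonal, minus adjacency off the diagonal. Its
# eigenvectors act as the graph's Fourier modes. Sketch only.

def laplacian(adj):
    """adj: dict vertex -> list of neighbours (unweighted, undirected)."""
    n = len(adj)
    L = [[0] * n for _ in range(n)]
    for v, nbrs in adj.items():
        L[v][v] = len(nbrs)          # degree on the diagonal
        for u in nbrs:
            L[v][u] = -1             # minus adjacency off-diagonal
    return L

path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # path graph, 4 vertices
L = laplacian(path4)
# L annihilates constant vectors: the zero-frequency "Fourier mode"
assert all(sum(row) == 0 for row in L)
```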
@HeikoS | olinguyen: hi there | 18:11 |
@HeikoS | could chat a bit now | 18:12 |
@wiking | HeikoS, ping | 18:12 |
@HeikoS | wiking: pong | 18:12 |
olinguyen | ok cool | 18:12 |
@wiking | HeikoS, fyi https://gitter.im/shogun-toolbox/shogun/archives/ | 18:13 |
@wiking | :) | 18:13 |
@wiking | didn't know but yeah here's another archive :D | 18:13 |
@HeikoS | cool, half anon | 18:13 |
-!- shogitter [~nodebot@ks312251.kimsufi.com] has joined #shogun | 18:13 | |
@wiking | HeikoS, half anon? | 18:13 |
-!- sukey [~nodebot@ks312251.kimsufi.com] has joined #shogun | 18:13 | |
-!- mode/#shogun [+o sukey] by ChanServ | 18:14 | |
@HeikoS | ah no | 18:14 |
@HeikoS | wiking: whats this bazdmeg @bazdmeg | 18:14 |
@wiking | HeikoS, it's shogitter | 18:14 |
@wiking | that's the relaying bot | 18:14 |
@wiking | sukey, flip | 18:14 |
@sukey | (/¯◡ ‿ ◡)/¯ ~ ┻━┻ | 18:14 |
olinguyen | HeikoS: am I supposed to use the outcome labels of past data (e.g. t-2) when trying to predict t+1? I kinda understand the problem for regression when i'm trying to predict the next value, but i have trouble grasping it for predicting the binary outcome | 18:14 |
@HeikoS | wiking: okok :) | 18:15 |
@HeikoS | olinguyen: whats the difference? | 18:15 |
shogitter | (vigsterkr) so if you type here in gitter then you'll get the relay from him | 18:15 |
shogitter | (vigsterkr) Heiko :) | 18:16 |
@wiking | that's all | 18:16 |
olinguyen | when constructing the lagged features (shifting the values), doesn't the feature matrix just look like 1's and 0's? | 18:16 |
@HeikoS | wiking: cool | 18:16 |
@HeikoS | you mean lagged labels | 18:16 |
@HeikoS | olinguyen: as in patient died or not | 18:16 |
olinguyen | yea | 18:16 |
@HeikoS | olinguyen: yes sure | 18:16 |
@HeikoS | (X=avg lagged feats until t, y=patient died at t+1) | 18:17 |
@HeikoS | and most y's will be 0 | 18:17 |
@HeikoS | until the patient died | 18:17 |
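The (X, y) construction HeikoS describes (lagged features up to time t, binary outcome at t+1, with mostly-zero labels until the event) can be sketched on toy data. All variable names and numbers here are illustrative assumptions, not from any real dataset.

```python
# Sketch of the construction above: for each time t, X is the window of
# lagged measurements ending at t and y is the binary outcome at t + 1
# ("patient died"). Most y's are 0 until the event, as noted. Toy data.

def make_lagged(measurements, outcomes, lag):
    """Return (X, y): `lag` measurements ending at t, outcome at t + 1."""
    X, y = [], []
    for t in range(lag - 1, len(measurements) - 1):
        X.append(measurements[t - lag + 1 : t + 1])
        y.append(outcomes[t + 1])
    return X, y

vitals = [98, 97, 95, 90, 85, 80]   # some measured signal over time
died   = [0, 0, 0, 0, 0, 1]         # zeros until the event
X, y = make_lagged(vitals, died, lag=3)
# X[-1] == [95, 90, 85] pairs with y[-1] == 1: the event at the next step
```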
Trixis | wiking: hm on second thought something like that kernel could perhaps work... i can weight the graph using charge data, and since it considers frequency of paths it could perhaps capture some large-scale structural features (e.g. how good of a 'soap' a compound is), but yeah, the sparseness probably wont help | 18:17 |
olinguyen | HeikoS: so for predicting different time points, am I just training distinct classifiers e.g. Random Forests independently? Sorry, I feel really slow on this concept | 18:19 |
@HeikoS | nono | 18:20 |
@HeikoS | it is more like | 18:20 |
@HeikoS | every (X,y) is a data point | 18:20 |
@HeikoS | that forms the features and labels that a single RF is trained on | 18:20 |
@HeikoS | so the RF picks up "patient dies next week" | 18:20 |
@HeikoS | via seeing all the examples where this happened (or not) | 18:21 |
olinguyen | but some of my (X, y) points are "patient dies in 12 hours" or "patient dies in 24 hours" | 18:21 |
@HeikoS | olinguyen: ah I see | 18:22 |
@HeikoS | well you pick one | 18:22 |
@HeikoS | and then you have one RF per "time that you predict into the future" | 18:22 |
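The "one model per prediction horizon" pattern can be sketched without any ML dependency: each horizon gets its own independently fitted model. A trivial majority-class stub stands in for the random forest purely for illustration; in practice you would swap in something like scikit-learn's RandomForestClassifier. Names and toy data are assumptions.

```python
# Sketch of "one classifier per time that you predict into the future":
# a separate, independently trained model per horizon. MajorityClass is a
# placeholder for the real random forest, kept dependency-free.

class MajorityClass:
    """Stub model: predicts the most frequent training label."""
    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
        return self
    def predict(self, X):
        return [self.label] * len(X)

def train_per_horizon(datasets):
    """datasets: dict horizon -> (X, y); returns dict horizon -> model."""
    return {h: MajorityClass().fit(X, y) for h, (X, y) in datasets.items()}

models = train_per_horizon({
    "12h": ([[1], [2], [3]], [0, 0, 1]),
    "24h": ([[1], [2], [3]], [0, 1, 1]),
})
# models["12h"] and models["24h"] are separate, independently fitted models
```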
@wiking | btw anybody wanna do this https://www.kaggle.com/c/zillow-prize-1 | 18:22 |
@wiking | ? | 18:22 |
@HeikoS | wiking: phd ;) | 18:23 |
@wiking | HeikoS, :D still we can pick your brain no? :D | 18:23 |
@wiking | just for 20 minutes a week | 18:23 |
@HeikoS | wiking: sure | 18:23 |
@wiking | :D | 18:23 |
@wiking | i mean actually what would be great | 18:23 |
@wiking | is only use shogun | 18:23 |
@wiking | and that would bring all the missing features | 18:23 |
@wiking | for sure | 18:23 |
@wiking | :) | 18:23 |
@wiking | micmn, ping? | 18:23 |
@HeikoS | olinguyen: see what I mean? | 18:26 |
olinguyen | yea, i think i was just over thinking it | 18:27 |
olinguyen | i thought i would have only 1 classifier | 18:27 |
olinguyen | using all the different (X, y) at different times | 18:27 |
olinguyen | and couldn't picture that | 18:27 |
@HeikoS | I see | 18:31 |
@HeikoS | nono, one classifier per horizon | 18:31 |
@HeikoS | so if we pick three of them | 18:31 |
@HeikoS | we will have three models | 18:31 |
@HeikoS | it will obviously be easier to predict shorter horizons | 18:32 |
@HeikoS | olinguyen: you had any luck with the plots I mentioned ? | 18:33 |
olinguyen | not yet, i'll get on it soon | 18:34 |
@wiking | https://epjdatascience.springeropen.com/articles/10.1140/epjds/s13688-017-0110-z | 18:47 |
@wiking | :) | 18:47 |
@wiking | lets have a shogun service | 18:48 |
@wiking | for instagrammers | 18:48 |
@wiking | :D | 18:48 |
Trixis | lol | 18:52 |
@wiking | https://qz.com/262595/why-germans-pay-cash-for-almost-everything/ so HeikoS i guess u are an outlier :) | 18:53 |
@HeikoS | wiking: cash is naturally anonymous | 18:54 |
@HeikoS | which is quite nice | 18:54 |
-!- HeikoS [~heiko@untrust-out.swc.ucl.ac.uk] has quit [Ping timeout: 255 seconds] | 19:01 | |
@wiking | https://www.turbulenceforecast.com | 19:12 |
@wiking | best site ever! | 19:12 |
--- Log closed Wed Aug 09 00:00:31 2017 |
Generated by irclog2html.py 2.10.0 by Marius Gedminas - find it at mg.pov.lt!