--- Log opened Wed Jun 14 00:00:12 2017 | ||
-!- mikeling [uid89706@gateway/web/irccloud.com/x-gfijddxvyivyyzyj] has quit [Quit: Connection closed for inactivity] | 00:22 | |
-!- olinguyen [81615ad9@gateway/web/freenode/ip.129.97.90.217] has quit [Quit: Page closed] | 00:42 | |
-!- OXPHOX [92bd305b@gateway/web/freenode/ip.146.189.48.91] has quit [Quit: Page closed] | 01:13 | |
-!- TingMiao [uid229534@gateway/web/irccloud.com/x-aexwfarcpagulqsp] has quit [Quit: Connection closed for inactivity] | 01:28 | |
-!- lisitsyn_ [~lisitsyn@37.139.2.75] has joined #shogun | 03:02 | |
-!- lisitsyn [~lisitsyn@37.139.2.75] has quit [Write error: Broken pipe] | 03:03 | |
-!- lisitsyn_ is now known as lisitsyn | 03:03 | |
-!- mikeling [uid89706@gateway/web/irccloud.com/x-vejvlfbugbsyqyxg] has joined #shogun | 04:20 | |
-!- OXPHOS [401e476d@gateway/web/freenode/ip.64.30.71.109] has joined #shogun | 04:58 | |
@sukey | Pull Request #3832 "use std::vector instead of DynArray(on going)" - https://github.com/shogun-toolbox/shogun/pull/3832 | 05:09 |
@wiking | mikeling, ping | 05:13 |
@wiking | around? | 05:13 |
mikeling | wiking: pong | 05:13 |
mikeling | yes... | 05:13 |
@wiking | mikeling, so i was looking into the bug yesterday | 05:13 |
@wiking | couldn't nail it down where the memory thing goes bad | 05:14 |
mikeling | wiking: I got some clue | 05:14 |
@wiking | but it's definitely some memory problem | 05:14 |
@wiking | as gdb fails with a simple malloc | 05:15 |
mikeling | actually I found that the first stupid thing I did was shuffle(a.begin(), a.end()) | 05:15 |
mikeling | it made all the elements become zero because | 05:15 |
@wiking | ? | 05:16 |
mikeling | it shuffles all the elements in that vector rather than only the elements being used | 05:16 |
@wiking | ok i dont understand :) | 05:16 |
@wiking | could you explain maybe with code or rephrase? | 05:17 |
mikeling | like, if we have a vector and its size is 20 | 05:17 |
@wiking | yes | 05:17 |
@wiking | std::vector<int> v(20); :) | 05:17 |
@wiking | and you do std::shuffle(v.begin(), v.end(), rng); | 05:17 |
mikeling | but what's actually in it is v{1,2,3,4,5,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0}, | 05:18 |
mikeling | because we only use the first 6 elements | 05:18 |
mikeling | the number of elements is 6 | 05:18 |
mikeling | but the size is 20 | 05:18 |
mikeling | see? | 05:18 |
@wiking | ah ok | 05:18 |
mikeling | so I barely found the problem | 05:18 |
@wiking | so you are saying std::shuffle(v.begin(), v.begin()+num_elements, rng) | 05:18 |
@wiking | ? | 05:18 |
mikeling | I really didn't know why it failed until I printed them one by one | 05:18 |
mikeling | yep | 05:19 |
mikeling | it works | 05:19 |
mikeling | and I'm working on the other one | 05:19 |
@wiking | i see :) | 05:19 |
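A minimal C++ sketch of the fix discussed above: shuffle only the first `num_elements` slots, so the zero-filled spare capacity never gets mixed into the used prefix. `shuffle_used` is a hypothetical helper written for illustration, not Shogun's DynArray code. Note that `std::shuffle` also takes a uniform random bit generator as its third argument (here `std::mt19937`).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: shuffle only the first num_elements entries of a
// buffer whose size (capacity) is larger than the number of slots in use.
void shuffle_used(std::vector<int>& v, std::size_t num_elements, std::mt19937& rng)
{
    // Clamp so the end iterator never goes past v.end().
    num_elements = std::min(num_elements, v.size());
    std::shuffle(v.begin(), v.begin() + static_cast<std::ptrdiff_t>(num_elements), rng);
}
```

With `std::vector<int> v(20)` holding 1..6 in its first six slots, `shuffle_used(v, 6, rng)` permutes those six values and leaves the fourteen trailing zeros where they are.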
mikeling | another problem is | 05:19 |
mikeling | (let me paste it somewhere) | 05:20 |
@wiking | k | 05:21 |
mikeling | here is the output of CrossValidation_multithread.LibSVM_unlocked https://pastebin.com/0xu7PP2a. And the vector in line 17 is the breakpoint in https://github.com/shogun-toolbox/shogun/blob/develop/src/shogun/evaluation/CrossValidation.cpp#L308 | 05:33 |
mikeling | you can see the last element | 05:33 |
mikeling | becomes extremely large | 05:33 |
mikeling | which makes things go wrong, I guess | 05:33 |
@wiking | yes | 05:37 |
mikeling | SplittingStrategy looks like the point where everything starts breaking, and I printed things like https://pastebin.com/hnHcxwpJ | 05:39 |
mikeling | you can see | 05:39 |
mikeling | there's always an element that looks like null | 05:39 |
mikeling | or something else | 05:39 |
mikeling | I'm trying to figure it out now | 05:41 |
mikeling | all the output is in StratifiedCrossValidationSplitting.cpp like https://github.com/shogun-toolbox/shogun/blob/develop/src/shogun/evaluation/StratifiedCrossValidationSplitting.cpp#L117 | 05:44 |
-!- sonney2k [~shogun@7nn.de] has quit [Ping timeout: 260 seconds] | 06:30 | |
-!- sonney2k [~shogun@7nn.de] has joined #shogun | 06:31 | |
mikeling | wiking: ping | 06:35 |
mikeling | I got a question | 06:35 |
@wiking | pong | 06:35 |
mikeling | if we have an SGVector like SGVector<T>(10, true) | 06:35 |
mikeling | what's in there? Will I get 10 random elements in it at the beginning? | 06:36 |
@wiking | yes | 06:36 |
@wiking | if you want it to be all 0 | 06:36 |
@wiking | then you need to call .zero() | 06:36 |
@wiking | as it just calls SG_MALLOC | 06:36 |
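Per the exchange above, a vector whose constructor only performs a malloc-style allocation starts with indeterminate contents and has to be zeroed explicitly, whereas `std::vector<int>(n)` value-initializes every slot to 0 (the behaviour that later makes unused slots read as 0 in this log). A hedged sketch in standard C++; `RawBuffer` is a hypothetical stand-in, not Shogun's `SGVector`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a vector class whose constructor only allocates.
struct RawBuffer
{
    // new int[n] default-initializes: the ints are NOT zeroed, so reading
    // them before writing would be undefined behaviour.
    explicit RawBuffer(std::size_t n) : len(n), data(new int[n]) {}

    // Explicit zeroing, analogous to calling .zero() after construction.
    void zero() { std::fill_n(data.get(), len, 0); }

    std::size_t len;
    std::unique_ptr<int[]> data;
};
```

By contrast, `std::vector<int> v(10);` needs no extra call: all ten elements are already 0.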
mikeling | alright, so I probably know why we have those extremely large number in the list | 06:37 |
@wiking | but we didnt touch SGVector :P | 06:37 |
mikeling | no | 06:39 |
mikeling | not directly related to it, yes. But https://github.com/shogun-toolbox/shogun/blob/develop/src/shogun/evaluation/SplittingStrategy.cpp#L112 | 06:39 |
mikeling | what if result isn't initialized with the right length | 06:40 |
mikeling | because to_invert returns the wrong num_elements | 06:40 |
@wiking | how? | 06:41 |
@wiking | i mean ok | 06:41 |
@wiking | look | 06:41 |
@wiking | there's an assertion missing | 06:42 |
@wiking | because | 06:42 |
@wiking | what if to_invert->find_element(i) returns -1 more times than result.vlen | 06:43 |
mikeling | mmmm | 06:43 |
@wiking | you see what i mean? | 06:43 |
mikeling | ok, | 06:44 |
mikeling | I see | 06:44 |
mikeling | forget it :) | 06:44 |
@wiking | so say to_invert contains only -1 | 06:44 |
@wiking | in that case there's gonna be a serious problem | 06:44 |
@wiking | as if (to_invert->find_element(i)==-1) will be true | 06:44 |
@wiking | always | 06:44 |
@wiking | so it tries to write into result m_labels->get_num_labels() times | 06:45 |
@wiking | which is not possible | 06:45 |
@wiking | because the size of result is only m_labels->get_num_labels()-to_invert->get_num_elements() | 06:45 |
@wiking | :S | 06:45 |
mikeling | mmmm alright | 06:45 |
mikeling | I see | 06:45 |
@wiking | so there should be a check | 06:46 |
@wiking | that | 06:46 |
@wiking | index < result.vlen | 06:46 |
@wiking | because if that's not the case | 06:46 |
@wiking | there will be a memory problem | 06:46 |
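A sketch of the bounds check wiking describes, in a simplified setting: the hypothetical `invert_subset` returns every index in `[0, num_labels)` that is not in `to_invert`, with the output pre-sized to `num_labels - to_invert.size()`. If `to_invert` holds `-1`, duplicates, or out-of-range indices, more indices pass the "not found" test than there are slots, and the `index < result.size()` check is what stops the out-of-bounds write. This uses `std::vector` and `std::find` in place of `SGVector` and `find_element`, and throws where Shogun code might use an assertion.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: all indices in [0, num_labels) not present in to_invert.
std::vector<int> invert_subset(int num_labels, const std::vector<int>& to_invert)
{
    if (num_labels < 0 || static_cast<int>(to_invert.size()) > num_labels)
        throw std::invalid_argument("invalid subset size");

    // Sized on the assumption that to_invert holds distinct, valid indices.
    std::vector<int> result(static_cast<std::size_t>(num_labels) - to_invert.size());
    std::size_t index = 0;
    for (int i = 0; i < num_labels; ++i)
    {
        if (std::find(to_invert.begin(), to_invert.end(), i) == to_invert.end())
        {
            // The missing check: without it, bad input writes past the end.
            if (index >= result.size())
                throw std::out_of_range("to_invert has invalid or duplicate entries");
            result[index++] = i;
        }
    }
    return result;
}
```

For example, `invert_subset(5, {1, 3})` yields `{0, 2, 4}`, while `invert_subset(4, {-1, -1})` would try to write four indices into two slots and throws instead.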
@sukey | Issue #3844 "Out of memory error caused by linalg with vs2017 " opened by OXPHOS - https://github.com/shogun-toolbox/shogun/issues/3844 | 07:00 |
ironstark | wiking: I have anaconda installed on my system and I am using the python that comes bundled with that | 07:12 |
@wiking | ironstark, ok i see | 07:14 |
@wiking | if you can give me instructions | 07:14 |
@wiking | how to set up your environment | 07:14 |
@wiking | i can replicate it locally | 07:14 |
@wiking | and then i can fix it | 07:14 |
@wiking | ironstark, cloud.shogun.ml is actually running on such setup | 07:15 |
@wiking | so it should work | 07:15 |
@wiking | ;) | 07:15 |
ironstark | :) I just installed it using the following command bash ~/Downloads/Anaconda3-4.4.0-Linux-x86_64.sh | 07:21 |
@wiking | ok cool | 07:35 |
@wiking | thnx | 07:35 |
-!- OXPHOS [401e476d@gateway/web/freenode/ip.64.30.71.109] has quit [Ping timeout: 260 seconds] | 07:59 | |
-!- geektoni [~geektoni@93-34-234-212.ip52.fastwebnet.it] has joined #shogun | 09:01 | |
mikeling | wiking: oh my god | 09:16 |
@wiking | what'sup? :) | 09:17 |
mikeling | std::vector's implementation cheated me again | 09:17 |
@wiking | lol | 09:18 |
mikeling | so, when I use std::find | 09:18 |
mikeling | std::find(m_array.begin(), m_array.end(), e); | 09:18 |
@wiking | yes | 09:18 |
mikeling | it doesn't actually work, because all the elements in the array are initialized to 0 | 09:18 |
@wiking | ? | 09:19 |
@wiking | what do you mean by it doesn't work | 09:19 |
@wiking | std::vector<int> v = {0, 1, 2, 3, 4} | 09:19 |
@wiking | should return you v.begin() | 09:19 |
@wiking | if you do find(v.begin(), v.end(), 0) | 09:20 |
@wiking | right? | 09:20 |
mikeling | no, actually it will be std::vector<int> v = {0, 1, 2, 3, 4, 0, 0, 0, 0, 0} if it has size 10 | 09:20 |
mikeling | yep | 09:20 |
@wiking | ah you mean that you should actually only do find | 09:20 |
@wiking | if num_elements > 0 | 09:20 |
@wiking | otherwise return -1 | 09:20 |
@wiking | ? | 09:20 |
@wiking | and just do the | 09:20 |
mikeling | so, if I want it to return -1 for std::vector<int> v = {5, 1, 2, 3, 4} | 09:21 |
mikeling | yes | 09:21 |
@wiking | find(v.begin(), v.begin()+num_elements, e) | 09:21 |
mikeling | yes..... | 09:21 |
@wiking | okok | 09:21 |
@wiking | i see | 09:21 |
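A minimal sketch of the bounded search agreed on above, assuming the container tracks a logical `num_elements` that can be smaller than `v.size()`. `find_element` here is a free function written for illustration, not the actual DynamicArray member. Searching only `[begin, begin + num_elements)` means zero-filled spare capacity can never produce a false match, and `num_elements == 0` naturally yields `-1`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Position of e among the first num_elements entries of v, or -1 if absent.
// Slots at or beyond num_elements are spare capacity and are never searched.
int find_element(const std::vector<int>& v, std::size_t num_elements, int e)
{
    num_elements = std::min(num_elements, v.size());
    auto last = v.begin() + static_cast<std::ptrdiff_t>(num_elements);
    auto it = std::find(v.begin(), last, e);
    return it == last ? -1 : static_cast<int>(it - v.begin());
}
```

A regression test along these lines would store `{5, 1, 2, 3, 4}` in a ten-slot buffer and check that searching for `0` returns `-1`, whereas a full-range `std::find` would wrongly report position 5.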
micmn | wiking: when you have a minute | 09:23 |
@wiking | micmn, here | 09:23 |
@wiking | write and i'll try to get back asap | 09:23 |
micmn | can you explain to me why in linalg we need that macro/define_for_all_types thing instead of using templates? | 09:23 |
@wiking | micmn, how :) | 09:24 |
@wiking | that was the problem | 09:24 |
@wiking | :> | 09:24 |
@wiking | i wanted to have templates as well | 09:24 |
@wiking | as it's nicer | 09:25 |
micmn | which is the problem exactly? | 09:26 |
@wiking | how do you do it | 09:29 |
micmn | i mean: template <typename T> virtual void add_scalar(SGMatrix<T>& a, T b) | 09:32 |
micmn | like in linalgnamespace.h | 09:33 |
@wiking | and so you would have | 09:34 |
@wiking | virtual void add(SGVector<T>& a, SGVector<T>& b, T alpha, T beta, SGVector<T>& result); | 09:35 |
@wiking | virtual void add(SGMatrix<T>& a, SGMatrix<T>& b, T alpha, T beta, SGMatrix<T>& result); | 09:35 |
@wiking | ? | 09:35 |
@wiking | of course i myself as well cannot recall why the hell we had to do that macro hack | 09:36 |
@wiking | but there was a particular reason for it :S | 09:36 |
@wiking | ah yeah | 09:37 |
@wiking | inheritance :D | 09:37 |
-!- HeikoS [~heiko@host-92-0-178-129.as43234.net] has joined #shogun | 09:37 | |
-!- mode/#shogun [+o HeikoS] by ChanServ | 09:37 | |
@wiking | micmn, so templating virtual functions is | 09:38 |
@wiking | bla | 09:38 |
@wiking | that's why : | 09:39 |
@wiking | :( | 09:39 |
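The restriction wiking recalls is a language rule: a member function template cannot be `virtual`, so `template <typename T> virtual void add_scalar(...)` does not compile. One common workaround, sketched here under hypothetical names (`Backend`, `CpuBackend`, `DECLARE_ADD_SCALAR`), is a macro that stamps out one non-template virtual overload per element type, with a private template doing the shared work. It only illustrates the pattern and is not Shogun's actual linalg code.

```cpp
#include <cassert>
#include <vector>

// Ill-formed, because member function templates cannot be virtual:
//   template <typename T> virtual void add_scalar(std::vector<T>& a, T b);
// Instead, declare one plain virtual overload per supported type.
#define DECLARE_ADD_SCALAR(Type) \
    virtual void add_scalar(std::vector<Type>& a, Type b) const = 0;

struct Backend
{
    virtual ~Backend() = default;
    DECLARE_ADD_SCALAR(float)
    DECLARE_ADD_SCALAR(double)
    DECLARE_ADD_SCALAR(int)
};

// The derived backend overrides every overload and forwards to one template.
struct CpuBackend : Backend
{
#define DEFINE_ADD_SCALAR(Type) \
    void add_scalar(std::vector<Type>& a, Type b) const override { add_scalar_impl(a, b); }
    DEFINE_ADD_SCALAR(float)
    DEFINE_ADD_SCALAR(double)
    DEFINE_ADD_SCALAR(int)
#undef DEFINE_ADD_SCALAR

private:
    // Non-virtual templates are fine; only the virtual entry points are per-type.
    template <typename T>
    static void add_scalar_impl(std::vector<T>& a, T b)
    {
        for (auto& x : a)
            x += b;
    }
};
```

The cost matches what comes up in the channel: every supported type has to be listed explicitly, and all overloads are compiled wherever the header is included.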
micmn | ok I'll read something about that thx | 09:40 |
@wiking | if you find a way around that | 09:41 |
@wiking | then we should definitely clear that macro hack | 09:41 |
@wiking | :) | 09:41 |
@wiking | as it's obviously super awful | 09:42 |
micmn | nonetheless, compiling all that stuff for each file that includes linalg is insane XD | 09:42 |
@wiking | indeed | 09:42 |
@wiking | it is very fucking insane | 09:42 |
@wiking | bad design :( | 09:43 |
@sukey | New branch feature/premature-stopping created on shogun-toolbox/shogun | 09:43 |
@sukey | New Commit "Merge pull request #3833 from MikeLing/add_unittest_for_CDynamicArray | 09:43 |
@wiking | geektoni, ^ | 09:43 |
@sukey | unit test for DynamicArray" to shogun-toolbox/shogun by vigsterkr: https://github.com/shogun-toolbox/shogun/commit/9efa3b77147ccbab345c76832fc6fe4534dd4dab | 09:43 |
geektoni | wiking: thnx ;) | 09:44 |
-!- HeikoS [~heiko@host-92-0-178-129.as43234.net] has quit [Ping timeout: 268 seconds] | 09:50 | |
-!- johklu [c1abba08@gateway/web/freenode/ip.193.171.186.8] has joined #shogun | 09:54 | |
johklu | wiking, hi, I had a closer look at the saved svm in ascii format | 09:58 |
johklu | there is a section called "dictionary weights" | 09:59 |
johklu | which seems to be what I'm looking for (the feature weights for the k-mers assigned when training the svm) | 10:00 |
johklu | however it only contains {0} | 10:00 |
johklu | no proper numbers | 10:00 |
johklu | Do you think something went wrong while exporting, or is that section something completely different? | 10:01 |
@wiking | mmm | 10:01 |
@wiking | johklu, you are using CommWordStringKernel right? | 10:02 |
johklu | yes | 10:04 |
@wiking | okey | 10:04 |
@wiking | so once you have everything trained | 10:05 |
@wiking | you should be able to get the weights | 10:05 |
@wiking | by | 10:05 |
johklu | I was thinking, maybe I should use svm$set_store_model_features(TRUE) | 10:05 |
johklu | before svm$train() | 10:05 |
@wiking | kernel$get_dictionary(size, weights); | 10:05 |
@wiking | where size is an integer | 10:06 |
@wiking | weights is gonna be a float array | 10:06 |
@wiking | i'm just wondering what this would look like in R | 10:06 |
@wiking | just a sec | 10:06 |
@wiking | because the c++ function looks like this: void get_dictionary(int32_t& dsize, float64_t*& dweights) | 10:07 |
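For context on the R question: the quoted signature hands back its results through reference parameters (`int32_t&` for the size and `float64_t*&` for the array, `float64_t` being Shogun's alias for `double`). A wrapper that returns a container is usually easier for language bindings to map. `KernelLike` and `dictionary_copy` are hypothetical names for illustration, not part of Shogun's API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a kernel that holds a dictionary of weights.
struct KernelLike
{
    std::vector<double> dictionary_weights;

    // Out-parameter style, shaped like the quoted signature: the caller gets
    // a size and a pointer into the object's internal storage.
    void get_dictionary(std::int32_t& dsize, double*& dweights)
    {
        dsize = static_cast<std::int32_t>(dictionary_weights.size());
        dweights = dictionary_weights.data();
    }

    // Value-returning alternative: a self-contained copy that bindings can
    // typically convert into a native array without pointer juggling.
    std::vector<double> dictionary_copy() const { return dictionary_weights; }
};
```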
@wiking | i'm not so sure if R can handle that | 10:07 |
@wiking | :S | 10:07 |
@wiking | probably not | 10:07 |
@wiking | johklu, it should be enough to basically serialize the kernel | 10:08 |
@wiking | after training | 10:08 |
@wiking | as that one should contain the data | 10:08 |
@wiking | what you are looking for | 10:08 |
@wiking | johklu, after train | 10:09 |
@wiking | johklu, call this kernel$print_serializable() | 10:09 |
@wiking | and see what's on the output | 10:10 |
johklu | ok | 10:10 |
johklu | wiking, it looks like a log | 10:12 |
@wiking | yes | 10:12 |
johklu | with definitions | 10:12 |
@wiking | it should dump a lot of things | 10:12 |
johklu | no numbers | 10:12 |
@wiking | mmm | 10:13 |
@wiking | ok try then just to serialise it to a file | 10:13 |
@wiking | so | 10:13 |
@wiking | kernel$save_serializable(.... | 10:14 |
@wiking | the same way you did with the svm | 10:14 |
@wiking | make sure you save after svm$train | 10:14 |
johklu | oh, sorry | 10:14 |
johklu | i thought by kernel you meant the svm | 10:14 |
johklu | i'll try again with the kernel | 10:14 |
johklu | ok, kernel$print_serializable() looks pretty much the same | 10:16 |
@wiking | try saving it | 10:17 |
johklu | done | 10:24 |
johklu | looks pretty much the same though | 10:24 |
johklu | except for the label section at the top | 10:24 |
mikeling | wiking: ping | 10:25 |
mikeling | ping | 10:25 |
mikeling | all the tests passed! https://pastebin.mozilla.org/9024557 | 10:26 |
* mikeling started crying | 10:27 | |
lisitsyn | haha congrats | 10:28 |
mikeling | lisitsyn: thank you! | 10:30 |
johklu | wiking, don't you think i might need svm$set_store_model_features(TRUE) | 10:33 |
johklu | what is it for? | 10:33 |
-!- travis-ci [~travis-ci@ec2-54-224-88-30.compute-1.amazonaws.com] has joined #shogun | 10:35 | |
travis-ci | it's Viktor Gal's turn to pay the next round of drinks for the massacre he caused in shogun-toolbox/shogun: https://travis-ci.org/shogun-toolbox/shogun/builds/242723914 | 10:35 |
-!- travis-ci [~travis-ci@ec2-54-224-88-30.compute-1.amazonaws.com] has left #shogun [] | 10:35 | |
@wiking | mikeling, ? :) | 10:36 |
@wiking | mikeling, so what was it? | 10:36 |
mikeling | wiking: all the tests passed | 10:36 |
mikeling | https://pastebin.mozilla.org/9024557 | 10:36 |
@wiking | i mean what was the bug? :) | 10:36 |
mikeling | no bug comes out for now | 10:37 |
mikeling | :) | 10:37 |
mikeling | wiking: oh | 10:37 |
mikeling | you mean what's the bug? right? | 10:37 |
mikeling | ok, so the bug is in find_element | 10:37 |
mikeling | like I said, std::find(m_array.begin(), m_array.end(), e) searches all the elements | 10:38 |
@wiking | :) | 10:38 |
@wiking | ok | 10:38 |
@wiking | so extend the unit test with that | 10:38 |
@wiking | with a case where if we would use | 10:38 |
mikeling | ok, I will | 10:38 |
@wiking | std::find(m_array.begin(), m_array.end(), e) | 10:39 |
@wiking | then that test would fail | 10:39 |
@wiking | this way the next time somebody touches DynamicArray | 10:39 |
@wiking | this pops up right away | 10:39 |
mikeling | wiking: I see, I will do it right away :)! | 10:40 |
@wiking | good | 10:40 |
@wiking | then push it into the pr | 10:40 |
mikeling | and I owe you a blog post | 10:41 |
@wiking | and let's see what happens on CIs :))) | 10:41 |
mikeling | and a weekly report | 10:41 |
@wiking | yeah let's have first this pushed into the PR | 10:41 |
mikeling | ok | 10:41 |
@wiking | and have it all green | 10:41 |
@wiking | and then you can write a lot of things | 10:41 |
@wiking | what you have learnt about c++ :)) | 10:41 |
@wiking | and shogun :D | 10:41 |
@wiking | johklu, you could try | 10:42 |
@wiking | it wont hurt :) | 10:42 |
@wiking | but imo it's not that | 10:42 |
@wiking | lisitsyn, m_parameters->add_vector(&dictionary_weights, &dictionary_size, "dictionary_weights", | 10:42 |
@wiking | "Dictionary for applying kernel."); | 10:42 |
@wiking | this should add the dictionary_weights to the parameter fw and it should be serialized | 10:42 |
@wiking | right? | 10:42 |
lisitsyn | well I'd guess so | 10:43 |
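The line wiking quotes registers `dictionary_weights` with the parameter framework so that the serializer can find it. A toy version of that idea, using a hypothetical `ParameterRegistry` (Shogun's real `Parameter` class and its `add_vector` signature differ): members are registered by pointer and name, and serialization writes whatever values they hold at save time. Anything that was never registered is simply never written.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, much-simplified registry: the serializer only sees
// members that were registered.
class ParameterRegistry
{
public:
    void add_vector(std::vector<double>* v, const std::string& name)
    {
        vectors_[name] = v;
    }

    // Writes "name v1 v2 ...;" for each registered vector, in name order.
    // Values are read through the stored pointer, so later updates
    // (for example weights filled in by training) are picked up.
    std::string serialize() const
    {
        std::ostringstream out;
        for (const auto& entry : vectors_)
        {
            out << entry.first;
            for (double x : *entry.second)
                out << ' ' << x;
            out << ';';
        }
        return out.str();
    }

private:
    std::map<std::string, std::vector<double>*> vectors_;
};
```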
johklu | wiking, I tried, but it gives this error: get_num_bits_in_histogram()=4 > get_num_bits()=3 [ERROR] In file /scratch/adm_informatics/rootbuild/shogun/shogun-shogun_6.0.0/src/shogun/features/Alphabet.cpp line 647: ALPHABET too small to contain all symbols in histogram | 10:45 |
-!- HeikoS [~heiko@89.105.104.229] has joined #shogun | 10:45 | |
-!- mode/#shogun [+o HeikoS] by ChanServ | 10:45 | |
@wiking | HeikoS, ping | 10:45 |
johklu | these are the commands: svm$set_store_model_features(TRUE) | 10:45 |
johklu | svm$train() | 10:45 |
johklu | and then comes the error | 10:45 |
@wiking | johklu, how do you set the kernel for the svm? | 10:45 |
@wiking | or where? | 10:45 |
@HeikoS | wiking: pong, sorry cant talk as of now | 10:45 |
@HeikoS | has to be in 2 hrs | 10:46 |
@wiking | ok i'll email u | 10:46 |
@HeikoS | kk | 10:46 |
-!- HeikoS [~heiko@89.105.104.229] has quit [Remote host closed the connection] | 10:46 | |
-!- WangWang [uid231047@gateway/web/irccloud.com/x-mlezjzdzjfwklcrd] has quit [Quit: Connection closed for inactivity] | 10:46 | |
johklu | wiking, kernel <- CommWordStringKernel(feats, feats, param$usesign) | 10:46 |
johklu | svm <- SVMLight(param$C, kernel, labels) | 10:46 |
@wiking | ok | 10:47 |
@wiking | weird | 10:47 |
@wiking | johklu, but the trained model is good? | 10:47 |
@wiking | svm$apply gives you any good result? | 10:47 |
@wiking | or any reasonable result? :) | 10:48 |
johklu | yes | 10:49 |
@wiking | although whatever... the | 10:49 |
johklu | auc > 0.8 | 10:49 |
@wiking | weight should be there | 10:49 |
@wiking | :S | 10:49 |
@wiking | oh good | 10:49 |
@wiking | johklu, btw since we barely have users | 10:49 |
@wiking | can you share in what context are you using shogun?:))) | 10:49 |
@wiking | :D | 10:49 |
@wiking | or at least our users barely contact us :) | 10:50 |
@wiking | if you can share | 10:50 |
@wiking | johklu, lemme try to see if i can get it work somehow on my end with some dummy data | 10:50 |
johklu | its in the context of genomics | 10:52 |
johklu | I have sequences about 50 bases long (A, G, C, or T) | 10:52 |
johklu | each sequence is either categorized as methylated or unmethylated | 10:53 |
johklu | and I want to predict the methylation status from the sequence | 10:53 |
johklu | since the prediction seems quite successful (auc > 0.8) | 10:54 |
@wiking | :) | 10:54 |
johklu | I would like to know the k-mers within my sequences that are the most important for prediction | 10:54 |
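Once the dictionary weights are available (for example through the `get_dictionary` call discussed here), finding the most important k-mers amounts to sorting by absolute weight. A sketch under explicit assumptions: each k-mer index packs 2 bits per base with A=0, C=1, G=2, T=3 and the first base in the most significant bits, and the weight array has 4^k entries. Verify that encoding against the alphabet actually used before trusting the decoded strings; `decode_kmer` and `top_kmers` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// ASSUMED encoding: 2 bits per base, A=0 C=1 G=2 T=3, first base most significant.
std::string decode_kmer(std::size_t index, int k)
{
    static const char bases[] = {'A', 'C', 'G', 'T'};
    std::string kmer(static_cast<std::size_t>(k), 'A');
    for (int pos = k - 1; pos >= 0; --pos)
    {
        kmer[static_cast<std::size_t>(pos)] = bases[index & 3u];
        index >>= 2;
    }
    return kmer;
}

// The n k-mers with the largest absolute weight, largest first.
std::vector<std::pair<std::string, double>>
top_kmers(const std::vector<double>& weights, int k, std::size_t n)
{
    std::vector<std::size_t> order(weights.size());
    for (std::size_t i = 0; i < order.size(); ++i)
        order[i] = i;
    n = std::min(n, order.size());
    std::partial_sort(order.begin(), order.begin() + static_cast<std::ptrdiff_t>(n), order.end(),
                      [&](std::size_t a, std::size_t b) {
                          return std::fabs(weights[a]) > std::fabs(weights[b]);
                      });
    std::vector<std::pair<std::string, double>> result;
    for (std::size_t i = 0; i < n; ++i)
        result.emplace_back(decode_kmer(order[i], k), weights[order[i]]);
    return result;
}
```

Under these assumptions, index 7 with k=3 decodes to "ACT". The sign of a weight suggests which label a k-mer pushes the score towards, and its magnitude suggests how strongly.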
@wiking | cool | 10:54 |
@wiking | awesome application :D | 10:54 |
@wiking | lemme try to help | 10:55 |
@wiking | will need some minutes | 10:55 |
@wiking | ok? | 10:55 |
johklu | :) Thanks! | 10:55 |
johklu | I could also give you the model or data if that would help | 10:57 |
@sukey | Pull Request #3845 "[PrematureStopping] Add CMake support to search or install RxCpp." opened by geektoni - https://github.com/shogun-toolbox/shogun/pull/3845 | 10:58 |
@wiking | johklu, data would definitely help :) | 10:59 |
johklu | wiking, i could give you a matrix containing the sequences in one column and the labels (-1,1) in another | 11:02 |
@wiking | sure | 11:02 |
johklu | ok, give me some minutes and I will give you a link to download | 11:03 |
@wiking | k | 11:03 |
johklu | wiking, here you can download the data: http://medical-epigenomics.org/bocklab/jklughammer/share/sequence_table.tsv | 11:33 |
@wiking | johklu, great | 11:39 |
@wiking | you used the whole dataset for training? | 11:39 |
johklu | no, half of it for training and the other half for testing | 11:42 |
@wiking | k | 11:42 |
@sukey | Pull Request #3845 "[PrematureStopping] Add CMake support to search or install RxCpp." synchronized by geektoni - https://github.com/shogun-toolbox/shogun/pull/3845 | 11:47 |
-!- geektoni [~geektoni@93-34-234-212.ip52.fastwebnet.it] has quit [Quit: Leaving.] | 13:00 | |
-!- geektoni [~geektoni@93-34-234-212.ip52.fastwebnet.it] has joined #shogun | 13:00 | |
-!- geektoni [~geektoni@93-34-234-212.ip52.fastwebnet.it] has quit [Client Quit] | 13:00 | |
@iglesiasg | lisitsyn geektoni, any news about swig and some? | 13:12 |
-!- geektoni [~geektoni@93-34-234-212.ip52.fastwebnet.it] has joined #shogun | 14:19 | |
geektoni | iglesiasg: nope, no news | 14:20 |
@wiking | mikeling, ping | 14:31 |
mikeling | wiking: pong | 14:40 |
@wiking | any news on your push? | 14:43 |
@sukey | Pull Request #3832 "use std::vector instead of DynArray(on going)" synchronized by MikeLing - https://github.com/shogun-toolbox/shogun/pull/3832 | 14:43 |
-!- leagoetz [~leagoetz@pat-231-65.external.eduroam.ucl.ac.uk] has joined #shogun | 14:46 | |
@wiking | mikeling, does this run for you without error | 14:54 |
@wiking | ? | 14:55 |
mikeling | wiking: For unit test, yes. I found it has a serialization error, but I don't know how to solve it. | 14:55 |
-!- leagoetz [~leagoetz@pat-231-65.external.eduroam.ucl.ac.uk] has quit [Remote host closed the connection] | 15:02 | |
mikeling | wiking: I wrote a mock test for the serialization https://gist.github.com/MikeLing/665b961fae759a58535ac07b1b93e39a, but it passed without error | 15:02 |
-!- leagoetz [~leagoetz@pat-231-65.external.eduroam.ucl.ac.uk] has joined #shogun | 15:03 | |
-!- leagoetz [~leagoetz@pat-231-65.external.eduroam.ucl.ac.uk] has quit [Remote host closed the connection] | 15:06 | |
-!- leagoetz [~leagoetz@eduroam-int-pat-8-52.ucl.ac.uk] has joined #shogun | 15:24 | |
-!- tctara_ [~quassel@128.199.61.169] has quit [Ping timeout: 245 seconds] | 15:39 | |
-!- leagoetz [~leagoetz@eduroam-int-pat-8-52.ucl.ac.uk] has quit [Remote host closed the connection] | 15:47 | |
-!- HeikoS [~heiko@untrust-out.swc.ucl.ac.uk] has joined #shogun | 16:12 | |
-!- mode/#shogun [+o HeikoS] by ChanServ | 16:12 | |
-!- leagoetz [~leagoetz@eduroam-int-pat-8-52.ucl.ac.uk] has joined #shogun | 16:15 | |
micmn | wiking: ping | 16:20 |
@wiking | pong | 16:20 |
@HeikoS | wiking: jojo | 16:20 |
micmn | so the linalg insanity is due mostly to the combination of two things | 16:21 |
micmn | it's header only plus the define_for_all_type macro | 16:21 |
micmn | hence each translation unit recompiles EVERYTHING | 16:22 |
@wiking | yes | 16:22 |
@wiking | header libraries tend to have that drawback | 16:22 |
micmn | I see no obstacles in separating the implementation from the headers | 16:22 |
micmn | is there any reason for not doing that? | 16:23 |
@wiking | imo no... | 16:24 |
micmn | in fact I tried splitting the eigen implementation and now the files that include linalg compile in reasonable time | 16:26 |
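One common way to get the effect micmn describes, sketched with a hypothetical `vector_sum` function: keep only declarations in the header, put the template body in a single `.cpp`, and explicitly instantiate it for the supported types. Both "files" are shown in one block here; micmn's actual split of the Eigen backend may be structured differently.

```cpp
#include <cassert>
#include <vector>

// ---- header part (e.g. a hypothetical linalg_sum.h) ----
// Includers only see this declaration, so they do not re-instantiate
// the template body in every translation unit.
template <typename T>
T vector_sum(const std::vector<T>& v);

// ---- implementation part (e.g. a hypothetical linalg_sum.cpp) ----
template <typename T>
T vector_sum(const std::vector<T>& v)
{
    T total = T(0);
    for (const T& x : v)
        total += x;
    return total;
}

// Explicit instantiation definitions: code for these types is emitted once,
// here, and other translation units link against it.
template float vector_sum<float>(const std::vector<float>&);
template double vector_sum<double>(const std::vector<double>&);
```

The trade-off is that only the explicitly instantiated types (here `float` and `double`) are usable from other translation units; any other type fails at link time.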
micmn | I'll push to my repo and put a link in the journal | 16:27 |
-!- leagoetz [~leagoetz@eduroam-int-pat-8-52.ucl.ac.uk] has quit [Remote host closed the connection] | 16:31 | |
-!- leagoetz [~leagoetz@eduroam-int-pat-8-52.ucl.ac.uk] has joined #shogun | 16:32 | |
-!- leagoetz [~leagoetz@eduroam-int-pat-8-52.ucl.ac.uk] has quit [] | 16:45 | |
@wiking | micmn, sounds like a plan | 16:46 |
-!- olinguyen [81615ad9@gateway/web/freenode/ip.129.97.90.217] has joined #shogun | 16:59 | |
@wiking | olinguyen, ping? | 17:05 |
olinguyen | hi! | 17:08 |
olinguyen | wiking: i'm here | 17:08 |
@wiking | olinguyen, ok i realised i wrote u an email | 17:09 |
@wiking | :DD | 17:09 |
@wiking | nevermind | 17:09 |
olinguyen | yea, I got it :). I'll add XGBoost | 17:09 |
@wiking | thnx | 17:09 |
-!- tctara_ [~quassel@128.199.61.169] has joined #shogun | 17:22 | |
-!- yamz [400789b6@gateway/web/freenode/ip.64.7.137.182] has joined #shogun | 17:24 | |
yamz | Hi all, I am using the C++ interface with a GaussianNaiveBayesModel. I am wondering if it is possible to save/serialize a trained model to disk. | 17:37 |
yamz | I've found the save_serializable() function, but it does not save any of the trained state. My goal is for my machine learning application not to have to retrain models at startup | 17:39 |
yamz | thank you | 17:39 |
@HeikoS | yamz: that should definitely work | 17:51 |
@HeikoS | Can you put your code up as a github issue so that we can investigate? | 17:51 |
@HeikoS | geektoni, micmn, mikeling one of you guys should definitely look into this at some point with wiking. *Any* Shogun model should be serializable. I think we even had some sort of unit test started for that.... | 17:52 |
@wiking | yamz, should work | 17:53 |
@wiking | yamz, can u share the part of the code | 17:53 |
@wiking | that does the serialization | 17:53 |
@wiking | ? | 17:53 |
@wiking | johklu, ok so | 17:53 |
@wiking | one more question | 17:53 |
@wiking | sorry had a long working day | 17:53 |
@wiking | johklu, can u mail me your R snippet for features and training svm? | 17:54 |
@HeikoS | olinguyen, wiking you mean XGBoost using a python framework right? | 17:55 |
@HeikoS | wiking: I will talk to OXPHOS now about linalg | 17:57 |
@HeikoS | wiking: wanna join? | 17:57 |
olinguyen | wiking, HeikoS: were you referring to this XGBoost library (https://github.com/dmlc/xgboost) popularly used in Kaggle competitions? | 17:57 |
@wiking | HeikoS, there's python wrapper for xgboost | 17:57 |
@wiking | olinguyen, yes | 17:59 |
@wiking | HeikoS, where? | 17:59 |
@HeikoS | wiking: hangout | 17:59 |
@wiking | yamz, if u can share the snippet of serialization we might be able to help | 17:59 |
@wiking | HeikoS, 11:57pm | 17:59 |
@HeikoS | wiking: you dont have to | 17:59 |
@HeikoS | wiking: just asking | 18:00 |
@wiking | HeikoS, irc i can | 18:00 |
@wiking | talking i cannot | 18:00 |
@HeikoS | ah I see | 18:00 |
@HeikoS | we will talk | 18:00 |
@HeikoS | more efficient | 18:00 |
micmn | Heikos: *Any* Shogun model should be serializable, yeah I was working on that sometime ago https://github.com/shogun-toolbox/shogun/pull/3751 | 18:01 |
@wiking | micmn, i've pinged u on that :) | 18:01 |
@wiking | HeikoS, micmn should join | 18:01 |
@HeikoS | micmn: maybe a good idea to pick that up soon, especially the test at least | 18:01 |
@wiking | if he can | 18:01 |
@HeikoS | wiking: for the meeting? | 18:01 |
@HeikoS | ok | 18:01 |
@wiking | as he had some ideas mentioned previously | 18:01 |
@wiking | in fact | 18:01 |
@HeikoS | micmn: you want to join a hangout with Pan and me discussing linalg stuff? | 18:01 |
@wiking | he has a proposal | 18:01 |
@wiking | micmn, right? | 18:01 |
@HeikoS | https://gist.github.com/lisitsyn/a6d8ff6e8690431f967c5318c3750919#file-gistfile1-txt-L129 | 18:01 |
@HeikoS | wiking: so I was gonna talk about this idea here | 18:01 |
@HeikoS | i.e. have un-templated linalg interface | 18:02 |
-!- OXPHOS [92bd15c8@gateway/web/freenode/ip.146.189.21.200] has joined #shogun | 18:02 | |
@wiking | yeah | 18:02 |
@HeikoS | so that the algos do not need to have templates inside | 18:02 |
@wiking | but that requires | 18:02 |
@HeikoS | wiking: there is a few open questions .. | 18:02 |
@wiking | refactor | 18:02 |
@wiking | of the features | 18:02 |
@HeikoS | wiking: I know | 18:02 |
@wiking | :) | 18:02 |
@HeikoS | features? | 18:02 |
@wiking | yeah that one as well | 18:02 |
@HeikoS | yeah sure | 18:03 |
@wiking | and we need Matrix | 18:03 |
@wiking | and Vector | 18:03 |
@HeikoS | i requires quite a bit of stuff | 18:03 |
@wiking | classes | 18:03 |
@HeikoS | but would be good to design that soon | 18:03 |
@wiking | i mean to decouple from | 18:03 |
@wiking | SGV | 18:03 |
@wiking | and SGM | 18:03 |
@HeikoS | we can keep templated linalg for now, and then transition once the design is done | 18:03 |
@HeikoS | wiking: yeah features need iterator access | 18:03 |
@wiking | yeah | 18:03 |
@wiking | indeed | 18:03 |
@HeikoS | no explicit vectors/matrices | 18:03 |
@wiking | but | 18:03 |
@wiking | yeah | 18:03 |
@HeikoS | OXPHOS: hi there | 18:04 |
@wiking | micmn, can u share the idea u had? | 18:04 |
@HeikoS | micmn: you wanna join the meeting ? | 18:04 |
@wiking | anything that is half working is fine as well | 18:04 |
micmn | sorry | 18:04 |
OXPHOS | HeikoS hi | 18:04 |
@wiking | just to see what is your take on the refactor | 18:04 |
@wiking | of the horrendous | 18:04 |
@wiking | linalg | 18:04 |
@HeikoS | micmn: feel free to share things, this is just about brainstorming a few ideas and getting OXPHOS up to speed with the latest stuff | 18:04 |
micmn | terrible headache I don't think I'm able to join the meeting | 18:04 |
@HeikoS | micmn: ok dont worry | 18:05 |
@HeikoS | OXPHOS: lets just talk the two of us then | 18:05 |
@HeikoS | ill call you | 18:05 |
@HeikoS | see you later wiking, micmn | 18:05 |
OXPHOS | HeikoS sure | 18:05 |
@wiking | micmn, no worries | 18:05 |
@wiking | micmn, in any case u have something | 18:06 |
@wiking | for HeikoS and OXPHOS that'd be great | 18:06 |
micmn | yep | 18:06 |
yamz | @wiking | 18:06 |
yamz | auto features_train = some<CDenseFeatures<float64_t>>(f_feats_train); auto labels_train = some<CMulticlassLabels>(f_labels_train); auto gnb = some<CGaussianNaiveBayes>(features_train, labels_train); gnb->train(); auto saveFile = some<CSerializableAsciiFile>("/home/myamada/tmp/shogun_model2.out", 'w'); gnb->save_serializable(saveFile); | 18:06 |
micmn | I'll make a recap | 18:06 |
yamz | oops | 18:06 |
@wiking | yamz, it's good | 18:06 |
yamz | so i've noticed the save_serializable() function has the same output regardless of if i call it before or after calling train() | 18:07 |
@wiking | yamz, and what's in shogun_model2.out? | 18:07 |
@wiking | yamz, :O | 18:08 |
yamz | *long lines incoming* | 18:08 |
yamz | <<_SHOGUN_SERIALIZABLE_ASCII_FILE_V_00_>> max_train_time float64 0 solver_type int32 0 labels SGSerializable* MulticlassLabels [ subset_stack SGSerializable* SubsetStack [ active_subset SGSerializable* null [] active_subsets_stack SGSerializable* DynamicObjectArray [ array Vector<SGSerializable*> 0 () resize_granularity int32 128 use_sg_malloc bool t free_array bool t dim1_size int32 1 dim2_size int32 1 dim3_size int32 1 ] ] labels SGVec | 18:08 |
@wiking | yamz, use | 18:08 |
@wiking | yamz, pastebin.com | 18:08 |
yamz | right | 18:08 |
@wiking | yamz, for big pastes | 18:08 |
-!- geektoni [~geektoni@93-34-234-212.ip52.fastwebnet.it] has quit [Remote host closed the connection] | 18:08 | |
yamz | https://pastebin.com/3n8qwNGx | 18:08 |
@wiking | yamz, 980 = datapoints? | 18:10 |
yamz | yes 980 rows | 18:10 |
yamz | and here is the source: https://pastebin.com/2wSb4MsV | 18:10 |
@wiking | ok | 18:11 |
@wiking | the data is shareable? | 18:11 |
yamz | sure | 18:11 |
@wiking | if u could | 18:12 |
@wiking | then i could debug right away | 18:12 |
@wiking | maybe we have some serious problem | 18:12 |
@wiking | with the serialization fw | 18:12 |
@wiking | (would not be surprised) | 18:12 |
@wiking | :DDDDDDDDD | 18:12 |
yamz | hang on. I think I may just need to specify the separator in the csv | 18:12 |
@wiking | yamz, sure... have u tested the model itself after training? | 18:13 |
yamz | Not with this data. | 18:14 |
@wiking | anything is fine actually | 18:14 |
yamz | Training data: https://pastebin.com/7xAKzdLx | 18:15 |
yamz | Labels: https://pastebin.com/nDA3xqR2 | 18:15 |
@wiking | k | 18:17 |
yamz | OK so. i've changed my program to use the shogun supplied data, data/classifier_4class_2d_linear_features_train.dat | 18:20 |
yamz | and modified my program to dump model before and after training | 18:21 |
yamz | auto saveFile = some<CSerializableAsciiFile>("/home/myamada/tmp/shogun_model_before_train.out", 'w'); | 18:21 |
yamz | gnb->save_serializable(saveFile); | 18:21 |
yamz | gnb->train(); | 18:21 |
yamz | auto saveFile2 = some<CSerializableAsciiFile>("/home/myamada/tmp/shogun_model_after_train.out", 'w'); | 18:21 |
yamz | gnb->save_serializable(saveFile2); | 18:21 |
@wiking | and? | 18:22 |
@wiking | (btw i'm just debugging | 18:22 |
@wiking | ) | 18:22 |
yamz | both files are the same | 18:22 |
yamz | [myamada@wtl-lbuild-1 tmp]$ diff shogun_model_before_train.out shogun_model_after_train.out [myamada@wtl-lbuild-1 tmp]$ | 18:22 |
yamz | :( | 18:22 |
@wiking | k | 18:22 |
@wiking | yep | 18:24 |
@wiking | i can see that the model has some info | 18:25 |
yamz | seems to have the label info only | 18:25 |
@wiking | yes | 18:26 |
@wiking | i have to debug a bit | 18:26 |
yamz | ok. i appreciate the help very much | 18:26 |
micmn | HeikoS, OXPHOS, wiking: sent an email with my thoughts on the current state of linalg :) | 18:27 |
@wiking | micmn, got it thnx | 18:27 |
@wiking | yamz, that's alright | 18:27 |
@wiking | yamz, ok i see what's the problem | 18:36 |
-!- OXPHOS [92bd15c8@gateway/web/freenode/ip.146.189.21.200] has quit [Ping timeout: 260 seconds] | 18:37 | |
yamz | great | 18:37 |
yamz | details? | 18:38 |
-!- OXPHOS [92bd305b@gateway/web/freenode/ip.146.189.48.91] has joined #shogun | 18:41 | |
OXPHOS | micmn: thx! | 18:42 |
@wiking | just trying to fix it | 18:42 |
@wiking | yamz, will push the fix soon after i tested it | 18:42 |
@wiking | :P | 18:42 |
@wiking | HeikoS, we need to do something about serialization | 18:42 |
@wiking | :D | 18:42 |
yamz | awesome | 18:42 |
@wiking | we fail biiiiiiiiiiiiiig time | 18:42 |
@wiking | :> | 18:42 |
@HeikoS | wiking: yep | 18:42 |
@HeikoS | wiking: thats why i initiated the unit test thing in the first place | 18:42 |
@HeikoS | because I wanted to know how many models dont work actually | 18:42 |
@wiking | NaiveBayes has 0 params registered :D | 18:43 |
@HeikoS | wiking: yep | 18:46 |
@HeikoS | it was written before that was possible | 18:46 |
@HeikoS | I made a few models serializable as part of my pre GSoC contributions in 2011 :D | 18:46 |
@HeikoS | wiking: so first step here is to get the unit test running for that, so that as much as possible, we automate detecting such problems | 18:47 |
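The failure mode discussed above (a model whose learned state is never registered serializes identically before and after training, so a save/load round trip silently drops it) can be illustrated with a minimal C++ sketch. Everything in it is hypothetical and simplified: `SketchModel`, `register_param`, `save`, `load`, `UnregisteredNB`, `RegisteredNB` and `round_trip_ok` are stand-ins, not Shogun's parameter framework or its unit-test code. `round_trip_ok` shows the kind of check a serialization unit test can automate.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parameter registry; NOT Shogun's actual API.
// Only members passed to register_param() are written by save().
class SketchModel {
public:
    SketchModel() = default;
    // Registered pointers refer to this object, so forbid copying.
    SketchModel(const SketchModel&) = delete;
    SketchModel& operator=(const SketchModel&) = delete;
    virtual ~SketchModel() = default;

    // Serialize registered parameters as "name value" lines.
    std::string save() const {
        std::ostringstream out;
        for (const auto& p : params_)
            out << p.first << ' ' << *p.second << '\n';
        return out.str();
    }

    // Restore values into registered parameters, matched by name.
    void load(const std::string& text) {
        std::istringstream in(text);
        std::string name;
        double value;
        while (in >> name >> value)
            for (auto& p : params_)
                if (p.first == name) *p.second = value;
    }

protected:
    void register_param(const std::string& name, double* ptr) {
        params_.emplace_back(name, ptr);
    }

private:
    std::vector<std::pair<std::string, double*>> params_;
};

// Learns a value but never registers it: save() ignores it entirely.
class UnregisteredNB : public SketchModel {
public:
    void train() { class_prior = 0.25; }
    double class_prior = 0.0;  // initialised, but invisible to save()
};

// Same model with its learned state registered in the default constructor.
class RegisteredNB : public SketchModel {
public:
    RegisteredNB() { register_param("class_prior", &class_prior); }
    void train() { class_prior = 0.25; }
    double class_prior = 0.0;
};

// Round-trip check: train, save, load into a fresh instance, compare state.
template <typename M>
bool round_trip_ok() {
    M trained;
    trained.train();
    M restored;
    restored.load(trained.save());
    return restored.class_prior == trained.class_prior;
}
```

With `UnregisteredNB`, `save()` returns the same (empty) string before and after `train()`, which matches the identical before/after dumps reported above, and `round_trip_ok<UnregisteredNB>()` returns false. `RegisteredNB` registers its member in the default constructor, where it is also initialised, and passes the round trip.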
@wiking | HeikoS, micmn should finish :) | 18:52 |
@wiking | miju | 18:52 |
@wiking | yamz, ok fix is coming | 18:52 |
micmn | :) | 18:53 |
@HeikoS | micmn: hi | 18:53 |
@HeikoS | saw my email? | 18:53 |
micmn | from what I remember it was more or less working | 18:53 |
micmn | yep | 18:53 |
@HeikoS | micmn: I guess the main thing is to get this serialization testing working for *all* models in shogun | 18:54 |
micmn | ok, I'll look into that | 18:55 |
micmn | is there a list of *all* models? :D | 18:56 |
@wiking | :> | 18:57 |
@wiking | HeikoS, you should run ./bin/shogun-unit-test | 18:57 |
@wiking | to see the progress heaven | 18:57 |
@wiking | ;) | 18:57 |
@HeikoS | haha | 18:57 |
micmn | if I remember correctly i was working on CMachine | 18:57 |
@HeikoS | micmn: yeah linear machine | 18:58 |
@HeikoS | wiking: was curious what happens in ipython :D | 18:58 |
@HeikoS | micmn: we would want all classes of: linear machine, svm, kernel machine, gaussian process, multiclass, preprocessing, something like that order | 18:58 |
@wiking | HeikoS, didn't u say that you wanna ditch python? | 18:58 |
@sukey | New Commit "Fix serialization of GaussianNaiveBayes" to shogun-toolbox/shogun by vigsterkr: https://github.com/shogun-toolbox/shogun/commit/25c71a509cb6746d2b9085fbc1e4190cacbb170b | 18:58 |
@wiking | yamz, ^ that's your fix | 18:59 |
@HeikoS | wiking: ? ditch? | 18:59 |
@HeikoS | wiking: haha | 18:59 |
@wiking | HeikoS, one day you've joined that you never wanna touch python again | 18:59 |
@wiking | :) | 18:59 |
@HeikoS | would love to | 18:59 |
@HeikoS | wiking: yeah, just had another nightmare about (portable) exclusive file access | 18:59 |
@HeikoS | and used a "library" for that | 19:00 |
@wiking | :> | 19:00 |
@HeikoS | which messed up our NFS :D | 19:00 |
@wiking | nice one | 19:00 |
@wiking | HeikoS, ok so we have another candidate | 19:00 |
@wiking | who needs help | 19:00 |
shogun-buildbot | build #279 of trusty - libshogun - viennacl is complete: Failure [failed test] Build details are at http://buildbot.shogun-toolbox.org/builders/trusty%20-%20libshogun%20-%20viennacl/builds/279 blamelist: Viktor Gal <viktor.gal@maeth.com> | 19:00 |
@wiking | eeeeeeeeeeeee | 19:00 |
@wiking | wtf? | 19:00 |
@wiking | SERIALIZATION error | 19:01 |
@wiking | lloooooooooool | 19:01 |
@HeikoS | wiking: these tests serialize everything | 19:01 |
@HeikoS | all classes | 19:01 |
@wiking | yeah | 19:01 |
@wiking | but seems there's every now and then an error | 19:01 |
@wiking | :S | 19:01 |
@HeikoS | if you register a parameter and they fail | 19:01 |
@wiking | need to fucking fix this | 19:01 |
@wiking | :( | 19:01 |
@wiking | nono unrelated | 19:01 |
@HeikoS | this usually means that the parameter wasn't initialised in the default constructor | 19:01 |
@wiking | SerializationAscii.SpectrumMismatchRBFKernel | 19:01 |
@HeikoS | yeah? | 19:02 |
@HeikoS | ah ok | 19:02 |
@wiking | i haven't even touched that | 19:02 |
@HeikoS | haha | 19:02 |
@HeikoS | nice | 19:02 |
@wiking | HeikoS, so i was saying | 19:02 |
@wiking | CCommWordStringKernel | 19:02 |
@wiking | once done | 19:03 |
@wiking | one would like to get access to the weights | 19:03 |
@wiking | now the api for that atim is | 19:03 |
@wiking | *atm | 19:03 |
@wiking | void get_dictionary(int32_t& dsize, float64_t*& dweights) | 19:03 |
@wiking | { | 19:03 |
@wiking | dsize=dictionary_size; | 19:03 |
@wiking | dweights = dictionary_weights; | 19:03 |
@wiking | } | 19:03 |
@wiking | this will just not work in SWIG interface | 19:03 |
@wiking | should we switch it to | 19:04 |
@wiking | SGVector<float64_t> get_dictionary() const; | 19:04 |
@wiking | ? | 19:04 |
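A minimal sketch of the two accessor styles being compared for the CCommWordStringKernel weights. It assumes `std::vector<double>` as a stand-in for `SGVector<float64_t>` and uses a hypothetical `DictKernelSketch` class in place of the real kernel. The out-parameter version hands back a size and a raw pointer separately. The proposed version returns one object that carries its own length, which is presumably why it is simpler to expose through generated language bindings.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a string kernel that owns a weight dictionary.
// std::vector<double> stands in for Shogun's SGVector<float64_t>.
class DictKernelSketch {
public:
    explicit DictKernelSketch(std::vector<double> weights)
        : dictionary_weights_(std::move(weights)) {}

    // Old style: two out-parameters; the caller receives the size and a raw
    // pointer into the kernel's internal storage as separate values.
    void get_dictionary(int32_t& dsize, double*& dweights) {
        dsize = static_cast<int32_t>(dictionary_weights_.size());
        dweights = dictionary_weights_.data();
    }

    // Proposed style: a single return value that knows its own length.
    std::vector<double> get_dictionary() const { return dictionary_weights_; }

private:
    std::vector<double> dictionary_weights_;
};
```

This sketch returns a copy of the weights. Whether the real `SGVector`-returning method copies or shares the underlying buffer is not shown in this log.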
yamz | Thank you @wiking! | 19:09 |
@wiking | yamz, no worries | 19:10 |
-!- HeikoS [~heiko@untrust-out.swc.ucl.ac.uk] has quit [Ping timeout: 240 seconds] | 19:21 | |
@sukey | New Commit "Expose dictionary of CommWordStringKernel to the modular interfaces" to shogun-toolbox/shogun by vigsterkr: https://github.com/shogun-toolbox/shogun/commit/c65c2c54362c1964d79df67a651e71f55fefd2be | 19:23 |
@wiking | johklu, so this commit (https://github.com/shogun-toolbox/shogun/commit/c65c2c54362c1964d79df67a651e71f55fefd2be) should fix your problem of getting the dictionaries via the API in R | 19:23 |
@wiking | johklu, once you have your svm trained you should be able to call: weights <- kernel$get_dictionary() | 19:24 |
shogun-buildbot | build #280 of trusty - libshogun - viennacl is complete: Success [build successful] Build details are at http://buildbot.shogun-toolbox.org/builders/trusty%20-%20libshogun%20-%20viennacl/builds/280 | 19:24 |
@wiking | let's see what does buildbot say | 19:25 |
@wiking | sukey flip | 19:25 |
@wiking | :) | 19:25 |
@sukey | ┬─┬ ノ( ゜-゜ノ) | 19:25 |
@wiking | johklu, of course this means that you should use the latest shogun from github | 19:27 |
@wiking | to be able to have this in your R interface | 19:27 |
-!- johklu [c1abba08@gateway/web/freenode/ip.193.171.186.8] has quit [Ping timeout: 260 seconds] | 20:41 | |
-!- mikeling [uid89706@gateway/web/irccloud.com/x-vejvlfbugbsyqyxg] has quit [Quit: Connection closed for inactivity] | 20:59 | |
-!- HeikoS [~heiko@host-92-0-178-129.as43234.net] has joined #shogun | 21:59 | |
-!- mode/#shogun [+o HeikoS] by ChanServ | 21:59 | |
--- Log closed Thu Jun 15 00:00:14 2017 |
Generated by irclog2html.py 2.10.0 by Marius Gedminas - find it at mg.pov.lt!