--- Log opened Thu Jul 05 00:00:55 2018 | ||
-!- HeikoS [~heiko@host109-151-250-28.range109-151.btcentralplus.com] has joined #shogun | 10:01 | |
-!- mode/#shogun [+o HeikoS] by ChanServ | 10:01 | |
lisitsyn | HeikoS: the failed test in lazy any is interesting | 10:33 |
---|---|---|
@HeikoS | whats up with it? | 10:33 |
lisitsyn | it seems I accidentally found some null dereference | 10:33 |
lisitsyn | because I added a check to avoid that | 10:33 |
lisitsyn | and exactly that failed | 10:33 |
lisitsyn | I am debugging | 10:33 |
lisitsyn | https://github.com/shogun-toolbox/shogun/pull/4343/files#diff-8ea96286d95b52029d31636117e0fe55R341 | 10:34 |
lisitsyn | this one | 10:34 |
lisitsyn | HeikoS: once I am done, I will add a flag to ignore that in clone/equals | 10:37 |
lisitsyn | ah sorry that's 'Empty' | 10:38 |
lisitsyn | :D | 10:38 |
@HeikoS | sure | 10:47 |
@HeikoS | mmmh | 10:48 |
@HeikoS | seems weird but I cannot see through all this atm | 10:48 |
wuwei | heiko: hi, i still have some problems with string features meta examples | 13:05 |
wuwei | https://github.com/shogun-toolbox/shogun/blob/31ed13beba78984199a361756506178dd866ac57/examples/undocumented/python/kernel_comm_word_string.py#L18 | 13:05 |
wuwei | maybe we need a bit of refactoring of the string features APIs | 13:06 |
@HeikoS | wuwei: hi | 16:09 |
@HeikoS | ah yes | 16:09 |
@HeikoS | specialized method | 16:09 |
@HeikoS | what does it do? | 16:09 |
wuwei | hi | 16:09 |
wuwei | it maps the string list to higher-order real vectors | 16:10 |
@HeikoS | what does that mean? | 16:11 |
wuwei | there's actually a problem with the alphabet, which i have asked viktor about | 16:12 |
wuwei | https://github.com/shogun-toolbox/shogun/blob/31ed13beba78984199a361756506178dd866ac57/src/shogun/features/Alphabet.cpp#L755 | 16:12 |
wuwei | it will call CAlphabet::translate_from_single_order | 16:13 |
@HeikoS | I feel like we will need another factory for this or? | 16:13 |
wuwei | it takes a subsequence of the string and computes a real value | 16:13 |
@HeikoS | sure ok | 16:14 |
@HeikoS | this feels like some user decision to represent the string in a particular way | 16:14 |
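The conversion wuwei describes (take a window of the string and turn it into one number) can be sketched as k-mer packing. This is a minimal sketch that assumes a DNA alphabet `ACGT` and plain base-4 packing. It is not Shogun's `CAlphabet::translate_from_single_order`, whose exact encoding may differ, and `kAlphabet`, `symbol_index` and `encode_kmers` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical alphabet for this sketch: DNA, so |alphabet| = 4.
const std::string kAlphabet = "ACGT";

// Index of a character in kAlphabet (0..3).
std::uint64_t symbol_index(char c) {
    const std::size_t pos = kAlphabet.find(c);
    assert(pos != std::string::npos && "symbol not in alphabet");
    return static_cast<std::uint64_t>(pos);
}

// Packs each window of `order` consecutive symbols into one integer by reading
// the window as a base-|alphabet| number. The result has s.size() - order + 1
// entries, so its length still depends on the input length.
std::vector<std::uint64_t> encode_kmers(const std::string& s, std::size_t order) {
    assert(order > 0);
    std::vector<std::uint64_t> codes;
    for (std::size_t i = 0; i + order <= s.size(); ++i) {
        std::uint64_t value = 0;
        for (std::size_t j = 0; j < order; ++j)
            value = value * kAlphabet.size() + symbol_index(s[i + j]);
        codes.push_back(value);
    }
    return codes;
}
```

With `order = 2`, the string `ACGT` becomes the codes 1, 6 and 11 (for `AC`, `CG`, `GT`). These values are numbers derived from the string, not symbols of the original alphabet.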
wuwei | i think we need some cleanup of string features first | 16:15 |
@HeikoS | I mean let's face it | 16:16 |
@HeikoS | some of the things that the string features API offers | 16:16 |
@HeikoS | is actually required to be accessible by the user | 16:16 |
@HeikoS | so there is very little way around offering an API in swig that does similar things | 16:16 |
wuwei | you mean we should create a factory method to wrap obtain_*? | 16:19 |
@HeikoS | in these lines | 16:21 |
@HeikoS | a factory for converting string features | 16:21 |
@HeikoS | OR parametrize the existing string_features factory to be able to do such things | 16:21 |
@HeikoS | you see, we need an API for doing similar things | 16:21 |
@HeikoS | so I guess a good idea would be to go through the string examples | 16:21 |
@HeikoS | and see what is needed | 16:21 |
@HeikoS | then design an API for it | 16:21 |
@HeikoS | and implement it :) | 16:22 |
wuwei | yeah | 16:22 |
@HeikoS | one thing to remember is that the new API is still experimental, so we can change it | 16:22 |
@HeikoS | if the approach doesn't work | 16:22 |
wuwei | but i have another problem with obtain* | 16:22 |
@HeikoS | we are currently changing it all the time | 16:22 |
@HeikoS | ok what is it? | 16:22 |
wuwei | that's still not fixed in transformers | 16:22 |
wuwei | let me check quickly | 16:23 |
@HeikoS | sure | 16:24 |
wuwei | heiko: that's histogram of alphabet | 16:28 |
wuwei | for example, https://github.com/shogun-toolbox/shogun/blob/31ed13beba78984199a361756506178dd866ac57/examples/undocumented/python/preprocessor_sortulongstring.py#L19 | 16:28 |
wuwei | when you create a string features with some alphabet | 16:28 |
wuwei | and then call obtain_from_char | 16:28 |
@HeikoS | so the features are transformed to a space | 16:28 |
@HeikoS | that has one component per alphabet entry | 16:29 |
@HeikoS | and then it just counts | 16:29 |
wuwei | then you will have data beyond the alphabet | 16:29 |
@HeikoS | i.e. the actual features are histograms | 16:29 |
wuwei | that will cause a problem | 16:29 |
@HeikoS | not sure I understand | 16:29 |
wuwei | i mean, after calling obtain_from_char, what you have is actually real vectors, which are not in the alphabet | 16:31 |
@HeikoS | i see | 16:31 |
wuwei | so check_alphabet_size() will fail | 16:31 |
@HeikoS | but that is a problem that is already there right? | 16:32 |
wuwei | when you try to create a copy of string features, it will call check_alphabet_size() then fails | 16:32 |
@HeikoS | i get it | 16:32 |
wuwei | yes that's a problem with transformers | 16:32 |
@HeikoS | well it would be the dimension of the real vectors or? | 16:32 |
wuwei | the reason check_alphabet_size() fails is that the data is not in alphabet | 16:33 |
@HeikoS | yes it is transformed | 16:33 |
wuwei | since that's real values, instead of something like DNA characters | 16:33 |
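To illustrate why the check trips, here is a hedged sketch that assumes the alphabet check amounts to verifying that every stored value is a valid symbol index. This is a simplification of what `check_alphabet_size()` does. After a histogram-style transform, the stored values are counts rather than symbol indices, so they can exceed the alphabet size. `fits_alphabet`, `to_indices` and `char_histogram` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Simplified stand-in for an alphabet-size check: every stored value must be
// a valid symbol index, i.e. smaller than the alphabet size.
bool fits_alphabet(const std::vector<std::uint16_t>& values, std::size_t alphabet_size) {
    for (auto v : values)
        if (v >= alphabet_size) return false;
    return true;
}

// Maps "ACGT" characters to symbol indices 0..3, which are valid symbols.
std::vector<std::uint16_t> to_indices(const std::string& s) {
    const std::string alphabet = "ACGT";
    std::vector<std::uint16_t> out;
    for (char c : s) {
        const std::size_t pos = alphabet.find(c);
        assert(pos != std::string::npos && "symbol not in alphabet");
        out.push_back(static_cast<std::uint16_t>(pos));
    }
    return out;
}

// Replaces the string by per-symbol counts: a numeric embedding, not symbols.
std::vector<std::uint16_t> char_histogram(const std::string& s) {
    std::vector<std::uint16_t> counts(4, 0);
    for (auto idx : to_indices(s)) ++counts[idx];
    return counts;
}
```

For `AAAAACG`, the symbol indices pass the check, but the histogram `{5, 1, 1, 0}` does not, because the count 5 is not a valid index into a 4-letter alphabet.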
@HeikoS | it should be CDenseFeatures then or? | 16:34 |
@HeikoS | or even sparse | 16:34 |
@HeikoS | because string features by construction have an alphabet | 16:34 |
@HeikoS | they are discrete objects | 16:34 |
wuwei | yes i think that should be dense features | 16:34 |
wuwei | but i'm not sure, since many algorithms use transformed string features | 16:34 |
@HeikoS | mmh | 16:35 |
wuwei | for example string kernels | 16:35 |
@HeikoS | but string kernels use the alphabet as well no? | 16:35 |
@HeikoS | so how does it work | 16:35 |
@HeikoS | it just replaces the string list | 16:35 |
@HeikoS | with a list of SGString<index_t> ? | 16:35 |
@HeikoS | and then the elements contain the counts? | 16:35 |
wuwei | yes | 16:36 |
wuwei | it replaces the string list | 16:36 |
@HeikoS | so all strings are of the same length | 16:36 |
@HeikoS | I would call this an embedding | 16:36 |
@HeikoS | and yes, the resulting features should be dense/sparse | 16:36 |
@HeikoS | i.e. real | 16:36 |
wuwei | current workaround in transformers is to prevent creating a new copy, but that means string preprocessors work differently from other preprocessors | 16:37 |
@HeikoS | the concept of CDenseFeatures is that all vectors are of the same length | 16:37 |
@HeikoS | and all vectors are dense | 16:37 |
@HeikoS | in stringFeatures, we allow the elements to have different lengths | 16:37 |
@HeikoS | tbh I think in the case of, say, histogram embedding, the output should be dense/sparse | 16:37 |
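A histogram embedding fits the dense-features contract because strings of any length map to rows of one fixed width, with one column per alphabet entry. The sketch below assumes a plain row-major matrix of `double` as the dense output. `DenseMatrix` and `histogram_embedding` are hypothetical and stand in for `CDenseFeatures` only loosely.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Dense, row-major matrix: every row has exactly `cols` entries, which is the
// invariant dense features rely on.
struct DenseMatrix {
    std::size_t rows = 0, cols = 0;
    std::vector<double> data;
    double at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Histogram embedding: strings of any length over `alphabet` become rows of
// one fixed width, one column per alphabet entry, holding symbol counts.
DenseMatrix histogram_embedding(const std::vector<std::string>& strings,
                                const std::string& alphabet) {
    DenseMatrix m;
    m.rows = strings.size();
    m.cols = alphabet.size();
    m.data.assign(m.rows * m.cols, 0.0);
    for (std::size_t r = 0; r < strings.size(); ++r) {
        for (char ch : strings[r]) {
            const std::size_t c = alphabet.find(ch);
            assert(c != std::string::npos && "symbol not in alphabet");
            m.data[r * m.cols + c] += 1.0;
        }
    }
    return m;
}
```

The input strings `ACGTT` and `GA` have different lengths, but both become rows of width 4. That is why the result reads naturally as dense (or, for large alphabets, sparse) real-valued features rather than string features.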
@HeikoS | anyways, you are right, the API is inconsistent | 16:38 |
@HeikoS | if there is a way for StringFeatures to be defined in a real space, without an alphabet, then there should not be a method to access the alphabet | 16:39 |
@HeikoS | and also I think there should be a distinction of discrete and numerical spaces | 16:39 |
wuwei | ah yeah, the embeddings will have different lengths as well | 16:39 |
@HeikoS | will they? | 16:40 |
@HeikoS | how so? | 16:40 |
@HeikoS | maybe then we just need another base class | 16:40 |
@HeikoS | DiscreteStringFeatures | 16:40 |
@HeikoS | ContinuousStringFeatures | 16:40 |
@HeikoS | or something in the lines | 16:40 |
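HeikoS's proposed split can be sketched as follows, assuming the only difference that matters here is whether an alphabet exists. The two class names come from the discussion, but their members and the common base `StringFeaturesBase` are hypothetical. This is not Shogun's class hierarchy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical common base: a list of variable-length sequences.
class StringFeaturesBase {
public:
    virtual ~StringFeaturesBase() = default;
    virtual std::size_t num_vectors() const = 0;
    virtual std::size_t vector_length(std::size_t i) const = 0;
};

// Discrete strings: symbols come from an alphabet, so only this class
// exposes one.
class DiscreteStringFeatures : public StringFeaturesBase {
public:
    DiscreteStringFeatures(std::vector<std::string> s, std::string alphabet)
        : strings_(std::move(s)), alphabet_(std::move(alphabet)) {}
    std::size_t num_vectors() const override { return strings_.size(); }
    std::size_t vector_length(std::size_t i) const override { return strings_.at(i).size(); }
    const std::string& alphabet() const { return alphabet_; }
private:
    std::vector<std::string> strings_;
    std::string alphabet_;
};

// Continuous strings: variable-length real-valued sequences without an
// alphabet, so there is no alphabet accessor that could be misused.
class ContinuousStringFeatures : public StringFeaturesBase {
public:
    explicit ContinuousStringFeatures(std::vector<std::vector<double>> s)
        : sequences_(std::move(s)) {}
    std::size_t num_vectors() const override { return sequences_.size(); }
    std::size_t vector_length(std::size_t i) const override { return sequences_.at(i).size(); }
private:
    std::vector<std::vector<double>> sequences_;
};
```

The design choice is that an alphabet check can only be called where an alphabet actually exists. Code that needs one has to ask for `DiscreteStringFeatures` explicitly.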
@HeikoS | uh all this code looks scary | 16:41 |
@HeikoS | it hasn't been touched in a long time :D | 16:41 |
@HeikoS | so you have a suggestion how to proceed with this? | 16:41 |
@HeikoS | seems like this is a different problem to the one with the API or? | 16:41 |
wuwei | ah no, I don't have an idea right now | 16:43 |
@HeikoS | usually, the best strategy is to not solve everything at once | 16:43 |
@HeikoS | but one thing after the other | 16:43 |
@HeikoS | so maybe lets start with the factory API | 16:45 |
@HeikoS | and then deal with the transformer branch later? | 16:46 |
wuwei | sure | 16:48 |
lisitsyn | HeikoS: check the latest commit of https://github.com/shogun-toolbox/shogun/pull/4343 | 16:57 |
@HeikoS | lisitsyn: checking | 16:57 |
lisitsyn | oops, missed a continue | 16:57 |
lisitsyn | HeikoS: ok i basically added ignore ifs for the non-cloneable and non-visitable anys | 16:58 |
@HeikoS | lisitsyn: commented! | 17:00 |
@HeikoS | but yes! | 17:00 |
@HeikoS | that is it! | 17:00 |
lisitsyn | HeikoS: uhmm I don't get your comment | 17:01 |
lisitsyn | ah you mean re-using the message Cloning/Comparing? | 17:01 |
lisitsyn | in case of a race it would be useful | 17:01 |
lisitsyn | if two objects are compared at the same time for example | 17:02 |
@HeikoS | ah i see | 17:02 |
@HeikoS | yeah ok sure | 17:02 |
@HeikoS | the first one needs continue! | 17:02 |
lisitsyn | HeikoS: yes I fixed that | 17:02 |
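The fix discussed here can be sketched as a loop over parameters that skips (`continue`s past) entries which cannot be cloned or are empty, instead of dereferencing them. The `Param` struct, its `cloneable` flag and the two functions are hypothetical stand-ins for Shogun's `Any` policies. This is not the code in the pull request.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical parameter slot: a value plus a flag saying whether it may be
// copied (e.g. a lazily computed entry may not be).
struct Param {
    std::shared_ptr<int> value;  // null models an empty entry
    bool cloneable = true;
};

using ParamMap = std::map<std::string, Param>;

// Deep-copies every cloneable, non-empty parameter; everything else is
// skipped with `continue` rather than dereferenced.
ParamMap clone_params(const ParamMap& source) {
    ParamMap copy;
    for (const auto& entry : source) {
        const Param& p = entry.second;
        if (!p.cloneable || !p.value)
            continue;
        Param cloned;
        cloned.value = std::make_shared<int>(*p.value);
        copy[entry.first] = cloned;
    }
    return copy;
}

// Checks that every cloneable, non-empty parameter of `a` has an equal value
// in `b`; skipped entries are ignored the same way as in clone_params.
bool equals_params(const ParamMap& a, const ParamMap& b) {
    for (const auto& entry : a) {
        const Param& p = entry.second;
        if (!p.cloneable || !p.value)
            continue;
        auto it = b.find(entry.first);
        if (it == b.end() || !it->second.value || *it->second.value != *p.value)
            return false;
    }
    return true;
}
```

Without the first `continue`, an empty entry would be dereferenced through `*p.value`. That matches the null dereference mentioned at the start of the log.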
@HeikoS | otherwise I am fine! | 17:02 |
@HeikoS | merge it! | 17:02 |
@HeikoS | we can port more examples then | 17:02 |
lisitsyn | let me build once again | 17:03 |
lisitsyn | I would not wait for travis | 17:03 |
lisitsyn | because it will take infty | 17:03 |
@HeikoS | sure | 17:03 |
@HeikoS | just merge | 17:03 |
@HeikoS | don't wait | 17:03 |
wuwei | Heiko: btw currently mixin base classes break swig, e.g. in python, perceptron.train is undefined | 17:07 |
@HeikoS | really | 17:15 |
@HeikoS | thx for letting me know | 17:15 |
@HeikoS | so the python meta example for perceptron doesn't work? | 17:15 |
@HeikoS | wuwei: ^ | 17:15 |
@HeikoS | Test #452: generated_python-binary-perceptron ... Passed | 17:16 |
@HeikoS | wuwei: can you tell me how to reproduce the error? | 17:16 |
wuwei | let me check | 17:17 |
wuwei | i didn't test meta examples locally | 17:17 |
wuwei | in python, you create a perceptron instance | 17:18 |
@HeikoS | where did you see the error? | 17:18 |
wuwei | and then call train | 17:18 |
wuwei | it will throw an error | 17:18 |
wuwei | on my machine | 17:18 |
wuwei | there's a warning about swig | 17:19 |
wuwei | like base class CIterativeMachine is unknown | 17:20 |
@HeikoS | I am checking | 17:21 |
@HeikoS | can you run | 17:21 |
@HeikoS | ctest -R perceptron | 17:21 |
wuwei | Nothing known about base class 'CIterativeMachine< CLinearMachine >'. Ignored. Maybe you forgot to instantiate 'CIterativeMachine< CLinearMachine >' using %template. | 17:22 |
wuwei | ^ the warning | 17:22 |
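The pattern behind the warning is a mixin: a class template that derives from its template argument. With it, `Perceptron` inherits `train` through an instantiation such as `IterativeMachine<LinearMachine>`. SWIG ignores a base class it knows nothing about, so methods inherited from that instantiation are not wrapped unless it is declared (the `%template` hint in the warning). The plain C++ sketch below uses simplified, hypothetical classes, not Shogun's actual `CIterativeMachine`.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in for a linear machine holding the iteration state.
class LinearMachine {
public:
    virtual ~LinearMachine() = default;
protected:
    int iterations_done = 0;
};

// Mixin: adds a generic iterative train() on top of any machine type T.
template <class T>
class IterativeMachine : public T {
public:
    bool train() {
        init_model();
        for (std::size_t i = 0; i < max_iterations && !converged(); ++i) {
            iteration();
            ++this->iterations_done;  // member of the dependent base T
        }
        return true;
    }
    int iterations() const { return this->iterations_done; }
protected:
    std::size_t max_iterations = 10;
    virtual void init_model() {}
    virtual void iteration() = 0;
    virtual bool converged() const { return false; }
};

// The concrete machine only implements one step; train() is inherited from
// IterativeMachine<LinearMachine>, the kind of base SWIG reported as unknown.
class Perceptron : public IterativeMachine<LinearMachine> {
protected:
    void iteration() override { ++updates; }
    bool converged() const override { return updates >= 3; }
private:
    int updates = 0;
};
```

In C++, `Perceptron p; p.train();` works because the compiler instantiates the template. A binding generator, by contrast, only exposes `train` if it has been told about that instantiation.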
@HeikoS | when running what? | 17:23 |
wuwei | i'm building the develop branch | 17:23 |
@HeikoS | what is the command that gives the error? | 17:24 |
@HeikoS | is it running the meta example? | 17:24 |
@HeikoS | and more importantly | 17:25 |
@HeikoS | how do you instantiate the Perceptron | 17:25 |
@HeikoS | using new? | 17:25 |
wuwei | the warning above is thrown when building with `make all` | 17:25 |
@HeikoS | or using machine("Perceptron") | 17:25 |
wuwei | shogun.Perceptron | 17:25 |
wuwei | using the ctor | 17:25 |
@HeikoS | yes | 17:25 |
@HeikoS | ok | 17:25 |
@HeikoS | no problem then | 17:26 |
@HeikoS | we actually do not want to expose this into swig anymore anyways | 17:26 |
@HeikoS | this is why the meta example works | 17:26 |
@HeikoS | it uses the factory | 17:26 |
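The difference between the two instantiation paths can be sketched with a string-keyed registry. The sketch assumes a factory that only ever returns base-class pointers. Because callers never name the concrete class, the bindings only need the base interface, which is why the factory-based meta example is unaffected. `Machine`, this `machine()` function and its registry are hypothetical simplifications, not Shogun's factory.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Base interface: the only type callers of the factory ever see.
class Machine {
public:
    virtual ~Machine() = default;
    virtual bool train() = 0;
    virtual std::string name() const = 0;
};

class Perceptron : public Machine {
public:
    bool train() override { return true; }  // placeholder training
    std::string name() const override { return "Perceptron"; }
};

// Maps a class name to a creator function and returns a base-class pointer.
std::unique_ptr<Machine> machine(const std::string& name) {
    static const std::map<std::string, std::function<std::unique_ptr<Machine>()>> registry{
        {"Perceptron", [] { return std::unique_ptr<Machine>(new Perceptron()); }},
    };
    auto it = registry.find(name);
    if (it == registry.end())
        throw std::invalid_argument("unknown machine: " + name);
    return it->second();
}
```

Calling `train()` on the returned pointer dispatches through `Machine`, so no template base of the concrete class ever has to be visible to the caller.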
@HeikoS | wuwei: but thanks for letting me know anyways | 17:26 |
wuwei | oh yeah i see | 17:27 |
@HeikoS | wuwei: https://github.com/shogun-toolbox/shogun/issues/4354 | 17:31 |
@HeikoS | wuwei: btw | 17:32 |
@HeikoS | are there any more RealFeatures instances in undocumented/python ? | 17:32 |
@HeikoS | wuwei: because once we have brought that down to zero, we can remove RealFeatures from swig | 17:33 |
@HeikoS | (big compile speedup) | 17:33 |
wuwei | there are still many | 17:33 |
@HeikoS | wuwei: any of those are already portable? | 17:43 |
wuwei | yeah, that's a lot of work to be done | 17:44 |
@HeikoS | wuwei: btw do you remember which example needed a lazily evaluated member | 17:45 |
@HeikoS | so I can do 'get' | 17:45 |
@HeikoS | but then it computes something | 17:46 |
@HeikoS | I forgot which examples this was | 17:46 |
wuwei | one is GaussianKernel::get/set width | 17:47 |
wuwei | cuz it's log_width that is stored | 17:47 |
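A minimal sketch of the GaussianKernel case: the object stores only `log_width`, and the "width" getter computes the value on demand instead of reading a stored member. For simplicity this assumes width = exp(log_width). Shogun's actual parametrization may include extra constant factors, and `LazyGaussianKernel` is a hypothetical class.

```cpp
#include <cassert>
#include <cmath>

// Stores only log_width, which keeps the optimized parameter unconstrained.
class LazyGaussianKernel {
public:
    explicit LazyGaussianKernel(double width) { set_width(width); }
    // Computed on every call; there is no `width` member that could be
    // registered as a plain (stored) parameter.
    double get_width() const { return std::exp(log_width_); }
    void set_width(double width) {
        assert(width > 0.0);
        log_width_ = std::log(width);
    }
    double get_log_width() const { return log_width_; }
private:
    double log_width_ = 0.0;
};
```

Because `width` only exists as the result of `get_width()`, a generic `get("width")` needs some way to register a method rather than a field.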
@HeikoS | we have only passive | 17:48 |
@HeikoS | so could do | 17:48 |
@HeikoS | get('width') | 17:48 |
@HeikoS | but that is not the best illustration | 17:48 |
@HeikoS | something with distance | 17:48 |
wuwei | i remember another example is kmeans | 17:50 |
@HeikoS | kmeans | 17:51 |
@HeikoS | cool | 17:51 |
@HeikoS | let me check thx | 17:51 |
@HeikoS | you remember the method? | 17:51 |
@HeikoS | compute_cluster_variances | 17:51 |
@HeikoS | ? | 17:51 |
wuwei | get_cluster_center | 17:53 |
@HeikoS | lisitsyn: jo | 17:58 |
lisitsyn | HeikoS: yes | 17:58 |
@HeikoS | so now watch_param | 17:58 |
@HeikoS | watch_param("cluster_centers", std::function<SGMatrix<float64_t>()>(get_cluster_centers)); | 17:58 |
@HeikoS | like this? | 17:58 |
@HeikoS | or how? | 17:58 |
@HeikoS | lisitsyn: doesn't compile ;0 | 17:59 |
lisitsyn | HeikoS: you gotta bind 'this' | 18:00 |
lisitsyn | std::bind(&Object::computed_member, obj) | 18:00 |
@HeikoS | lemme try | 18:00 |
@HeikoS | watch_param("cluster_centers", std::bind(&KMeansBase::get_cluster_centers, this)); | 18:01 |
lisitsyn | yes | 18:01 |
@HeikoS | doesn't like it | 18:02 |
@HeikoS | watch_param that is | 18:02 |
lisitsyn | ah | 18:02 |
lisitsyn | yeah | 18:02 |
lisitsyn | watch_param does not work probably | 18:02 |
@HeikoS | error: no matching function for call to 'shogun::CKMeansBase::watch_param(const char [16], std::_Bind_helper<false, shogun::SGMatrix<double> (shogun::CKMeansBase::*)(), shogun::CKMeansBase*>::type)' | 18:02 |
@HeikoS | watch_param("cluster_centers", std::bind(&CKMeansBase::get_cluster_centers, this)); | 18:02 |
@HeikoS | mmh | 18:02 |
@HeikoS | do we need a new watch_lazy maybe? | 18:02 |
lisitsyn | yes | 18:02 |
lisitsyn | I think so | 18:02 |
lisitsyn | something like watch_method("cluster_centers", get_cluster_centers); | 18:03 |
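Here is a hedged sketch of why `watch_param` rejects the bound callable and what a `watch_method` could look like. It assumes, as the compiler error suggests, that `watch_param` only accepts a pointer to stored data. The `ParameterRegistry` and `KMeansSketch` classes, and the `watch_method` signature, are hypothetical illustrations, not Shogun code.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // stand-in for SGMatrix<float64_t>

class ParameterRegistry {
public:
    // Stored parameters: the registry keeps a pointer to existing data.
    void watch_param(const std::string& name, Matrix* value) { stored_[name] = value; }
    // Computed parameters: the registry keeps a callable and invokes it on get.
    void watch_method(const std::string& name, std::function<Matrix()> getter) {
        computed_[name] = std::move(getter);
    }
    Matrix get(const std::string& name) const {
        auto s = stored_.find(name);
        if (s != stored_.end()) return *s->second;
        return computed_.at(name)();  // throws std::out_of_range if unknown
    }
private:
    std::map<std::string, Matrix*> stored_;
    std::map<std::string, std::function<Matrix()>> computed_;
};

class KMeansSketch : public ParameterRegistry {
public:
    KMeansSketch() {
        // A member function needs `this` bound before it is a Matrix() callable.
        watch_method("cluster_centers", std::bind(&KMeansSketch::get_cluster_centers, this));
    }
    Matrix get_cluster_centers() const { return centers_; }
    void set_centers(Matrix c) { centers_ = std::move(c); }
private:
    Matrix centers_;
};
```

One design caveat: a callable bound to `this` must be re-bound when the object is cloned. Otherwise the copy would keep reading the original object's state, which ties back to the clone/equals handling discussed earlier.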
@HeikoS | lisitsyn: mind adding that? :D | 18:03 |
lisitsyn | yeah lemme try | 18:03 |
lisitsyn | not now though | 18:03 |
lisitsyn | interviewing someone right now | 18:03 |
lisitsyn | :D :D | 18:03 |
@HeikoS | ok | 18:07 |
@HeikoS | pose as interview q! | 18:07 |
@HeikoS | lisitsyn: enjoy! | 18:07 |
lisitsyn | HeikoS: I think I am just in the middle of my worst interview ever | 18:07 |
@HeikoS | why is she so bad? | 18:08 |
lisitsyn | ¯\_(ツ)_/¯ | 18:08 |
lisitsyn | is irc still logged? :D | 18:09 |
@HeikoS | lol | 18:11 |
@HeikoS | it is | 18:11 |
lisitsyn | then I'd stop at this point heh | 18:12 |
-!- HeikoS [~heiko@host109-151-250-28.range109-151.btcentralplus.com] has quit [Ping timeout: 260 seconds] | 18:40 | |
-!- travis-ci [~travis-ci@ec2-54-158-152-243.compute-1.amazonaws.com] has joined #shogun | 19:22 | |
travis-ci | it's Sergey Lisitsyn's turn to pay the next round of drinks for the massacre he caused in shogun-toolbox/shogun: https://travis-ci.org/shogun-toolbox/shogun/builds/400484409 | 19:22 |
-!- travis-ci [~travis-ci@ec2-54-158-152-243.compute-1.amazonaws.com] has left #shogun [] | 19:22 | |
--- Log closed Fri Jul 06 00:00:56 2018 |
Generated by irclog2html.py 2.10.0 by Marius Gedminas - find it at mg.pov.lt!