--- Log opened Thu May 16 00:00:17 2019 | ||
-!- anvan [~androirc@103.252.200.48] has quit [Ping timeout: 268 seconds] | 05:09 | |
-!- anvan [~androirc@20.203-211-155.idc-office.qala.com.sg] has joined #shogun | 05:36 | |
-!- anvan [~androirc@20.203-211-155.idc-office.qala.com.sg] has quit [Ping timeout: 250 seconds] | 05:49 | |
-!- AndroUser2 [~androirc@137.132.214.3] has joined #shogun | 06:17 | |
-!- AndroUser2 [~androirc@137.132.214.3] has quit [Ping timeout: 255 seconds] | 06:28 | |
-!- AndroUser2 [~androirc@10.203-211-155.idc-office.qala.com.sg] has joined #shogun | 06:37 | |
-!- wiking [~wiking@huwico/staff/wiking] has quit [Remote host closed the connection] | 06:46 | |
-!- AndroUser2 [~androirc@10.203-211-155.idc-office.qala.com.sg] has quit [Ping timeout: 252 seconds] | 06:50 | |
-!- AndroUser2 [~androirc@58.185.251.86] has joined #shogun | 06:59 | |
-!- wiking [~wiking@huwico/staff/wiking] has joined #shogun | 07:15 | |
-!- mode/#shogun [+o wiking] by ChanServ | 07:15 | |
-!- AndroUser2 [~androirc@58.185.251.86] has quit [Ping timeout: 255 seconds] | 07:15 | |
-!- wiking [~wiking@huwico/staff/wiking] has quit [Ping timeout: 258 seconds] | 07:20 | |
-!- wiking [~wiking@huwico/staff/wiking] has joined #shogun | 07:37 | |
-!- mode/#shogun [+o wiking] by ChanServ | 07:37 | |
-!- essam [c5351150@gateway/web/freenode/ip.197.53.17.80] has joined #shogun | 07:54 | |
-!- anvan [~androirc@103.252.200.48] has joined #shogun | 08:18 | |
-!- gf712 [c13cdcfd@gateway/web/freenode/ip.193.60.220.253] has joined #shogun | 09:46 | |
-!- geektoni [c1cdd253@gateway/web/freenode/ip.193.205.210.83] has joined #shogun | 09:57 | |
-!- HeikoS [~heiko@176.pool85-48-188.static.orange.es] has joined #shogun | 10:21 | |
-!- mode/#shogun [+o HeikoS] by ChanServ | 10:21 | |
@HeikoS | gf712: yo | 10:25 |
-!- gf712 [c13cdcfd@gateway/web/freenode/ip.193.60.220.253] has quit [Ping timeout: 256 seconds] | 10:34 | |
-!- wiking [~wiking@huwico/staff/wiking] has quit [Remote host closed the connection] | 10:34 | |
-!- geektoni [c1cdd253@gateway/web/freenode/ip.193.205.210.83] has quit [Ping timeout: 256 seconds] | 10:35 | |
-!- gf712 [c13cdcfd@gateway/web/freenode/ip.193.60.220.253] has joined #shogun | 10:40 | |
gf712 | HeikoS: hey | 10:40 |
@HeikoS | gf712: hi! | 10:40 |
gf712 | sorry had a building evacuation at the ati | 10:40 |
@HeikoS | what? | 10:40 |
@HeikoS | practice? | 10:40 |
gf712 | fire alarm | 10:40 |
@HeikoS | ah | 10:40 |
gf712 | no, actual thing | 10:40 |
gf712 | but nothing happened | 10:40 |
gf712 | this is when you realise how many ppl go to the British library :D | 10:41 |
gf712 | HeikoS: managed to get a speed up on the parsing btw | 10:41 |
@HeikoS | yeah I saw | 10:41 |
@HeikoS | quite nice | 10:41 |
@HeikoS | gf712: all of euston road blocked? | 10:42 |
gf712 | no, just the courtyard was full of people | 10:42 |
gf712 | but staff can use the side entrance so managed to get back quickly | 10:42 |
-!- wiking [~wiking@huwico/staff/wiking] has joined #shogun | 10:44 | |
-!- mode/#shogun [+o wiking] by ChanServ | 10:44 | |
@HeikoS | gf712 wiking do you have thoughts on adding domains to variables? | 10:45 |
@HeikoS | like "positive" | 10:45 |
-!- wiking [~wiking@huwico/staff/wiking] has quit [Ping timeout: 248 seconds] | 10:48 | |
@HeikoS | gf712: btw what is the state of the model selection stuff | 10:51 |
@HeikoS | anything I can help with there? | 10:52 |
gf712 | HeikoS: what do you mean with "positive"? | 10:52 |
@HeikoS | >=0 | 10:52 |
gf712 | ah I haven't touched that for a while | 10:52 |
@HeikoS | >0 | 10:52 |
gf712 | just getting openml in shogun done | 10:52 |
@HeikoS | gf712: yeah no worries | 10:52 |
@HeikoS | just wondering | 10:52 |
gf712 | to have a nice example in all targets | 10:52 |
gf712 | languages | 10:52 |
@HeikoS | ok cool | 10:53 |
gf712 | mhhh when would you need positive? | 10:53 |
@HeikoS | meta example right? | 10:53 |
gf712 | yup | 10:53 |
@HeikoS | gf712: like K in KNN | 10:53 |
@HeikoS | or Kmeans | 10:53 |
@HeikoS | put("k", -1) // kaboom | 10:53 |
gf712 | oh you mean to enforce that the user gives positive? | 10:53 |
gf712 | ah ok | 10:54 |
@HeikoS | there is the option to assert that in the "train" method | 10:54 |
@HeikoS | which is more effective | 10:54 |
-!- wiking [~wiking@huwico/staff/wiking] has joined #shogun | 10:54 | |
-!- mode/#shogun [+o wiking] by ChanServ | 10:54 | |
@HeikoS | but also more indirect | 10:54 |
gf712 | yea, we can add yet another param descriptor | 10:54 |
gf712 | in anyparameter | 10:54 |
gf712 | not sure what the cost is | 10:54 |
gf712 | or do you mean have a separate class for each domain | 10:55 |
gf712 | ? | 10:55 |
gf712 | because with the anyparameter I can see having a lambda that does such a check | 10:55 |
@HeikoS | gf712: yeah that is what I was wondering | 10:55 |
gf712 | and that can be put in anyparameter | 10:55 |
@HeikoS | if we add more properties | 10:55 |
@HeikoS | then we need to check all of them in every put | 10:55 |
@HeikoS | -> slow | 10:56 |
gf712 | but put is a setter | 10:56 |
gf712 | it doesn't need to be super fast | 10:56 |
@HeikoS | yep also true | 10:56 |
gf712 | and it's just for interfaces | 10:56 |
@HeikoS | well yes and no | 10:56 |
@HeikoS | algorithms are supposed to change the internals of models using "put" | 10:56 |
gf712 | ah | 10:56 |
gf712 | but not inside the loop right? | 10:57 |
-!- geektoni [c1cdd253@gateway/web/freenode/ip.193.205.210.83] has joined #shogun | 10:57 | |
@HeikoS | iterative methods once per iteration | 10:57 |
@HeikoS | so that might be once per passing all data | 10:57 |
@HeikoS | still negligible I guess | 10:57 |
gf712 | hmm I think a check for an enum is still much cheaper than finding stuff in a map | 10:58 |
@HeikoS | gf712: point is: more and more properties to be checked in put | 10:58 |
@HeikoS | ok | 10:58 |
@HeikoS | and what about adding a lambda | 10:58 |
gf712 | I mean it's just a switch table | 10:58 |
@HeikoS | that is attached to a variable | 10:58 |
@HeikoS | like geektoni suggested | 10:58 |
gf712 | well lambdas are zero cost | 10:58 |
@HeikoS | SG_ADD("k", &k, "bla", my_positive_lambda) | 10:59 |
gf712 | until they are cast to a function pointer | 10:59 |
@HeikoS | and then we can have a list of general purpose lambdas where devs can pick from | 10:59 |
@HeikoS | and if one is provided, then it is executed | 10:59 |
geektoni | HeikoS: soo discussing put() with constraints? :) | 10:59 |
@HeikoS | otherwise it isnt | 10:59 |
@HeikoS | geektoni: yes :) | 10:59 |
@HeikoS | gf712: I like the lambda idea more tbh...what do you think? | 10:59 |
gf712 | HeikoS: that is what we did with the auto stuff | 10:59 |
@HeikoS | gf712: since we can have as many as we want | 10:59 |
gf712 | the lambda I mean | 10:59 |
@HeikoS | gf712: yes | 10:59 |
gf712 | we added these factories | 11:00 |
@HeikoS | and then we can also have more complex checks | 11:00 |
@HeikoS | like PSD | 11:00 |
gf712 | yes, we can do that | 11:00 |
gf712 | I am not sure what the cost is | 11:00 |
@HeikoS | and for the constraints, we would only need to have the option to have multiple ones, no need to offer users setting them | 11:00 |
@HeikoS | so it is a bit easier than the auto stuff, where users should be able to change | 11:00 |
gf712 | from a developer side it is really useful | 11:01 |
@HeikoS | gf712: yes that is the q. I imagine that all we need is a single check "is_there_a_lambda_attached()", and then execute if there is | 11:01 |
gf712 | from the performance it's hard to tell | 11:01 |
gf712 | yup, that is a single machine operation | 11:01 |
gf712 | it's just a bit shift | 11:01 |
@HeikoS | whereas with the properties, we need to check for every single property added | 11:02 |
gf712 | what do you mean? | 11:02 |
@HeikoS | if we have two properties | 11:02 |
@HeikoS | say POSITIVE and PSD | 11:02 |
@HeikoS | or rather | 11:02 |
@HeikoS | POSITIVE and GREATER_10 | 11:02 |
@HeikoS | then there need to be two checks | 11:02 |
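A minimal sketch of the "attach a lambda to the parameter" idea. `IntParam`, `Model` and the way the check is stored are hypothetical simplifications, not Shogun's `AnyParameter` or the real `SG_ADD` signature. The point it illustrates: `put` pays a single emptiness test when no check is attached, and runs the check only when one is.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical registered parameter: a pointer to the member plus an
// optional constraint (an empty std::function means "no check attached").
struct IntParam
{
    int* value;
    std::function<bool(const int&)> constraint;
};

class Model
{
public:
    Model()
    {
        // Roughly what SG_ADD("k", &k, "bla", my_positive_lambda) could register.
        m_params["k"] = IntParam{&m_k, [](const int& v) { return v > 0; }};
    }
    // Stored pointers refer to this instance's members, so forbid copies.
    Model(const Model&) = delete;
    Model& operator=(const Model&) = delete;

    void put(const std::string& name, int v)
    {
        IntParam& p = m_params.at(name);
        // One cheap test when nothing is attached; run the check otherwise.
        if (p.constraint && !p.constraint(v))
            throw std::invalid_argument("constraint violated for " + name);
        *p.value = v;
    }

    int k() const { return m_k; }

private:
    int m_k = 1;
    std::map<std::string, IntParam> m_params;
};
```

With this shape, `put("k", -1)` fails loudly at the setter instead of later inside `train`.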
gf712 | ah ok | 11:02 |
gf712 | that is a bit more difficult | 11:03 |
gf712 | but we can do some hackery to move stuff to compile time | 11:03 |
gf712 | and then the runtime check should be quick | 11:03 |
gf712 | basically we need a container of lambdas | 11:04 |
gf712 | which requires function pointers | 11:04 |
gf712 | but in C++17 you can do some nice iterations over arrays | 11:04 |
gf712 | with lambdas | 11:04 |
@HeikoS | yes that is what I thought | 11:04 |
gf712 | and then it's not cast to a pointer | 11:04 |
gf712 | and then should be fast at runtime | 11:04 |
@HeikoS | cool, I think that'd be ace | 11:05 |
gf712 | the only thing is that I don't know how we can do that in c++14 | 11:05 |
@HeikoS | can we do it in a way that it is runtime for now, and becomes compile time once we compile with c++17? | 11:06 |
gf712 | yes, we can just cast it to function pointers | 11:06 |
gf712 | put it in a vector | 11:06 |
gf712 | but I think the lambda will have to be stateless | 11:06 |
gf712 | I'll have to check how to do it | 11:06 |
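The C++14 fallback mentioned here (convert lambdas to function pointers and keep them in a vector) can be sketched as follows. It only works for captureless, i.e. stateless, lambdas, which is exactly why a check like GREATER_X with a captured threshold becomes a problem. `IntCheck`, `make_checks` and `satisfies_all` are illustrative names, not Shogun code.

```cpp
#include <cassert>
#include <vector>

// Captureless lambdas convert implicitly to plain function pointers,
// so several checks of the same signature can share one container.
using IntCheck = bool (*)(int);

inline std::vector<IntCheck> make_checks()
{
    IntCheck positive = [](int v) { return v > 0; };    // "positive"
    IntCheck below_100 = [](int v) { return v < 100; }; // fixed bound, no capture
    // A capturing lambda such as [x](int v) { return v > x; } would NOT
    // convert to IntCheck; that case needs std::function or a small struct.
    return {positive, below_100};
}

inline bool satisfies_all(const std::vector<IntCheck>& checks, int v)
{
    for (IntCheck check : checks)
        if (!check(v)) // indirect call through the pointer, hard to inline
            return false;
    return true;
}
```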
gf712 | this is what I did for that in c++17 https://github.com/gf712/ProStruct/blob/master/src/prostruct/utils/tuple_utils.h | 11:07 |
-!- geektoni_ [c1cdd253@gateway/web/freenode/ip.193.205.210.83] has joined #shogun | 11:07 | |
gf712 | that is all compile time | 11:07 |
gf712 | and then execute kernels at runtime | 11:08 |
-!- geektoni [c1cdd253@gateway/web/freenode/ip.193.205.210.83] has quit [Ping timeout: 256 seconds] | 11:08 | |
@HeikoS | gf712: stateless should be fine or? | 11:09 |
geektoni_ | HeikoS: LDA can go in the feature branch | 11:10 |
gf712 | HeikoS: well, we'll need to capture a value if we have something like GREATER_X | 11:11 |
@HeikoS | geektoni_: why not develop? | 11:11 |
@HeikoS | gf712: ah sorry yes | 11:11 |
lisitsyn | please no enums :P | 11:12 |
@HeikoS | gf712: but the param would be read-only | 11:12 |
@HeikoS | const ref | 11:12 |
@HeikoS | lisitsyn: hello! | 11:12 |
lisitsyn | hey | 11:12 |
lisitsyn | you don't want enums | 11:12 |
@HeikoS | lisitsyn: discussing to attach a lambda to parameters | 11:12 |
@HeikoS | to check stuff | 11:12 |
lisitsyn | yes or | 11:12 |
lisitsyn | interface Constraint | 11:12 |
lisitsyn | add().positive().lessThan(10) | 11:13 |
geektoni_ | HeikoS: because it uses the observable stuff which is only in the feature branch | 11:13 |
lisitsyn | Builder positive() { add(PositiveConstraint()); } | 11:13 |
@HeikoS | lisitsyn: ok and that would be checked at runtime when calling ::put | 11:13 |
lisitsyn | enums will be PITA because you need parameters sometimes | 11:13 |
lisitsyn | yes | 11:13 |
lisitsyn | should go into AnyParameter | 11:13 |
@HeikoS | lisitsyn: yeah no enums dont worry | 11:13 |
lisitsyn | just a list of requirements | 11:14 |
@HeikoS | lisitsyn: the lambda thing gf712 suggested would be compile time | 11:14 |
lisitsyn | compile time? que how | 11:14 |
lisitsyn | ain't possible in python, no? | 11:14 |
@HeikoS | true | 11:14 |
gf712 | as in the lambda would be added at compile time to a tuple | 11:14 |
@HeikoS | sorry | 11:14 |
@HeikoS | what I mean is more | 11:15 |
@HeikoS | ^ | 11:15 |
gf712 | the requirements would be known at compile time | 11:15 |
gf712 | so for the K in KNN we know at compile time it has to be positive | 11:15 |
gf712 | no need to add a constraint like that at runtime | 11:15 |
gf712 | let the compiler optimise that call | 11:15 |
lisitsyn | ohh you had quite a lot messages above | 11:15 |
lisitsyn | can you outline how? | 11:15 |
gf712 | yes, but only in C++17 | 11:16 |
@HeikoS | geektoni_: but it doesnt use observable stuff | 11:16 |
gf712 | basically have a tuple | 11:16 |
@HeikoS | geektoni_: only put | 11:16 |
lisitsyn | value + constraint? | 11:16 |
gf712 | you can then tell the compiler that the tuple has to be executed every time a value is changed | 11:17 |
lisitsyn | it might be a good idea to make them composable and lambdas are not composable | 11:17 |
gf712 | something like (apply(std::get<Idx>(lambda_tuple), lambda_args, result(Idx)), ...) | 11:17 |
gf712 | but why composable? they are independent operations no? | 11:17 |
geektoni_ | you're right indeed | 11:17 |
lisitsyn | e.g. lessThan(10).greaterThan(2) | 11:17 |
geektoni_ | HeikoS: ah! I see what you mean | 11:17 |
gf712 | but you still have two independent operations no? | 11:18 |
lisitsyn | I am not sure if checking constraints needs to be really fast | 11:18 |
lisitsyn | ah so is it like you tuple a tuple? | 11:18 |
gf712 | ah no, probably not | 11:18 |
gf712 | mhh not sure what you mean. this just calls the lambdas | 11:19 |
gf712 | without casting them to pointers | 11:19 |
gf712 | so the compiler would inline it properly | 11:20 |
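A C++17 sketch of the tuple-of-lambdas approach. It expands the tuple with `std::apply` and a fold expression rather than the `std::get<Idx>`/index-sequence form quoted above; both are standard ways to visit every tuple element. `make_constraints` and `check_all` are hypothetical names. Since each lambda keeps its own closure type, nothing decays to a function pointer, and capturing a threshold (the GREATER_X case) is fine.

```cpp
#include <cassert>
#include <tuple>

// Each check keeps its own closure type inside the tuple.
template <typename... Checks>
auto make_constraints(Checks... checks)
{
    return std::make_tuple(checks...);
}

// Calls every check on the value; && short-circuits on the first failure.
// An empty tuple yields true (empty unary && fold).
template <typename Tuple, typename T>
bool check_all(const Tuple& constraints, const T& value)
{
    return std::apply(
        [&value](const auto&... check) { return (check(value) && ...); },
        constraints);
}
```

Usage: `check_all(make_constraints([](int v) { return v > 0; }, [upper](int v) { return v < upper; }), k)`.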
lisitsyn | I have very vague understanding atm | 11:20 |
lisitsyn | where exactly this apply thing happens? | 11:20 |
gf712 | well that would be inside the call that determines if there is something to apply, so in anyparameter? | 11:22 |
gf712 | in any case, this is c++17 stuff so might not be worth thinking about it for a while | 11:22 |
gf712 | let's just do some composition | 11:22 |
gf712 | it should be some light structs right? | 11:23 |
lisitsyn | I think so | 11:25 |
lisitsyn | with something like virtual check(); | 11:25 |
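A sketch of lisitsyn's composable alternative: light structs behind a `Constraint` interface with a virtual `check()`, collected by a builder so calls chain in the spirit of `add().positive().lessThan(10)`. All names are hypothetical. The virtual call per constraint is the runtime price paid for composability, which matters little if, as noted above, checking constraints does not need to be very fast.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Constraint
{
    virtual ~Constraint() = default;
    virtual bool check(double value) const = 0;
    virtual std::string describe() const = 0;
};

struct Positive : Constraint
{
    bool check(double v) const override { return v > 0; }
    std::string describe() const override { return "positive"; }
};

struct LessThan : Constraint
{
    explicit LessThan(double bound) : bound(bound) {}
    bool check(double v) const override { return v < bound; }
    std::string describe() const override { return "less than " + std::to_string(bound); }
    double bound;
};

// Builder: each call appends a constraint and returns *this for chaining.
class Constraints
{
public:
    Constraints& positive() { return add(std::make_unique<Positive>()); }
    Constraints& lessThan(double bound) { return add(std::make_unique<LessThan>(bound)); }

    // Description of the first violated constraint, or "" if all pass.
    std::string first_violation(double value) const
    {
        for (const auto& c : m_list)
            if (!c->check(value))
                return c->describe();
        return "";
    }

private:
    Constraints& add(std::unique_ptr<Constraint> c)
    {
        m_list.push_back(std::move(c));
        return *this;
    }
    std::vector<std::unique_ptr<Constraint>> m_list;
};
```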
@HeikoS | lisitsyn: I have another q for you | 11:33 |
@HeikoS | lisitsyn: it is about labels | 11:33 |
lisitsyn | aha | 11:33 |
@HeikoS | lisitsyn: you have a min? | 11:33 |
lisitsyn | yes! | 11:33 |
@HeikoS | ok | 11:33 |
@HeikoS | so the current/old way of dealing with labels is | 11:34 |
@HeikoS | we have binary (-1+1), multiclass (0,1,2,...), regression, etc | 11:34 |
@HeikoS | and then each algo enforces that the labels are exactly this type | 11:34 |
-!- gf712 [c13cdcfd@gateway/web/freenode/ip.193.60.220.253] has quit [Ping timeout: 256 seconds] | 11:34 | |
@HeikoS | which can be annoying say if someone has a binary labels instance and wants to run knn, they see an error | 11:35 |
@HeikoS | furthermore, multiclass labels need to be contiguous | 11:35 |
@HeikoS | have to be integers | 11:35 |
lisitsyn | yeah thats stupid :) | 11:35 |
@HeikoS | and also one cannot pass multiclasslabels(0,1,1,1,0) to a binary machine | 11:35 |
lisitsyn | especially contiguous thing | 11:35 |
@HeikoS | yep | 11:35 |
@HeikoS | so ideally we would want | 11:36 |
@HeikoS | discreteLabels | 11:36 |
@HeikoS | which can be anything discrete, represented as say an int | 11:36 |
@HeikoS | and then this replaces both binary and multiclass | 11:36 |
@HeikoS | and the check rather is that it contains only two elements | 11:36 |
@HeikoS | for binary | 11:36 |
lisitsyn | I am not sure about the name | 11:36 |
@HeikoS | well doesnt matter | 11:36 |
@HeikoS | ClassificationLabels | 11:36 |
lisitsyn | hmmm | 11:37 |
lisitsyn | ok never mind | 11:37 |
@HeikoS | now the issues start | 11:37 |
@HeikoS | SVM algorithm | 11:37 |
@HeikoS | its math formulation needs +1, -1 | 11:37 |
@HeikoS | internally | 11:37 |
@HeikoS | and naturally, internal apply returns +1, -1 | 11:37 |
lisitsyn | oh that should never be visible to the user | 11:37 |
@HeikoS | (sign of w*x) | 11:37 |
@HeikoS | yes | 11:37 |
@HeikoS | so we need a conversion | 11:38 |
@HeikoS | to the user facing representation | 11:38 |
@HeikoS | in both ways | 11:38 |
@HeikoS | so the question now is: where to do that | 11:38 |
@HeikoS | and the problem is: we have these meta-learning algorithms (xvalidation, parameter tuning, multiclass machine) | 11:38 |
@HeikoS | if we did it in the ::train call, it would be done multiple times | 11:39 |
@HeikoS | you see the problem? | 11:39 |
lisitsyn | hmm | 11:39 |
lisitsyn | but the conversion is really fast | 11:39 |
lisitsyn | so it might be that we do not call get() but get_svm_compatible() | 11:40 |
@HeikoS | we have something in place | 11:41 |
@HeikoS | in some cases | 11:41 |
@HeikoS | "binary_labels(m_labels)" | 11:41 |
@HeikoS | that could change | 11:41 |
@HeikoS | to do the conversion | 11:41 |
@HeikoS | and then apply converts back | 11:41 |
@HeikoS | ok now more problems | 11:41 |
@HeikoS | xvalidation | 11:42 |
@HeikoS | say a fold doesnt contain one label instance | 11:42 |
@HeikoS | i.e. it is missing class "2" of (0,1,2,3,4) | 11:42 |
@HeikoS | can happen right? | 11:42 |
@HeikoS | so now the mapping most likely changes | 11:42 |
@HeikoS | lisitsyn: or you have an idea how to avoid that? | 11:42 |
lisitsyn | I don't get it yet | 11:43 |
lisitsyn | why a missing class is a problem? | 11:43 |
@HeikoS | so when you compute a mapping | 11:43 |
@HeikoS | say you have | 11:43 |
@HeikoS | labels(0,0,1,1,2,2) | 11:43 |
@HeikoS | or rather say | 11:44 |
@HeikoS | labels(A,A,B,B,C,C) | 11:44 |
@HeikoS | and pass that to a multiclass machine that needs contiguous | 11:44 |
@HeikoS | so then we map | 11:44 |
@HeikoS | A->0 | 11:44 |
@HeikoS | B->1 | 11:44 |
@HeikoS | C->2 | 11:44 |
@HeikoS | and run stuff internally | 11:44 |
@HeikoS | but now some fold in xvalidation misses the B in the labels | 11:44 |
@HeikoS | then the mapping might become | 11:45 |
@HeikoS | A->0 | 11:45 |
@HeikoS | C->1 | 11:45 |
lisitsyn | ahh ok | 11:45 |
lisitsyn | yes then mapping should not happen after the split | 11:45 |
lisitsyn | uhmm lets think how to ensure that | 11:45 |
@HeikoS | could argue that xvalidation stuff is internal, but observers and stuff ... | 11:45 |
lisitsyn | I think in sklearn you do fit_transform | 11:45 |
@HeikoS | lisitsyn: yes exactly | 11:45 |
lisitsyn | so we should do maybe | 11:45 |
@HeikoS | lisitsyn: it should always happen at the "highest" level | 11:45 |
lisitsyn | ah but in sklearn it is a property of model iirc | 11:46 |
lisitsyn | like you pass a b c and the mapping is stored in the classifier | 11:46 |
@HeikoS | that could work | 11:46 |
lisitsyn | but I don't like the approach | 11:47 |
lisitsyn | I think it should be in labels | 11:47 |
@HeikoS | the problem is that we want labels to be const | 11:47 |
lisitsyn | it sounds more reasonable | 11:47 |
@HeikoS | thread safety etc | 11:47 |
@HeikoS | so I mean CLabels could store a mapping, and then xvalidation invokes computation of the mapping, and then ::train just reads that | 11:48 |
@HeikoS | i.e. lazy generation of the mapping | 11:48 |
@HeikoS | but again we dont want to modify labels in training, so maybe the model is a better place | 11:49 |
@HeikoS | but then what if the labels change ...... | 11:49 |
@HeikoS | and basically, that is where I was discussing this with Gil last time | 11:49 |
@HeikoS | suggestions? :D | 11:49 |
@HeikoS | lisitsyn: disappeared? :) | 11:52 |
lisitsyn | sorry | 11:53 |
lisitsyn | back | 11:53 |
lisitsyn | 1 min :) | 11:53 |
lisitsyn | HeikoS: well immutable is solved with copies | 11:55 |
lisitsyn | I guess we can re-use the same original labels | 11:56 |
lisitsyn | and have a light-weight object that has a mapping | 11:56 |
@HeikoS | can you pseudo code a bit? | 11:56 |
@HeikoS | you are essentially saying that inside ::train(..., const CLabels* labels), we first check whether labels is already in the right space, and if not we create a new instance with the mapped values? | 11:58 |
lisitsyn | so say | 12:03 |
lisitsyn | you have Labels original_labels | 12:03 |
lisitsyn | ah yeah | 12:03 |
lisitsyn | once we train we create mapped_labels yes | 12:03 |
lisitsyn | HeikoS: something like that maybe | 12:03 |
@HeikoS | Ok and then xvalidation does this | 12:04 |
@HeikoS | and then splits the converted labels | 12:04 |
@HeikoS | and passes those on | 12:04 |
@HeikoS | and every train call also does this, but it is a nop if the labels are already mapped | 12:05 |
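The fold problem and the proposed fix (map once at the highest level, split the mapped labels, make the mapping step a no-op in `train` when labels are already mapped) can be sketched with immutable label objects. `Labels`, `ensure_mapped` and `subset` are hypothetical; codes follow the sorted order of distinct values, which is an arbitrary but deterministic choice.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical immutable labels: raw user values, plus codes 0..K-1 and the
// mapping that produced them once mapped.
struct Labels
{
    std::vector<std::string> raw;
    std::vector<int> codes;
    std::map<std::string, int> mapping; // empty until mapped
    bool is_mapped() const { return !mapping.empty(); }
};

// Returns the input unchanged if already mapped (cheap to call in every
// train()); otherwise a mapped copy, leaving the original untouched.
inline std::shared_ptr<const Labels> ensure_mapped(const std::shared_ptr<const Labels>& labels)
{
    if (labels->is_mapped())
        return labels;
    auto out = std::make_shared<Labels>(*labels);
    for (const auto& l : out->raw)
        out->mapping.emplace(l, 0); // std::map keeps distinct values sorted
    int next = 0;
    for (auto& kv : out->mapping)
        kv.second = next++;
    for (const auto& l : out->raw)
        out->codes.push_back(out->mapping.at(l));
    return out;
}

// Fold extraction: keeps the selected entries and inherits the parent's mapping.
inline std::shared_ptr<const Labels> subset(const Labels& parent, const std::vector<std::size_t>& idx)
{
    auto out = std::make_shared<Labels>();
    out->mapping = parent.mapping;
    for (std::size_t i : idx)
    {
        out->raw.push_back(parent.raw[i]);
        if (parent.is_mapped())
            out->codes.push_back(parent.codes[i]);
    }
    return out;
}
```

For labels (A,A,B,B,C,C), mapping a fold without B after the split gives C the code 1; mapping first and then splitting keeps C at 2 in every fold.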
-!- gf712 [c13cdcfd@gateway/web/freenode/ip.193.60.220.253] has joined #shogun | 12:06 | |
-!- HeikoS1 [~heiko@221.pool85-48-188.static.orange.es] has joined #shogun | 12:09 | |
HeikoS1 | lisitsyn: sorry got disconnected | 12:09 |
-!- HeikoS [~heiko@176.pool85-48-188.static.orange.es] has quit [Ping timeout: 258 seconds] | 12:11 | |
lisitsyn | HeikoS1: yeah it seems that it could be the easiest way | 12:24 |
lisitsyn | it is important to re-use the memory though | 12:24 |
lisitsyn | but it seems to be easy | 12:24 |
HeikoS1 | the copy would only store the mapping | 12:24 |
HeikoS1 | and the original vector | 12:25 |
HeikoS1 | although | 12:25 |
HeikoS1 | not sure | 12:25 |
HeikoS1 | as many algos are based on vectorized label access | 12:25 |
HeikoS1 | so there is at least twice the memory | 12:25 |
HeikoS1 | when converted | 12:25 |
lisitsyn | HeikoS1: it should not be a critical issue I think | 12:29 |
lisitsyn | labels are just like one feature anyway | 12:29 |
lisitsyn | HeikoS1: ok then I guess we can do it lazy with explicit compute | 12:29 |
lisitsyn | so if a method uses no vectorized access we use get() that maps labels | 12:30 |
lisitsyn | once it gets labels as vectors they are mapped into the new vector | 12:30 |
lisitsyn | I don't remember: what methods do use the vectorized access? | 12:31 |
HeikoS1 | quite a few | 12:31 |
HeikoS1 | kernel stuff mostly | 12:32 |
HeikoS1 | (not SVM) | 12:32 |
HeikoS1 | KRR | 12:32 |
-!- geektoni_ [c1cdd253@gateway/web/freenode/ip.193.205.210.83] has quit [Quit: Page closed] | 12:33 | |
lisitsyn | HeikoS1: ah ok then yes | 12:43 |
lisitsyn | get_vector() maps | 12:43 |
lisitsyn | and get(i) does not | 12:43 |
lisitsyn | sounds valid :) | 12:43 |
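Under one reading of the `get()`/`get_vector()` split (single-element access translates one value through the mapping on the fly, vectorized access materializes a new mapped vector), a light-weight view over the original labels could look like this. `MappedLabelsView` is a hypothetical name. The original vector is shared, not copied, so the extra memory HeikoS1 mentions only appears while a `get_vector()` result is alive.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical view: shares the immutable original labels, stores only the mapping.
class MappedLabelsView
{
public:
    MappedLabelsView(std::shared_ptr<const std::vector<int>> original,
                     std::map<int, int> mapping)
        : m_original(std::move(original)), m_mapping(std::move(mapping))
    {
    }

    // Per-element access: translate one value, no copy of the labels.
    int get(std::size_t i) const
    {
        assert(i < m_original->size());
        return m_mapping.at((*m_original)[i]);
    }

    // Vectorized access (e.g. kernel methods such as KRR): build a mapped copy.
    std::vector<int> get_vector() const
    {
        std::vector<int> out;
        out.reserve(m_original->size());
        for (int v : *m_original)
            out.push_back(m_mapping.at(v));
        return out;
    }

private:
    std::shared_ptr<const std::vector<int>> m_original;
    std::map<int, int> m_mapping;
};
```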
-!- HeikoS [~heiko@239.pool85-48-188.static.orange.es] has joined #shogun | 12:53 | |
-!- mode/#shogun [+o HeikoS] by ChanServ | 12:53 | |
@HeikoS | lisitsyn: here is another issue | 12:53 |
@HeikoS | so say I have trained my svm | 12:53 |
@HeikoS | I received user-labels (A,A,B,C) | 12:54 |
lisitsyn | aha | 12:54 |
-!- HeikoS1 [~heiko@221.pool85-48-188.static.orange.es] has quit [Ping timeout: 252 seconds] | 12:54 | |
-!- wiking [~wiking@huwico/staff/wiking] has quit [Remote host closed the connection] | 12:54 | |
@HeikoS | I convert internally to CBinaryLabels | 12:54 |
@HeikoS | I train | 12:54 |
@HeikoS | and then the user wants to apply | 12:54 |
@HeikoS | my internal thing returns +1, +1, -1 | 12:54 |
@HeikoS | how do I map back? | 12:54 |
@HeikoS | I didn't store the labels | 12:54 |
lisitsyn | we can basically store a bimap | 12:55 |
lisitsyn | so once we map we reverse it | 12:55 |
lisitsyn | that might solve the problem | 12:55 |
@HeikoS | yeah sure | 12:56 |
@HeikoS | but SVM doesnt store labels | 12:56 |
@HeikoS | or say another model | 12:56 |
@HeikoS | storing labels you mean | 12:56 |
lisitsyn | ahhh that's actually a point why mapping might correspond to a machine | 12:57 |
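A sketch of why the mapping may belong to the machine: a binary model that keeps the two user-facing classes seen in training can turn internal +1/-1 predictions back into user labels without storing the training labels themselves. This is a two-element special case of the bimap idea. `BinaryMachine` is hypothetical; assigning classes to -1/+1 in sorted order, and mapping a score of exactly 0 to the positive class, are arbitrary choices.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

class BinaryMachine
{
public:
    void train(const std::vector<std::string>& labels)
    {
        std::vector<std::string> classes;
        for (const auto& l : labels)
            if (std::find(classes.begin(), classes.end(), l) == classes.end())
                classes.push_back(l);
        if (classes.size() != 2)
            throw std::invalid_argument("binary machine needs exactly two classes");
        std::sort(classes.begin(), classes.end());
        m_negative = classes[0]; // internal -1
        m_positive = classes[1]; // internal +1
        // ... the actual optimization on to_internal(labels) would go here ...
    }

    // User label -> internal representation used by the math.
    double to_internal(const std::string& l) const { return l == m_positive ? +1.0 : -1.0; }

    // Internal score (e.g. w*x) -> user label, using only the stored pair.
    std::string to_user(double score) const { return score >= 0 ? m_positive : m_negative; }

private:
    std::string m_negative, m_positive;
};
```

Because the inverse mapping lives in the model, a different mapping per xvalidation fold is harmless as long as `apply` converts back, which is the point HeikoS makes later.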
@HeikoS | I guess so | 12:57 |
gf712 | HeikoS: for some reason mkl tests are failing? | 12:59 |
gf712 | as intel mkl build | 12:59 |
-!- wiking [~wiking@huwico/staff/wiking] has joined #shogun | 13:00 | |
-!- mode/#shogun [+o wiking] by ChanServ | 13:00 | |
@HeikoS | gf712: since when? | 13:00 |
gf712 | since the kernel merge I think | 13:04 |
gf712 | where the swig stuff was taken out | 13:04 |
gf712 | not sure how.. | 13:04 |
gf712 | HeikoS: actually before that | 13:05 |
gf712 | must be some update with mkl? | 13:05 |
gf712 | I don't think it was caused by anything merged to shogun develop | 13:05 |
@HeikoS | I guess some update then | 13:10 |
@HeikoS | CI was green on all merged PRs iirc | 13:10 |
gf712 | yea it failed for the first time in the arff pr | 13:11 |
gf712 | so must be caused by an external lib | 13:12 |
gf712 | but mkl hasn't been updated for 2 months https://anaconda.org/intel/mkl-devel | 13:12 |
gf712 | and eigen is from a specific release right? | 13:12 |
@HeikoS | okok | 13:15 |
@HeikoS | lisitsyn: http://collabedit.com/4qfwu | 13:35 |
@HeikoS | gf712: yes eigen is by git hash | 13:35 |
lisitsyn | HeikoS: reading | 13:36 |
lisitsyn | HeikoS: well looks valid so far | 13:40 |
@HeikoS | lisitsyn: cool | 13:44 |
@HeikoS | I'll let it sink in a bit | 13:45 |
@HeikoS | and then discuss again | 13:45 |
@HeikoS | gf712: we also discussed labels, see above for an idea how to move between internal/user-facing space | 13:45 |
gf712 | HeikoS: having a look | 13:51 |
-!- essam [c5351150@gateway/web/freenode/ip.197.53.17.80] has quit [Quit: Page closed] | 13:52 | |
@HeikoS | gf712: cool, I'll have lunch now, might be back later | 13:53 |
-!- HeikoS [~heiko@239.pool85-48-188.static.orange.es] has quit [Quit: Leaving.] | 13:55 | |
-!- wiking [~wiking@huwico/staff/wiking] has quit [Remote host closed the connection] | 14:00 | |
-!- wiking [~wiking@huwico/staff/wiking] has joined #shogun | 14:07 | |
-!- mode/#shogun [+o wiking] by ChanServ | 14:07 | |
-!- wiking [~wiking@huwico/staff/wiking] has quit [Read error: Connection reset by peer] | 14:09 | |
-!- wiking_ [~wiking@huwico/staff/wiking] has joined #shogun | 14:09 | |
-!- mode/#shogun [+o wiking_] by ChanServ | 14:09 | |
-!- wiking_ [~wiking@huwico/staff/wiking] has quit [Ping timeout: 252 seconds] | 14:13 | |
-!- wiking [~wiking@huwico/staff/wiking] has joined #shogun | 14:21 | |
-!- mode/#shogun [+o wiking] by ChanServ | 14:21 | |
gf712 | wiking: ping | 14:48 |
@wiking | hola | 14:48 |
gf712 | hey, do you use instruments much? | 14:48 |
gf712 | I.e. time profiler | 14:48 |
@wiking | yep | 14:49 |
@wiking | on the other hand define 'much' | 14:49 |
@wiking | use it when i can | 14:49 |
@wiking | :) | 14:49 |
gf712 | ok! | 14:50 |
gf712 | do you ever look at the highlighted code that is used heavily? | 14:50 |
gf712 | because it doesn't make a lot of sense to me | 14:50 |
@wiking | ah | 14:51 |
@wiking | that is tricky | 14:51 |
gf712 | I am seeing calls to function specialisations that weren't even used... | 14:51 |
@wiking | i always just use the inverse tree shit | 14:51 |
@wiking | and don't look at the code highlighting | 14:51 |
gf712 | ah ok ok | 14:51 |
gf712 | I didn't see there was this thing | 14:51 |
gf712 | much clearer now | 14:51 |
-!- HeikoS [~heiko@73.red-83-46-178.dynamicip.rima-tde.net] has joined #shogun | 15:00 | |
-!- mode/#shogun [+o HeikoS] by ChanServ | 15:00 | |
@HeikoS | lisitsyn, gf712 actually I realised one thing ... it is actually OK if the mappings are different during xvalidation, as the only thing that matters is the result of the "apply" function | 15:01 |
@HeikoS | so this thing of precomputing the mapping in that case is only to save some cpu cycles | 15:01 |
-!- wiking [~wiking@huwico/staff/wiking] has quit [Remote host closed the connection] | 15:03 | |
gf712 | HeikoS: i am not sure I follow | 15:06 |
gf712 | isn't the issue still that a label in test might not have been seen in train? | 15:06 |
gf712 | in one of the folds | 15:06 |
-!- wiking [~wiking@huwico/staff/wiking] has joined #shogun | 15:10 | |
-!- mode/#shogun [+o wiking] by ChanServ | 15:10 | |
-!- geektoni [973e0080@gateway/web/freenode/ip.151.62.0.128] has joined #shogun | 15:19 | |
@HeikoS | gf712: yes, I guess it doesnt matter if the mapping changes in between folds | 15:25 |
@HeikoS | as the predictions will be mapped back into the original label space in ::apply | 15:26 |
gf712 | ah ok ok, I think I see what you mean | 15:29 |
gf712 | HeikoS: btw what are the types of strings that are supported in shogun then? | 15:30 |
@HeikoS | over all ptypes | 15:30 |
gf712 | I added the templating now but I just have char | 15:30 |
gf712 | ah ok | 15:30 |
@HeikoS | but in practice | 15:30 |
@HeikoS | char | 15:30 |
gf712 | and then the EAlphabet? | 15:30 |
@HeikoS | uint16_t | 15:30 |
@HeikoS | yeah | 15:30 |
@HeikoS | can add them one by one | 15:30 |
gf712 | isn't the largest type char? | 15:30 |
@HeikoS | just need to make sure that is easy | 15:30 |
@HeikoS | I added some uint16_t string stuff recently | 15:31 |
@HeikoS | check out string.sg | 15:31 |
@HeikoS | meta example | 15:31 |
gf712 | ok! thanks :) | 15:31 |
geektoni | ping HeikoS | 15:37 |
@HeikoS | geektoni: pong | 15:37 |
geektoni | HeikoS: when you say that we do not have testing for KMeans, you mean that also the meta example is not useful to check consistency between the two KMeans versions? :/ | 15:38 |
@HeikoS | ah no | 15:38 |
@HeikoS | I meant testing the observable stuff | 15:38 |
@HeikoS | like ... does it work what you add there :) | 15:38 |
geektoni | ahh I see I see | 15:38 |
@HeikoS | no need to have a test | 15:39 |
@HeikoS | just check it and put some evidence | 15:39 |
geektoni | sure sure, that's easy then ;) | 15:39 |
@HeikoS | should be | 15:40 |
geektoni | HeikoS: ah btw, what's going on with MKL stuff? :/ | 15:42 |
@HeikoS | what do you mean? | 15:43 |
geektoni | HeikoS: like on the CI, MacOS MKL happens to fail on many tests which are fine on the other environments | 15:47 |
geektoni | I saw that you and gf712 were discussing about it earlier :) | 15:48 |
@HeikoS | ah ok | 15:50 |
@HeikoS | yeah idk tbh | 15:50 |
@HeikoS | stopped working at some point | 15:50 |
geektoni | HeikoS: kk, I'll merge LDA then since it is the cause of those errors | 15:52 |
@HeikoS | why "since it is the cause of those errors" ? | 15:53 |
geektoni | ahh no | 15:54 |
geektoni | since it is *not* the cause | 15:54 |
geektoni | can't write today | 15:54 |
@HeikoS | hehe | 15:55 |
@HeikoS | but we want the refactors you are doing into develop or? | 15:55 |
@HeikoS | since that will compile | 15:55 |
geektoni | as you said before, if I'm using just put() they can go into develop. If I need to use also observe(), then they need to go in the feature branch | 15:58 |
@HeikoS | ah yes the observe | 15:59 |
@HeikoS | there was an observe in LDA | 15:59 |
@HeikoS | but I didnt get why that was needed | 15:59 |
geektoni | HeikoS: mmh there is no observe in LDA, you mean KMeans? | 16:01 |
@HeikoS | ah yes | 16:01 |
@HeikoS | sorry | 16:01 |
geektoni | HeikoS: I use observe() sometimes since there may be methods which act directly on the registered variable (like mus for KMeans) | 16:03 |
geektoni | therefore | 16:03 |
geektoni | there is no need to "put" them again | 16:03 |
@HeikoS | and so you avoid the put call | 16:03 |
geektoni | yep | 16:03 |
@HeikoS | which basically would be wasted cpi | 16:03 |
@HeikoS | cpu | 16:03 |
@HeikoS | to copy | 16:03 |
geektoni | well the copy is still done by the observe() method | 16:04 |
geektoni | but I do not want to have all the (possible) put overhead | 16:04 |
@HeikoS | both do an any_cast or? | 16:04 |
geektoni | observe() does not | 16:04 |
@HeikoS | ah ok then | 16:04 |
@HeikoS | does observe also copy data if there is no observer registered? | 16:04 |
geektoni | mmh I guess it still does | 16:05 |
@HeikoS | can that be avoided :D | 16:05 |
geektoni | since there is no explicit check | 16:05 |
geektoni | ye ye | 16:05 |
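The guard discussed here (skip the copy in `observe()` when no observer is registered) can be sketched as below. `Observable`, its `std::function` subscribers and the `emitted()` counter are hypothetical, not Shogun's observable framework; the sketch only shows that the copy can be made conditional on someone listening.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

class Observable
{
public:
    using Callback = std::function<void(const std::string&, const std::vector<double>&)>;

    void subscribe(Callback cb) { m_observers.push_back(std::move(cb)); }

    // Called by an algorithm that already updated its member in place
    // (e.g. the KMeans centers), bypassing put() and its any_cast/checks.
    void observe(const std::string& name, const std::vector<double>& value)
    {
        if (m_observers.empty())
            return; // nobody listening: no copy at all
        const std::vector<double> snapshot = value; // one copy, shared by all observers
        for (const auto& cb : m_observers)
            cb(name, snapshot);
        ++m_emitted;
    }

    int emitted() const { return m_emitted; }

private:
    std::vector<Callback> m_observers;
    int m_emitted = 0;
};
```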
@HeikoS | mmmmh | 16:05 |
@HeikoS | so just thinking | 16:05 |
@HeikoS | I mean basically all algorithms directly modify member variables right? | 16:05 |
@HeikoS | and the put can always be avoided | 16:05 |
@HeikoS | inside an algorithm | 16:05 |
@HeikoS | ? | 16:05 |
geektoni | yep, most of them so far | 16:07 |
@HeikoS | it is a bit weird | 16:07 |
@HeikoS | lisitsyn: you still here? | 16:08 |
-!- wiking [~wiking@huwico/staff/wiking] has quit [Remote host closed the connection] | 16:08 | |
@HeikoS | geektoni: you see, we wanted to avoid adding all these observe calls for model parameters | 16:08 |
@HeikoS | via putting "observe" into put | 16:09 |
@HeikoS | but now we don't use it | 16:09 |
@HeikoS | then there wouldn't have been the need to add it (which simplifies things quite a bit, as this makes parameter framework and observable framework interdependent) | 16:09 |
@HeikoS | but also.... in general we want to observe all model parameters *by default* in every iteration | 16:10 |
@HeikoS | without us changing the algorithms | 16:10 |
geektoni | The problem is that we would need to refactor those algorithms to use put() instead of accessing the member variables directly | 16:10 |
geektoni | we would need to touch them anyway :) | 16:11 |
@HeikoS | yes but just once | 16:11 |
@HeikoS | adding all those observe calls makes it much more likely that we need to touch everything again | 16:11 |
@HeikoS | I wonder now whether we shouldnt just remove the observe call inside ::put | 16:12 |
@HeikoS | and then instead put observe calls inside CIterativeMachine | 16:12 |
@HeikoS | either explicit | 16:12 |
@HeikoS | or implicit with a filter on parameter properties | 16:12 |
@HeikoS | but we basically don't need the observe call inside put if we are adding those things anyway | 16:13 |
@HeikoS | you get my point? | 16:13 |
geektoni | mmh kind of | 16:13 |
geektoni | the problem with the iterative machine approach is that we do not know exactly which variables it will have | 16:14 |
geektoni | since it is a mixin | 16:14 |
@HeikoS | we know | 16:14 |
@HeikoS | ParameterProperties::MODEL | 16:15 |
geektoni | ahh I see what you mean | 16:15 |
geektoni | like | 16:15 |
geektoni | every iteration, we emit every registered variable, without actually caring if it was modified or not | 16:16 |
@HeikoS | yes | 16:17 |
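A minimal sketch of the idea just agreed on: the iterative-machine base, not each algorithm, emits every parameter tagged as a model parameter once per iteration, whether or not it changed. All names here (`ParamProperty`, `IterativeMachineSketch`, `register_param`, `CounterMachine`) are hypothetical and only loosely modelled on `CIterativeMachine` and `ParameterProperties::MODEL`; this is not Shogun's actual API.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical property flags, loosely modelled on a MODEL/HYPER distinction.
enum class ParamProperty { HYPER, MODEL };

struct Param
{
	ParamProperty property;
	const double* value; // points at the algorithm's own member variable
};

class IterativeMachineSketch
{
public:
	using Observer = std::function<void(int, const std::string&, double)>;

	virtual ~IterativeMachineSketch() = default;

	void subscribe(Observer obs) { m_observers.push_back(std::move(obs)); }

	// Run `iterations` steps; after each step, emit every MODEL parameter,
	// regardless of whether the step actually modified it.
	void train(int iterations)
	{
		for (int it = 0; it < iterations; ++it)
		{
			iteration();
			for (const auto& [name, p] : m_params)
				if (p.property == ParamProperty::MODEL)
					for (const auto& obs : m_observers)
						obs(it, name, *p.value);
		}
	}

protected:
	void register_param(const std::string& name, const double* value, ParamProperty prop)
	{
		m_params[name] = Param{prop, value};
	}
	virtual void iteration() = 0;

private:
	std::map<std::string, Param> m_params;
	std::vector<Observer> m_observers;
};

// Toy algorithm: it modifies its members directly, with no observe() or put() calls.
class CounterMachine : public IterativeMachineSketch
{
public:
	CounterMachine()
	{
		register_param("weight", &m_weight, ParamProperty::MODEL);
		register_param("learning_rate", &m_learning_rate, ParamProperty::HYPER);
	}

protected:
	void iteration() override { m_weight += m_learning_rate; }

private:
	double m_weight = 0.0;
	double m_learning_rate = 0.5;
};
```

The algorithm stays untouched, which is the appeal; the cost, raised right below, is that unchanged MODEL parameters are emitted on every iteration too.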
@HeikoS | this way we can avoid this explicit observe stuff for the registered model parameters at least | 16:18 |
@HeikoS | and we dont need observe inside put | 16:18 |
geektoni | okay soo | 16:18 |
@HeikoS | mmh | 16:18 |
@HeikoS | but wait | 16:18 |
@HeikoS | on the other hand | 16:18 |
@HeikoS | emitting things that did not change is not good either | 16:18 |
@HeikoS | so that is where the old design was nice in the sense that only if put is called, something is emitted | 16:19 |
@HeikoS | but now some algos dont call put but modify their members directly | 16:19 |
@HeikoS | so now you do the "observe" call | 16:19 |
@HeikoS | to avoid put overhead | 16:20 |
@HeikoS | okok | 16:20 |
geektoni | I guess we need to find the best tradeoff | 16:20 |
@HeikoS | yeah seems like ups and downs | 16:21 |
@HeikoS | maybe leave it as it is for now | 16:21 |
@HeikoS | and we see whether there are more problems | 16:21 |
-!- wiking [~wiking@huwico/staff/wiking] has joined #shogun | 16:21 | |
-!- mode/#shogun [+o wiking] by ChanServ | 16:21 | |
geektoni | I mean, we would need to use at least two approaches to observe things | 16:21 |
geektoni | HeikoS: ah btw, https://gist.github.com/geektoni/487fa1c3eac5fbd31c70c9dc54d67fb1 | 16:22 |
@HeikoS | i think maybe use "put" instead of observe for the changed parameters at least | 16:22 |
@HeikoS | because that makes it clear | 16:22 |
@HeikoS | put means that a parameter was changed | 16:22 |
geektoni | kmeans works. at the end of the gist there is the output | 16:22 |
@HeikoS | otherwise it gets convoluted | 16:22 |
@HeikoS | cool that it works :) | 16:22 |
@HeikoS | because we might use the fact that put means parameter has been modified later on | 16:23 |
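A sketch of how "put means the parameter has been modified" could be exploited later: if put() is the only write path, it can double as a change marker, so only parameters that were actually put() get emitted per iteration. `ParamStore`, `take_modified` and the parameter names are hypothetical illustrations, not Shogun code.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parameter store in which put() is the only way to modify a value,
// so every put() can be recorded as "this parameter changed".
class ParamStore
{
public:
	void put(const std::string& name, double value)
	{
		m_values[name] = value;
		m_modified.insert(name); // put() doubles as a "modified" marker
	}

	double get(const std::string& name) const { return m_values.at(name); }

	// Called once per iteration: return only the parameters that were put()
	// since the last call, then reset the markers.
	std::vector<std::pair<std::string, double>> take_modified()
	{
		std::vector<std::pair<std::string, double>> out;
		for (const auto& name : m_modified)
			out.emplace_back(name, m_values.at(name));
		m_modified.clear();
		return out;
	}

private:
	std::map<std::string, double> m_values;
	std::set<std::string> m_modified;
};
```

Direct member writes would bypass the marker entirely, which is why mixing put() with observe() on in-place changes makes this guarantee harder to rely on.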
geektoni | I see | 16:23 |
@HeikoS | geektoni: one thing: it would be good if there was no copy being performed if no observers are subscribed | 16:23 |
@HeikoS | you agree? | 16:23 |
geektoni | I agree | 16:24 |
@HeikoS | ok cool | 16:24 |
@HeikoS | so let's do that | 16:24 |
@HeikoS | and also make observe->put | 16:24 |
@HeikoS | for model parameters | 16:24 |
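A minimal sketch of the two points just agreed: model parameters go through put(), and put() skips the copy made for observers when nobody is subscribed. `ObservableSketch`, its methods and the copy counter are hypothetical; the snapshot copy stands in for whatever value object an observer might keep after the parameter changes again.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical observable object: put() always updates the stored parameter,
// but builds the (copied) notification payload only if someone is subscribed.
class ObservableSketch
{
public:
	using Observer = std::function<void(const std::string&, const std::vector<double>&)>;

	void subscribe(Observer obs) { m_observers.push_back(std::move(obs)); }

	void put(const std::string& name, std::vector<double> value)
	{
		m_params[name] = std::move(value);
		if (m_observers.empty())
			return; // no subscribers: skip the copy entirely
		// Copy, since observers may hold on to the value after it changes again.
		const std::vector<double> snapshot = m_params[name];
		++m_copies;
		for (const auto& obs : m_observers)
			obs(name, snapshot);
	}

	const std::vector<double>& get(const std::string& name) const { return m_params.at(name); }
	int copies() const { return m_copies; }

private:
	std::map<std::string, std::vector<double>> m_params;
	std::vector<Observer> m_observers;
	int m_copies = 0; // counts snapshot copies, to make the saving visible
};
```

The early return is the whole optimisation: with no subscribers, put() costs one store and no extra copy.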
geektoni | kk, even if the changes are in-place? | 16:25 |
@HeikoS | could you benchmark it? | 16:25 |
@HeikoS | for say kmeans on a reasonably sized problem? | 16:25 |
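A rough harness for the requested comparison, assuming the cost to measure is in-place member updates versus copying a value out of and back into a type-erased store via put() (with an any_cast on the way out). `TypeErasedParams`, `update_direct`, `update_via_put` and `time_ms` are hypothetical; this is not the KMeans implementation, and real timings depend on the machine and problem size.

```cpp
#include <any>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical object whose parameters live in a type-erased map,
// mimicking the any_cast overhead of a put()-based update path.
struct TypeErasedParams
{
	std::map<std::string, std::any> params;

	void put(const std::string& name, const std::vector<double>& v) { params[name] = v; }
	const std::vector<double>& get(const std::string& name) const
	{
		return std::any_cast<const std::vector<double>&>(params.at(name));
	}
};

// Wall-clock time of one call to f, in milliseconds.
template <typename F>
double time_ms(F&& f)
{
	const auto start = std::chrono::steady_clock::now();
	f();
	const auto end = std::chrono::steady_clock::now();
	return std::chrono::duration<double, std::milli>(end - start).count();
}

// In-place update, like algorithms that modify their members directly.
std::vector<double> update_direct(std::size_t n, int iters)
{
	std::vector<double> centers(n, 0.0);
	for (int it = 0; it < iters; ++it)
		for (auto& c : centers)
			c += 1.0;
	return centers;
}

// put()-based update: copy the value out, modify it, then store it back via put().
std::vector<double> update_via_put(std::size_t n, int iters)
{
	TypeErasedParams obj;
	obj.put("centers", std::vector<double>(n, 0.0));
	for (int it = 0; it < iters; ++it)
	{
		std::vector<double> next = obj.get("centers"); // copy out
		for (auto& c : next)
			c += 1.0;
		obj.put("centers", next); // copy back in
	}
	return obj.get("centers");
}
```

Usage: compare `time_ms([] { update_direct(n, iters); })` with `time_ms([] { update_via_put(n, iters); })` for a realistic number of centers and iterations; both paths produce the same values, so only the overhead differs.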
geektoni | sure, let me do some testing | 16:26 |
-!- wiking [~wiking@huwico/staff/wiking] has quit [Read error: Connection reset by peer] | 17:00 | |
-!- wiking_ [~wiking@huwico/staff/wiking] has joined #shogun | 17:00 | |
-!- mode/#shogun [+o wiking_] by ChanServ | 17:00 | |
-!- wiking_ [~wiking@huwico/staff/wiking] has quit [Remote host closed the connection] | 17:05 | |
-!- wiking [~wiking@huwico/staff/wiking] has joined #shogun | 17:07 | |
-!- mode/#shogun [+o wiking] by ChanServ | 17:07 | |
-!- geektoni [973e0080@gateway/web/freenode/ip.151.62.0.128] has quit [Quit: Page closed] | 17:09 | |
-!- wiking [~wiking@huwico/staff/wiking] has quit [Ping timeout: 250 seconds] | 17:11 | |
-!- wiking [~wiking@huwico/staff/wiking] has joined #shogun | 17:29 | |
-!- mode/#shogun [+o wiking] by ChanServ | 17:29 | |
-!- wiking [~wiking@huwico/staff/wiking] has quit [Ping timeout: 264 seconds] | 17:34 | |
-!- HeikoS [~heiko@73.red-83-46-178.dynamicip.rima-tde.net] has quit [Ping timeout: 246 seconds] | 17:43 | |
-!- geektoni [973e524e@gateway/web/freenode/ip.151.62.82.78] has joined #shogun | 17:49 | |
-!- HeikoS [~heiko@73.red-83-46-178.dynamicip.rima-tde.net] has joined #shogun | 17:51 | |
-!- mode/#shogun [+o HeikoS] by ChanServ | 17:51 | |
-!- HeikoS [~heiko@73.red-83-46-178.dynamicip.rima-tde.net] has quit [Ping timeout: 255 seconds] | 17:59 | |
-!- wiking [~wiking@huwico/staff/wiking] has joined #shogun | 18:32 | |
-!- mode/#shogun [+o wiking] by ChanServ | 18:32 | |
-!- wiking [~wiking@huwico/staff/wiking] has quit [Ping timeout: 264 seconds] | 18:36 | |
-!- gf712 [c13cdcfd@gateway/web/freenode/ip.193.60.220.253] has quit [Ping timeout: 256 seconds] | 18:44 | |
-!- wiking [~wiking@huwico/staff/wiking] has joined #shogun | 18:46 | |
-!- mode/#shogun [+o wiking] by ChanServ | 18:46 | |
-!- wiking [~wiking@huwico/staff/wiking] has quit [Ping timeout: 252 seconds] | 18:51 | |
-!- lambday [a7dcee98@gateway/web/freenode/ip.167.220.238.152] has quit [Ping timeout: 256 seconds] | 19:22 | |
-!- HeikoS [~heiko@73.red-83-46-178.dynamicip.rima-tde.net] has joined #shogun | 19:26 | |
-!- mode/#shogun [+o HeikoS] by ChanServ | 19:26 | |
-!- wiking [~wiking@huwico/staff/wiking] has joined #shogun | 19:31 | |
-!- mode/#shogun [+o wiking] by ChanServ | 19:31 | |
@HeikoS | geektoni: hi | 19:34 |
@HeikoS | you still here? | 19:34 |
geektoni | HeikoS: yes yes still here | 19:34 |
@HeikoS | I sent you an email | 19:34 |
geektoni | let me check | 19:35 |
-!- wiking [~wiking@huwico/staff/wiking] has quit [Ping timeout: 246 seconds] | 19:35 | |
geektoni | HeikoS: okay, everything makes sense to me | 19:37 |
geektoni | this way we could also remove ObservedValue from SGObject | 19:37 |
@HeikoS | yes | 19:37 |
@HeikoS | I think this is worth the effort | 19:38 |
geektoni | I mean yeah, surely it will bring fewer problems in the future | 19:38 |
geektoni | soo I guess the benchmark for KMeans put is not exactly needed anymore, since we will just use put() | 19:39 |
@HeikoS | yes | 19:39 |
@HeikoS | that's why I came back | 19:39 |
@HeikoS | hoping that you hadnt written that yet :) | 19:40 |
geektoni | ahaha too late man | 19:40 |
@HeikoS | nooooooooo | 19:40 |
geektoni | I mean | 19:40 |
geektoni | I found an undocumented cpp example which was basically doing the job | 19:41 |
geektoni | so nw | 19:41 |
@HeikoS | ah ok | 19:41 |
@HeikoS | good :) | 19:41 |
-!- wiking [~wiking@huwico/staff/wiking] has joined #shogun | 19:45 | |
-!- mode/#shogun [+o wiking] by ChanServ | 19:45 | |
@HeikoS | lisitsyn: still here? | 19:50 |
-!- wiking [~wiking@huwico/staff/wiking] has quit [Remote host closed the connection] | 20:16 | |
-!- geektoni [973e524e@gateway/web/freenode/ip.151.62.82.78] has quit [Quit: Page closed] | 20:20 | |
-!- HeikoS [~heiko@73.red-83-46-178.dynamicip.rima-tde.net] has quit [Ping timeout: 258 seconds] | 20:34 | |
-!- wiking [~wiking@huwico/staff/wiking] has joined #shogun | 20:48 | |
-!- mode/#shogun [+o wiking] by ChanServ | 20:48 | |
-!- wiking [~wiking@huwico/staff/wiking] has quit [Remote host closed the connection] | 21:06 | |
-!- wiking [~wiking@huwico/staff/wiking] has joined #shogun | 21:13 | |
-!- mode/#shogun [+o wiking] by ChanServ | 21:13 | |
-!- HeikoS [~heiko@73.red-83-46-178.dynamicip.rima-tde.net] has joined #shogun | 23:12 | |
-!- mode/#shogun [+o HeikoS] by ChanServ | 23:12 | |
-!- anvan [~androirc@103.252.200.48] has quit [Read error: Connection reset by peer] | 23:41 | |
-!- HeikoS [~heiko@73.red-83-46-178.dynamicip.rima-tde.net] has quit [Ping timeout: 245 seconds] | 23:42 | |
--- Log closed Fri May 17 00:00:18 2019 |
Generated by irclog2html.py 2.10.0 by Marius Gedminas - find it at mg.pov.lt!