--- Log opened Sun May 26 00:00:18 2013 | ||
-!- hushell [~hushell@8-92.ptpg.oregonstate.edu] has joined #shogun | 00:05 | |
-!- HeikoS [~heiko@176.248.212.176] has joined #shogun | 00:27 | |
-!- mode/#shogun [+o HeikoS] by ChanServ | 00:27 | |
@HeikoS | sonney2k: can we do multiple inheritance in shogun? | 00:32 |
@HeikoS | I think I remember no, but I forgot | 00:32 |
@HeikoS | whats the best way to do a thing similar to java's interface (which is inheriting a couple of pure virtual methods) | 00:32 |
@HeikoS | lisitsyn: ^ | 00:33 |
@lisitsyn | HeikoS: no MI :) | 00:34 |
@HeikoS | lisitsyn: how else? | 00:34 |
@HeikoS | lisitsyn: also, I am thinking of adding a general ComputationTask class | 00:34 |
@HeikoS | which can be registered in another class | 00:35 |
@HeikoS | which then handles computation of all those | 00:35 |
@HeikoS | and different implementations may do it differently (multicore, mpi, etc) | 00:35 |
@lisitsyn | HeikoS: hmm | 00:35 |
@lisitsyn | when did you come to that idea? | 00:35 |
@lisitsyn | it reproduces something that is in my mind too | 00:36 |
@HeikoS | lisitsyn: I need this for log-det project | 00:36 |
@HeikoS | but would be better to have this in general | 00:37 |
@HeikoS | every task implementation should have code how to solve it | 00:37 |
@HeikoS | and the std implementation of CComputationPool | 00:37 |
@HeikoS | just does everything sequentially | 00:37 |
@HeikoS | then we program against that interface | 00:37 |
@HeikoS | and people might come up with more fancy things | 00:37 |
@HeikoS | without changing algorithm code | 00:38 |
@lisitsyn | HeikoS: I see | 00:38 |
@HeikoS | lisitsyn: so lets think about this a bit | 00:38 |
@lisitsyn | but why do you need interfaces there? | 00:38 |
@HeikoS | lisitsyn: nevermind about this | 00:38 |
@HeikoS | lets talk about the computation | 00:39 |
@HeikoS | :) | 00:39 |
@lisitsyn | HeikoS: hah ok | 00:39 |
@lisitsyn | HeikoS: what is computation pool? | 00:39 |
@HeikoS | ok | 00:39 |
@HeikoS | so CComputationPool | 00:40 |
@HeikoS | is an abstract base where one can register tasks | 00:40 |
@HeikoS | and one can call solve_all(), which gives a list of CComputationTaskResult instances | 00:40 |
@HeikoS | register(CComputationTask task) | 00:40 |
@lisitsyn | what are real instances of computation pool? | 00:40 |
@HeikoS | one example: | 00:40 |
@HeikoS | CSequentialComputationPool | 00:41 |
@HeikoS | solve_all just loops over all Tasks and solves them | 00:41 |
@HeikoS | each task knows how it gets solved | 00:41 |
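The register/solve_all design being described might look roughly like this (a minimal Python sketch for illustration only; Shogun itself is C++, and every name here is an assumption taken from the conversation, not real Shogun API):

```python
from abc import ABC, abstractmethod

class ComputationTask(ABC):
    """A self-contained unit of work; each task knows how to solve itself."""
    @abstractmethod
    def compute(self):
        """Solve the task and return a result object."""

class ComputationPool(ABC):
    """Abstract base where one registers tasks and later solves them all."""
    def __init__(self):
        self.tasks = []

    def register(self, task):
        self.tasks.append(task)

    @abstractmethod
    def solve_all(self):
        """Return a list of results, one per registered task."""

class SequentialComputationPool(ComputationPool):
    """Baseline implementation: solve_all just loops over all tasks."""
    def solve_all(self):
        return [task.compute() for task in self.tasks]

class SquareTask(ComputationTask):
    """Toy task used for demonstration only."""
    def __init__(self, x):
        self.x = x

    def compute(self):
        return self.x * self.x

pool = SequentialComputationPool()
for x in (1, 2, 3):
    pool.register(SquareTask(x))
results = pool.solve_all()  # [1, 4, 9]
```

The point of the abstraction is that algorithm code only ever talks to the abstract pool; an MPI or multicore pool could be swapped in without touching the algorithm.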
@HeikoS | another example: | 00:41 |
@lisitsyn | okay sequential parallel what else? | 00:41 |
@HeikoS | MPI | 00:41 |
@HeikoS | group like structures | 00:41 |
@lisitsyn | ohh hah | 00:41 |
@HeikoS | problem dependent | 00:41 |
@HeikoS | there are only few generic ones | 00:41 |
@HeikoS | most of them will be problem specific | 00:41 |
@HeikoS | still only one interface from main algorithm | 00:42 |
@HeikoS | CIndependentParallelComputationPool | 00:42 |
@HeikoS | I think of using external libraries for more structured stuff | 00:43 |
@HeikoS | but for now, I am just interested in the interface | 00:43 |
@lisitsyn | what libraries? | 00:43 |
@HeikoS | something to schedule for example | 00:43 |
@HeikoS | but doesnt matter now | 00:43 |
@HeikoS | you could imagine that one class uses graphlab for example | 00:43 |
@HeikoS | if there is a lot of structure | 00:44 |
@HeikoS | but even multicore might be nice | 00:44 |
@lisitsyn | but it looks like doing a task in multicore manner is a more frequent case | 00:44 |
@lisitsyn | there is a strong reason to do tasks multicore | 00:45 |
@HeikoS | yes | 00:45 |
@HeikoS | definitely | 00:45 |
@HeikoS | one class could for example do grid-search in a multicore way | 00:45 |
@lisitsyn | when you do totally different things your context is switching like crazy | 00:45 |
@HeikoS | but bad example since grid-search is already implemented, and not in terms of tasks that one registers | 00:45 |
@lisitsyn | HeikoS: I'd rather call it Queue btw | 00:46 |
@HeikoS | but new things could be written in terms of tasks that one first registers, and then solves | 00:46 |
@lisitsyn | Pool is a different pattern | 00:46 |
@HeikoS | lisitsyn: it is not a queue | 00:46 |
@HeikoS | but agreed on pool is not good | 00:46 |
@lisitsyn | it is not a pool too ;) | 00:46 |
@HeikoS | Set? :) | 00:46 |
@lisitsyn | pool is a set of prepared objects | 00:46 |
@lisitsyn | set is so neutral that it doesn't tell anything | 00:47 |
@HeikoS | Organizer? | 00:47 |
@lisitsyn | engine may be | 00:47 |
@HeikoS | Engine is good! :) | 00:47 |
@lisitsyn | well it is engine in graphlab | 00:47 |
@lisitsyn | :D | 00:47 |
@lisitsyn | I've seen they have some fancy algorithms | 00:48 |
@lisitsyn | for philosophers thing | 00:48 |
@HeikoS | indeed | 00:48 |
@HeikoS | this is not what I want to do | 00:48 |
@lisitsyn | HeikoS: why do you need it btw? | 00:49 |
@HeikoS | log-det estimates *have* to be parallelized | 00:49 |
@HeikoS | can do up to a factor of a few hundred | 00:49 |
@lisitsyn | HeikoS: did you consider opencling it too btw? | 00:49 |
@HeikoS | lisitsyn: I dont want to actually do this for now, but rather prepare it | 00:50 |
@HeikoS | its an experiment | 00:50 |
@HeikoS | other way would be to say: | 00:50 |
@HeikoS | ah nevermind | 00:50 |
@HeikoS | so I want to try it | 00:50 |
@HeikoS | even computing 1 estimate can be parallelized massively | 00:50 |
@lisitsyn | HeikoS: I like idea of formulating *all* operations as jobs/tasks | 00:50 |
@HeikoS | usually one needs a few hundred of them | 00:51 |
@HeikoS | lisitsyn: yes, thats the experiment, if we can make this work, things might be easier to parallelize | 00:51 |
@HeikoS | which they should | 00:51 |
@lisitsyn | I mean if we call train | 00:51 |
@HeikoS | so many loops of independent things in our code | 00:51 |
@lisitsyn | we should just enqueue some operation | 00:51 |
@HeikoS | exactly | 00:51 |
@HeikoS | also this would separate the code structure from the actual computation a bit more | 00:52 |
@lisitsyn | HeikoS: as for pools - I hope we will get to them too | 00:52 |
@lisitsyn | would be cool to have a thing that manages memory | 00:53 |
@HeikoS | indeed | 00:54 |
@HeikoS | lets experiment with those! | 00:54 |
@lisitsyn | HeikoS: I personally have difficulties with experimenting in shogun | 00:54 |
@lisitsyn | it is big and I have superstitions :D | 00:55 |
@HeikoS | lisitsyn: the best point to do this is when the framework is extended | 00:55 |
@HeikoS | which the log-det project does | 00:56 |
@HeikoS | quite a few classes are necessary for this | 00:56 |
@HeikoS | I wouldnt do it for GP for example | 00:56 |
@HeikoS | there is already too much single-threaded logic | 00:56 |
@lisitsyn | HeikoS: I would not go for generic design of that actually | 00:57 |
@lisitsyn | so lets just gradually do that specifically for your task | 00:57 |
@HeikoS | how do you mean that? | 00:57 |
@HeikoS | yes thats my plan | 00:57 |
@lisitsyn | and then generalize when we see a generalization point | 00:57 |
@lisitsyn | HeikoS: I failed too many times with generic design :D | 00:58 |
@HeikoS | haha :) | 00:58 |
@HeikoS | I will send you the class diagram once lambday and I have worked this out | 01:00 |
@HeikoS | he is a smart guy and probably can help a lot there... | 01:00 |
@HeikoS | it makes no sense to do this stuff single-threaded btw | 01:00 |
@HeikoS | lisitsyn: and we should have at least a general framework for multicore stuff with a unified interface | 01:02 |
@HeikoS | since so many tasks are like that | 01:03 |
@HeikoS | I mean independent loops | 01:03 |
@lisitsyn | HeikoS: yes true | 01:04 |
@lisitsyn | just avoid trying to do that general right now | 01:04 |
@HeikoS | well, a little bit at least :) | 01:05 |
@HeikoS | general enough to have multiple forms for the log-det stuff | 01:05 |
@lisitsyn | HeikoS: it would be possible to design a general thing if we had experience | 01:05 |
@lisitsyn | otherwise we have to do that evolutionary | 01:06 |
@lisitsyn | HeikoS: I can design multiagent systems now - but soooo many mistakes have been fixed | 01:07 |
@HeikoS | i see | 01:08 |
@lisitsyn | so is that thing I am sure | 01:09 |
@lisitsyn | :) | 01:09 |
@HeikoS | one should never start coding too early :) | 01:09 |
@HeikoS | want to spend some time planning this | 01:09 |
@lisitsyn | we are just too inexperienced to foresee that | 01:09 |
@lisitsyn | nahh that fails too | 01:09 |
@lisitsyn | HeikoS: it depends on the experience again | 01:10 |
@lisitsyn | in this case I'd rather plan something not really detailed then code it | 01:10 |
@lisitsyn | then see everything is wrong | 01:10 |
@lisitsyn | and refactor | 01:10 |
@lisitsyn | then guess what :D | 01:10 |
@lisitsyn | HeikoS: should a task have a separate object to store data? how to store dependencies? what are types of dependencies? | 01:12 |
@HeikoS | lisitsyn: no dependencies | 01:12 |
@HeikoS | as I said, this is not my goal | 01:12 |
@HeikoS | independent loops | 01:12 |
@lisitsyn | HeikoS: yeah I mean there are a lot of questions | 01:12 |
@HeikoS | data is stored within task, or reference | 01:13 |
@lisitsyn | and not all of them are answerable at design time | 01:13 |
@HeikoS | this depends on the implementation | 01:13 |
@lisitsyn | HeikoS: I see a lot of possibilities there anyway | 01:13 |
@HeikoS | lisitsyn: yes | 01:13 |
@HeikoS | lisitsyn: I mean, I just want to have something for the log-dets | 01:14 |
@lisitsyn | most of them are usually unforeseen so be ready to refactor and refactor ;) | 01:14 |
@HeikoS | I have coded up all of this in Matlab, both seq and par, so I know what happens, but maybe you are right and I should not be so general | 01:15 |
@lisitsyn | HeikoS: no I just warn you and lambday to not strive for generality from the very beginning | 01:15 |
@HeikoS | lisitsyn: again, he is not meant to implement parallel things | 01:16 |
@HeikoS | just write the sequential one against an interface that might be able to handle this | 01:16 |
@lisitsyn | I see | 01:16 |
@HeikoS | so and the interface I wanted to have btw is | 01:16 |
@HeikoS | that a class can inherit a set of methods | 01:17 |
@HeikoS | that are: register stuff, solve subproblem, etc | 01:17 |
@HeikoS | so whats a good way to simulate interfaces? | 01:17 |
@HeikoS | java doesnt have MI, thats why they have interfaces, but how do we do this? | 01:17 |
@lisitsyn | HeikoS: we are tied to no MI so forget about java :) | 01:17 |
@lisitsyn | I don't know | 01:18 |
@HeikoS | no way? | 01:18 |
@lisitsyn | it is problem dependent | 01:18 |
@HeikoS | by hand | 01:18 |
@lisitsyn | MI is totally troublesome | 01:18 |
@lisitsyn | HeikoS: you mean they form a hierarchy of classes to share some methods | 01:18 |
@lisitsyn | but all of them implement Task | 01:19 |
@lisitsyn | ? | 01:19 |
@HeikoS | yes for example | 01:19 |
@lisitsyn | HeikoS: well I see no problem putting Task to the very top of that hierarchy | 01:20 |
@lisitsyn | so all depends.. | 01:21 |
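The suggestion of putting Task at the very top of the hierarchy can be sketched as follows (Python used for brevity; Shogun is C++, and the concrete class names here are hypothetical, not real Shogun classes):

```python
from abc import ABC, abstractmethod

class Task(ABC):
    """Plays the role of a Java interface: nothing but pure virtual methods.
    Putting it at the very top of the hierarchy means every concrete class
    'implements Task' through single inheritance alone, so no MI is needed."""
    @abstractmethod
    def compute(self): ...

class LinearSolverTask(Task):
    """Intermediate level of the hierarchy, sharing helper methods."""
    def shared_setup(self):
        return "shared setup"

class LogDetSampleTask(LinearSolverTask):
    """Concrete task (hypothetical name); inherits the shared helpers and
    fulfils the Task 'interface' without multiple inheritance."""
    def compute(self):
        return self.shared_setup() + " -> solved"

task = LogDetSampleTask()
result = task.compute()  # "shared setup -> solved"
```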
@HeikoS | lisitsyn: Ill show you the class diagram :) | 01:23 |
@HeikoS | when its more or less done | 01:23 |
@lisitsyn | HeikoS: interfacing is a Java idiom so maybe it just requires a change of point of view | 01:23 |
@lisitsyn | we will see | 01:23 |
@lisitsyn | HeikoS: alright, will try to sleep :) | 01:26 |
@HeikoS | good night lisitsyn! :) | 01:26 |
@lisitsyn | HeikoS: good night | 01:27 |
-!- foulwall [~foulwall@2001:da8:215:6901:93a:5fb3:ab52:7a68] has joined #shogun | 02:03 | |
-!- HeikoS [~heiko@176.248.212.176] has quit [Quit: Leaving.] | 03:06 | |
-!- nube is now known as out | 04:12 | |
-!- out is now known as nube | 04:12 | |
shogun-buildbot | build #407 of nightly_default is complete: Failure [failed test] Build details are at http://www.shogun-toolbox.org/buildbot/builders/nightly_default/builds/407 | 04:17 |
-!- gsomix [~gsomix@83.149.21.63] has joined #shogun | 05:02 | |
gsomix | good morning | 05:02 |
foulwall | gsomix: morning | 05:02 |
-!- gsomix [~gsomix@83.149.21.63] has quit [Ping timeout: 264 seconds] | 06:41 | |
-!- nube [~rho@49.244.28.55] has quit [Ping timeout: 256 seconds] | 07:36 | |
-!- foulwall [~foulwall@2001:da8:215:6901:93a:5fb3:ab52:7a68] has quit [Remote host closed the connection] | 07:40 | |
-!- nube [~rho@49.244.116.16] has joined #shogun | 07:51 | |
-!- flxb_ [~flxb@master.ml.tu-berlin.de] has joined #shogun | 07:53 | |
-!- flxb [~flxb@master.ml.tu-berlin.de] has quit [Write error: Broken pipe] | 07:54 | |
@sonney2k | morning... | 08:33 |
-!- sijin [~smuxi@144.214.222.109] has quit [Read error: Connection reset by peer] | 08:57 | |
@sonney2k | pickle27, any insights? | 08:59 |
-!- iglesiasg [d58f32ac@gateway/web/freenode/ip.213.143.50.172] has joined #shogun | 09:27 | |
-!- mode/#shogun [+o iglesiasg] by ChanServ | 09:28 | |
-!- sijin [~smuxi@144.214.222.109] has joined #shogun | 09:40 | |
-!- hushell [~hushell@8-92.ptpg.oregonstate.edu] has quit [Ping timeout: 264 seconds] | 10:05 | |
-!- foulwall [~foulwall@2001:da8:215:503:d9a2:88ea:88e3:5e47] has joined #shogun | 10:15 | |
-!- hushell [~hushell@c-67-189-100-116.hsd1.or.comcast.net] has joined #shogun | 10:23 | |
-!- votjakovr [~votjakovr@host-46-241-3-209.bbcustomer.zsttk.net] has joined #shogun | 10:30 | |
-!- foulwall [~foulwall@2001:da8:215:503:d9a2:88ea:88e3:5e47] has quit [Remote host closed the connection] | 10:49 | |
-!- votjakovr [~votjakovr@host-46-241-3-209.bbcustomer.zsttk.net] has quit [Quit: Leaving] | 11:18 | |
-!- iglesiasg [d58f32ac@gateway/web/freenode/ip.213.143.50.172] has quit [Ping timeout: 250 seconds] | 12:43 | |
-!- vgorbati [5f8777f7@gateway/web/freenode/ip.95.135.119.247] has joined #shogun | 13:16 | |
-!- van51 [~van51@athedsl-320452.home.otenet.gr] has joined #shogun | 13:18 | |
-!- foulwall_ [~foulwall@2001:da8:215:503:746c:70bc:a9be:cac0] has joined #shogun | 13:30 | |
-!- lisitsyn [~blackburn@109-226-74-97.clients.tlt.100megabit.ru] has quit [Ping timeout: 246 seconds] | 13:31 | |
-!- vgorbati_ [5f85daa8@gateway/web/freenode/ip.95.133.218.168] has joined #shogun | 14:01 | |
-!- vgorbati [5f8777f7@gateway/web/freenode/ip.95.135.119.247] has quit [Ping timeout: 250 seconds] | 14:03 | |
-!- vgorbati_ is now known as vgorbati | 14:05 | |
-!- zxtx [~zv@ool-457e751d.dyn.optonline.net] has quit [Ping timeout: 246 seconds] | 14:08 | |
-!- zxtx [~zv@ool-457e751d.dyn.optonline.net] has joined #shogun | 14:10 | |
-!- vgorbati [5f85daa8@gateway/web/freenode/ip.95.133.218.168] has quit [Ping timeout: 250 seconds] | 14:12 | |
-!- vgorbati [5f85daa8@gateway/web/freenode/ip.95.133.218.168] has joined #shogun | 14:24 | |
-!- gsomix [~gsomix@188.168.2.227] has joined #shogun | 14:34 | |
gsomix | hi | 14:34 |
gsomix | sonney2k, sent PR. | 14:35 |
-!- van51 [~van51@athedsl-320452.home.otenet.gr] has left #shogun ["JOIN #shogun"] | 14:38 | |
gsomix | sonney2k, I hope it's readable now. :) | 14:38 |
-!- van51 [~van51@athedsl-320452.home.otenet.gr] has joined #shogun | 14:38 | |
-!- vgorbati [5f85daa8@gateway/web/freenode/ip.95.133.218.168] has quit [Ping timeout: 250 seconds] | 14:50 | |
-!- foulwall_ [~foulwall@2001:da8:215:503:746c:70bc:a9be:cac0] has quit [Remote host closed the connection] | 15:27 | |
-!- foulwall [~foulwall@2001:da8:215:c252:4b2:f64d:b662:b135] has joined #shogun | 16:50 | |
gsomix | cu later, guys | 16:52 |
-!- sanyam [uid10602@gateway/web/irccloud.com/x-myercfhnlmkikdyu] has quit [Ping timeout: 252 seconds] | 17:39 | |
-!- foulwall [~foulwall@2001:da8:215:c252:4b2:f64d:b662:b135] has quit [Ping timeout: 240 seconds] | 17:45 | |
-!- nube [~rho@49.244.116.16] has quit [Ping timeout: 264 seconds] | 18:23 | |
-!- nube [~rho@49.126.16.146] has joined #shogun | 18:26 | |
-!- nube [~rho@49.126.16.146] has quit [Ping timeout: 256 seconds] | 18:54 | |
-!- nube [~rho@49.244.8.172] has joined #shogun | 19:17 | |
-!- gsomix [~gsomix@188.168.2.227] has quit [Ping timeout: 245 seconds] | 19:25 | |
-!- gsomix [~gsomix@188.168.2.227] has joined #shogun | 19:26 | |
-!- van51 [~van51@athedsl-320452.home.otenet.gr] has left #shogun ["PING 1369589370"] | 19:29 | |
-!- sanyam [uid10602@gateway/web/irccloud.com/x-nsunvhvtlukipqcp] has joined #shogun | 19:36 | |
-!- katia_ [5f43c1c3@gateway/web/freenode/ip.95.67.193.195] has joined #shogun | 19:48 | |
-!- deerishi [c649b206@gateway/web/freenode/ip.198.73.178.6] has joined #shogun | 19:56 | |
-!- vgorbati [5f6ff438@gateway/web/freenode/ip.95.111.244.56] has joined #shogun | 20:00 | |
pickle27 | sonney2k: sorry haven't had a chance to work on that yet | 20:10 |
-!- deerishi [c649b206@gateway/web/freenode/ip.198.73.178.6] has quit [Ping timeout: 250 seconds] | 20:18 | |
-!- katia_ [5f43c1c3@gateway/web/freenode/ip.95.67.193.195] has quit [Ping timeout: 250 seconds] | 20:22 | |
-!- katia_ [5f43c1c3@gateway/web/freenode/ip.95.67.193.195] has joined #shogun | 20:38 | |
gsomix | good evening | 20:46 |
-!- vgorbati [5f6ff438@gateway/web/freenode/ip.95.111.244.56] has quit [Ping timeout: 250 seconds] | 21:21 | |
-!- vgorbati [5f6ff438@gateway/web/freenode/ip.95.111.244.56] has joined #shogun | 21:30 | |
pickle27 | sonney2k: valgrind didn't complain about qda | 21:32 |
pickle27 | sonney2k: paste is here http://pastebin.com/xc3SERUR | 21:34 |
-!- vgorbati [5f6ff438@gateway/web/freenode/ip.95.111.244.56] has quit [Ping timeout: 250 seconds] | 21:35 | |
@sonney2k | pickle27, well yeah it is no memory leak but something else | 21:42 |
@sonney2k | pickle27, how about you pickle.dump all the input that the function gets when you run tester.py | 21:43 |
@sonney2k | and then load that to reproduce/debug the issue | 21:43 |
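The dump-then-reload debugging workflow sonney2k suggests can be sketched like this (the input data below is a hypothetical stand-in; in practice one would dump whatever arguments the failing QDA code actually receives from tester.py):

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the inputs the failing function receives.
inputs = {"train": [[0.0, 1.0], [1.0, 0.0]], "labels": [0, 1], "tol": 1e-4}

path = os.path.join(tempfile.mkdtemp(), "qda_inputs.pkl")

# 1) inside the code path exercised by tester.py: dump the inputs once
with open(path, "wb") as f:
    pickle.dump(inputs, f)

# 2) later, in a standalone debugging session: load them back and re-run
#    the failing function on exactly the same data to reproduce the issue
with open(path, "rb") as f:
    loaded = pickle.load(f)
```

This decouples reproducing the crash from the whole test harness.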
* sonney2k off | 21:43 | |
pickle27 | sonney2k: I thought valgrind might complain because I thought it might be a result that is bigger than its return allocation if that makes sense | 21:44 |
pickle27 | sonney2k: okay | 21:44 |
pickle27 | sonney2k: the function doesn't get any input from tester.py; it just runs the example | 22:05 |
pickle27 | sonney2k: at least thats what it looks like to me | 22:05 |
gsomix | nite | 22:07 |
pickle27 | sonney2k: if I run the modular example on my own it runs fine | 22:07 |
pickle27 | sonney2k: I'll try the same data in the c++ example | 22:07 |
@sonney2k | pickle27, no | 22:15 |
@sonney2k | pickle27, did you pickle dump? | 22:15 |
pickle27 | sonney2k: I was just looking through to see what tester actually did | 22:15 |
pickle27 | sonney2k: doesn't it just run classifier_qda_modular.py? | 22:16 |
@sonney2k | pickle27, yes but did you dump the data it gets? | 22:16 |
pickle27 | what do you mean? it doesn't get data, the data is loaded in the example itself | 22:17 |
@sonney2k | so did you dump it or not? | 22:18 |
pickle27 | theres no need to dump it, its the data/fm_train_real data | 22:18 |
@sonney2k | ok then let me do it | 22:18 |
@sonney2k | pickle27, alright so the reason is that m_store_covs is True in one test | 22:22 |
@sonney2k | pickle27, so just put a true as last argument and you can reproduce the crash in the example | 22:23 |
pickle27 | sonney2k: I thought that might be the problem but it still runs for me when I do that | 22:23 |
pickle27 | sonney2k: ahh got the bug now | 22:24 |
@sonney2k | pickle27, parameter_list = [[traindat, testdat, label_traindat, 1e-4, True], \ | 22:24 |
@sonney2k | then it will crash | 22:24 |
pickle27 | sonney2k: and I see sort of whats happening with the tester | 22:24 |
pickle27 | okay I'll work on fixing this now | 22:25 |
@sonney2k | thanks | 22:26 |
pickle27 | sonney2k: got it now, I just switched to ozansener's covar calc instead | 22:39 |
pickle27 | theres a lot of room for better use of Eigen3 in QDA but it'll work for now | 22:40 |
@sonney2k | pickle27, heh feel free to do it - ohh and benchmarks welcome too! | 22:46 |
pickle27 | sonney2k: yeah I'd like to give it a try in the next bit! | 22:48 |
-!- katia_ [5f43c1c3@gateway/web/freenode/ip.95.67.193.195] has quit [Quit: Page closed] | 22:49 | |
pickle27 | sonney2k: the test runs now but the result is different in 2 places (unsure why, slight numerical differences?) should I make a PR with the fix now and continue investigating? | 22:53 |
@sonney2k | gsomix, yes readable finally :-) | 22:54 |
-!- HeikoS [~heiko@176.248.212.176] has joined #shogun | 23:02 | |
-!- mode/#shogun [+o HeikoS] by ChanServ | 23:02 | |
@sonney2k | HeikoS, hey there! | 23:03 |
@HeikoS | sonney2k: hi! | 23:03 |
@HeikoS | how is it going? | 23:04 |
@sonney2k | tomorrow is the day students will be notified | 23:04 |
@HeikoS | I know | 23:06 |
@HeikoS | sonney2k btw, discussing something with lambday | 23:08 |
@HeikoS | which is basically a class CIndependentComputationEngine | 23:08 |
@HeikoS | which can take instances of a CIndependentComputationTask | 23:08 |
@HeikoS | and run all of them in parallel | 23:09 |
@HeikoS | or sequentially | 23:09 |
@HeikoS | or whatever | 23:09 |
@HeikoS | we need this for the log-det stuff | 23:09 |
@sonney2k | HeikoS, ohh I think sergey had some thoughts on that too | 23:09 |
@HeikoS | and maybe it might be worth to think about generalising it for other things | 23:09 |
@HeikoS | yes we already discussed | 23:09 |
@sonney2k | and wiking would need this for his bagging machine and you for your xval stuff | 23:09 |
@HeikoS | so algorithms just produce a set of tasks instead of doing computations | 23:10 |
@HeikoS | those are given to the computation class | 23:10 |
@HeikoS | it returns results | 23:10 |
@HeikoS | results are passed to the algorithm, which aggregates them | 23:10 |
@HeikoS | but only for independent/trivially parallelizable stuff | 23:10 |
@HeikoS | otherwise it will be too complicated | 23:10 |
@HeikoS | but this way, many things might benefit | 23:10 |
@HeikoS | grid-search for example | 23:11 |
@sonney2k | HeikoS, I am not sure how exactly this would work | 23:11 |
@HeikoS | we could have one class which does things in a multicore way | 23:11 |
@sonney2k | yeah multi core / multiple machines | 23:11 |
@sonney2k | machines == computers | 23:11 |
@HeikoS | yes | 23:11 |
@HeikoS | and future implementations might be coded against this | 23:11 |
@HeikoS | sergey had some doubts however | 23:12 |
@HeikoS | and he is right, its not easy to do this in general | 23:12 |
@sonney2k | how would it work in case of say bagging? | 23:12 |
@sonney2k | how do you tell which stuff is to be transferred to the remote machine and which not? | 23:13 |
@HeikoS | so the way I would do the abstraction is this | 23:13 |
@sonney2k | I currently can see this work with threads and just a couple of parameters | 23:13 |
@HeikoS | one has a class for indepedent tasks | 23:14 |
@HeikoS | which has abstract method solve | 23:14 |
@sonney2k | (beware already here - you have to set obj->parallel->set_num_threads(1) then) | 23:14 |
@HeikoS | The task itself knows everything it needs to know | 23:14 |
@sonney2k | better compute() | 23:14 |
@HeikoS | so one can just call compute/solve | 23:14 |
@HeikoS | and the implementation of the task does everything and returns an instance of an abstract base for result | 23:14 |
@HeikoS | so then your algorithm just produces a set of those tasks | 23:15 |
@HeikoS | these may share data for now (as long as its not modified) | 23:15 |
@HeikoS | but the point is that they hold a complete representation of the subproblem | 23:15 |
@HeikoS | you pass them to the computation engine class | 23:15 |
@HeikoS | basic case: sequential: just a loop over all task.compute() | 23:16 |
@HeikoS | returns a set with result instances | 23:16 |
@HeikoS | one passes them to the algorithm, which knows how to aggregate the results since it produced the tasks | 23:16 |
@HeikoS | multicore implementation would run things at once | 23:16 |
@HeikoS | for this, one needs to clone stuff which is modified | 23:17 |
@HeikoS | read-only things can stay in shared memory | 23:17 |
@HeikoS | distributed implementation might serialize objects and send them to computers | 23:17 |
@HeikoS | since we are only considering independent stuff, we dont have scheduling problems | 23:18 |
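The full flow just described (algorithm produces tasks, engine computes them sequentially or in parallel, algorithm aggregates the results) could be sketched like this. This is a Python illustration under the assumptions of the discussion; Shogun itself is C++, the engine/task names are not real Shogun classes, and threads stand in for whatever multicore mechanism an implementation would use:

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor

class Task(ABC):
    """Holds a complete representation of its subproblem."""
    @abstractmethod
    def compute(self): ...

class SerialEngine:
    """Basic case: sequential, just a loop over task.compute()."""
    def compute_all(self, tasks):
        return [t.compute() for t in tasks]

class MulticoreEngine:
    """Runs independent tasks concurrently. Read-only data can stay in
    shared memory; anything a task modifies would need cloning per task."""
    def compute_all(self, tasks):
        with ThreadPoolExecutor() as ex:
            return list(ex.map(lambda t: t.compute(), tasks))

class PartialSumTask(Task):
    """Toy subproblem: sum one chunk of a shared, read-only data set
    (the task keeps a reference to the data, not a copy)."""
    def __init__(self, data, lo, hi):
        self.data, self.lo, self.hi = data, lo, hi

    def compute(self):
        return sum(self.data[self.lo:self.hi])

data = list(range(1000))  # shared read-only data
tasks = [PartialSumTask(data, i, i + 250) for i in range(0, 1000, 250)]
partials = SerialEngine().compute_all(tasks)  # one result per task
total = sum(partials)  # the algorithm that produced the tasks aggregates
```

Swapping SerialEngine for MulticoreEngine changes nothing in the algorithm code, which is exactly the separation being proposed.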
@sonney2k | HeikoS, so to get it right the task creates all required objects that are passed to the compute engine | 23:18 |
@HeikoS | yes | 23:18 |
@HeikoS | computation engine just calls compute() method in some way | 23:18 |
@sonney2k | how can one do that efficiently? I mean you don't want to create 10 copies of a data set in memory? | 23:18 |
@HeikoS | sonney2k, indeed | 23:18 |
@HeikoS | the thing is: if data is modified, there is no way around that | 23:19 |
@sonney2k | so you pass references only | 23:19 |
@HeikoS | anyway | 23:19 |
@sonney2k | and they are copied if needed | 23:19 |
@HeikoS | exactly | 23:19 |
@HeikoS | so multicore works on the same objects | 23:19 |
@sonney2k | yes for single machines | 23:19 |
@HeikoS | for multiple machines, data needs to be transfered anyway | 23:19 |
@HeikoS | no way around that | 23:19 |
@sonney2k | but for clusters we would just serialize | 23:20 |
@HeikoS | yes | 23:20 |
@HeikoS | to a byte stream or so | 23:20 |
@sonney2k | there is the issue with crashing parts | 23:20 |
@HeikoS | what do you mean with that? | 23:20 |
@sonney2k | (I get the picture and it should be OK) | 23:20 |
@sonney2k | say a cluster node crashes | 23:21 |
@HeikoS | I see | 23:21 |
@sonney2k | how do you fail over | 23:21 |
@sonney2k | or a thread cannot be created etc | 23:21 |
@HeikoS | well thats all to be handled by the computation engine implementation | 23:21 |
@HeikoS | so we can do this later | 23:21 |
@HeikoS | no problem, just try, if it doesnt work, try another machine | 23:21 |
@sonney2k | we have to somehow be able to 'resume' or restart failed stuff, or, like $BIGCOMPANY does it, start say 30% more jobs | 23:22 |
@sonney2k | to anticipate failures | 23:22 |
@HeikoS | sonney2k, I wouldnt do that | 23:22 |
@HeikoS | rather make tasks smaller | 23:22 |
@HeikoS | subtasks | 23:22 |
@HeikoS | an algorithm can even produce a set of different tasks | 23:22 |
@HeikoS | as long as it knows to aggregate the results | 23:22 |
@HeikoS | resuming is very difficult | 23:23 |
@HeikoS | (I think at least) | 23:23 |
@HeikoS | sonney2k: so I dont want to get involved in too much technical distributed programming, but rather start thinking about a framework that could be extended to this | 23:23 |
@HeikoS | for now, just multicore | 23:24 |
@HeikoS | but formulate algorithm in terms of that task-based framework | 23:24 |
@HeikoS | for independent stuff | 23:24 |
@HeikoS | so quite simple actually | 23:24 |
@sonney2k | HeikoS, I see, but IIRC you have a cluster @work? | 23:25 |
@HeikoS | yes, can do | 23:25 |
@sonney2k | qsub based stuff? | 23:25 |
@HeikoS | yes | 23:25 |
@HeikoS | so I have in mind to create a computation engine that submits qsub jobs | 23:25 |
@HeikoS | at some point | 23:25 |
@sonney2k | so IMHO it would be worth to do that | 23:26 |
@sonney2k | qsub and (just ssh based) | 23:26 |
@HeikoS | yes definitely, we have many independent subproblems in shogun | 23:26 |
@sonney2k | do your nodes share a common file system? | 23:26 |
@HeikoS | I am currently running a thing on 100 nodes, thats quite a big speedup factor. | 23:26 |
@HeikoS | yes | 23:26 |
@HeikoS | so one could indeed serialize | 23:27 |
@HeikoS | to a file | 23:27 |
@sonney2k | yes data to one big file and then load only the modified parameters from different files | 23:27 |
@sonney2k | I recall very much the limits we hit with a shared file system | 23:28 |
@HeikoS | this can even be handled by the tasks - give a filename for the main data, and just store the parameters in local variables | 23:28 |
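A qsub-backed engine along the lines being discussed might serialize each task to the shared filesystem and submit one job per task. This is only a sketch under the assumptions above; `worker.py` is a hypothetical worker script, and the file layout is made up for illustration:

```python
import pickle
import subprocess
from pathlib import Path

def write_task(task, workdir, job_id):
    """Serialize one independent task onto the shared filesystem.
    Big shared data would live in one common file; each task file
    only carries the per-task parameters (or a reference)."""
    path = Path(workdir) / f"task_{job_id}.pkl"
    with open(path, "wb") as f:
        pickle.dump(task, f)
    return path

def submit(task_file):
    """Submit one cluster job per task file. 'worker.py' (assumed) would
    load the file, call the task's compute(), and pickle the result to a
    result file beside it for the driver to collect later."""
    subprocess.run(["qsub", "worker.py", str(task_file)], check=True)
```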
@HeikoS | sonney2k, well one usually doesnt get ones hands on more than a few hundred nodes | 23:28 |
@sonney2k | back then I used bittorrent to cache data on all cluster nodes - all copying would otherwise have taken a week | 23:29 |
@HeikoS | sonney2k: haha :) when was that? | 23:29 |
@sonney2k | hmmhh 2007 or 8? | 23:29 |
@HeikoS | sonney2k: I mean these are all details on how the tasks are implemented, but it all works under the interface | 23:29 |
@sonney2k | looong time ago | 23:29 |
@HeikoS | so if one invests a lot of brainpower, one gets good results | 23:30 |
@HeikoS | if one does not, it might not scale | 23:30 |
@sonney2k | yes | 23:30 |
@HeikoS | point now would be more the general structure | 23:30 |
@sonney2k | the standard map-reduce scheme would work aswell with this | 23:30 |
@HeikoS | true | 23:31 |
@sonney2k | issue is still no loops possible | 23:31 |
@HeikoS | I would rather go for this independent task based stuff | 23:31 |
@HeikoS | more intuitive | 23:31 |
@HeikoS | also, shogun is not really parallel based | 23:31 |
@HeikoS | I mean its focus is not on this | 23:31 |
@HeikoS | but these independent things are just so easy and so useful | 23:32 |
@HeikoS | that we could focus on just them | 23:32 |
@sonney2k | true | 23:32 |
@sonney2k | shogun is meant to run on single machines | 23:32 |
@sonney2k | with lots of cores | 23:32 |
@HeikoS | I would say this stuff that one would parallelize on qsub clusters would be very useful though | 23:33 |
@HeikoS | parameter sweeps etc | 23:33 |
@HeikoS | and exactly, first engine would be one with a shared memory model | 23:33 |
@HeikoS | then we could start modifying existing algorithms | 23:34 |
@HeikoS | and once this is more or less stable | 23:34 |
@HeikoS | one could try adding distributed things | 23:34 |
@HeikoS | step by step | 23:34 |
@sonney2k | HeikoS, the problem really is that you hardly get speedups by just switching multi-core -> multi-machine | 23:35 |
@sonney2k | the algorithm needs to be designed for that usually | 23:35 |
@HeikoS | lets see how it goes, will start with the log-det stuff, which is already a bit of a challenge under this framework. Many linear systems that share a lot of stuff | 23:35 |
@sonney2k | so yes only the big independent jobs will benefit | 23:35 |
@HeikoS | sonney2k: yes | 23:35 |
@sonney2k | but that is what you have in mind | 23:35 |
@HeikoS | the rest is too complicated anyway | 23:36 |
@HeikoS | grid-search is the best example | 23:36 |
@sonney2k | so bagging/ms/etc | 23:36 |
@HeikoS | and random forests etc | 23:36 |
@sonney2k | when I had to parallelize I did mostly ms | 23:36 |
@HeikoS | yes same, usually only independent stuff | 23:36 |
@sonney2k | sometimes data was too big | 23:36 |
@sonney2k | so I trained or applied on chunks | 23:37 |
@HeikoS | I see | 23:37 |
@HeikoS | sonney2k: gotta go now, dinner is ready :) be back later | 23:37 |
@sonney2k | cu | 23:38 |
@sonney2k | nice talking to you as always :D | 23:38 |
-!- HeikoS [~heiko@176.248.212.176] has quit [Quit: Leaving.] | 23:39 | |
--- Log closed Mon May 27 00:00:19 2013 |
Generated by irclog2html.py 2.10.0 by Marius Gedminas - find it at mg.pov.lt!