--- Log opened Sat Jun 11 00:00:17 2011 | ||
-!- ins__ [~ins@p5DDCE1B6.dip.t-dialin.net] has quit [Ping timeout: 240 seconds] | 01:27 | |
-!- in3xes_ [~in3xes@180.149.49.227] has joined #shogun | 04:09 | |
-!- in3xes [~in3xes@180.149.49.227] has quit [Ping timeout: 276 seconds] | 04:13 | |
-!- in3xes_ is now known as in3xes | 05:01 | |
-!- in3xes_ [~in3xes@180.149.49.227] has joined #shogun | 07:06 | |
-!- in3xes [~in3xes@180.149.49.227] has quit [Ping timeout: 276 seconds] | 07:10 | |
-!- in3xes1 [~in3xes@180.149.49.227] has joined #shogun | 07:18 | |
-!- in3xes_ [~in3xes@180.149.49.227] has quit [Ping timeout: 276 seconds] | 07:22 | |
-!- in3xes_ [~in3xes@210.212.58.111] has joined #shogun | 07:49 | |
-!- in3xes1 [~in3xes@180.149.49.227] has quit [Ping timeout: 276 seconds] | 07:52 | |
-!- in3xes_ is now known as in3xes | 08:40 | |
-!- in3xes [~in3xes@210.212.58.111] has quit [Ping timeout: 276 seconds] | 10:47 | |
-!- in3xes [~in3xes@59.163.196.121] has joined #shogun | 10:47 | |
-!- alesis-novik [~alesis@78-56-236-240.static.zebra.lt] has joined #shogun | 13:49 | |
-!- blackburn1 [~blackburn@188.122.224.102] has joined #shogun | 14:21 | |
-!- blackburn1 is now known as blackburn | 14:21 | |
-!- blackburn1 [~blackburn@188.122.224.102] has joined #shogun | 17:42 | |
-!- blackburn [~blackburn@188.122.224.102] has quit [Read error: No route to host] | 17:42 | |
-!- blackburn1 [~blackburn@188.122.224.102] has quit [Ping timeout: 240 seconds] | 18:03 | |
-!- blackburn [~blackburn@188.122.224.102] has joined #shogun | 18:04 | |
-!- serialhex [~quassel@99-101-148-183.lightspeed.wepbfl.sbcglobal.net] has quit [Remote host closed the connection] | 18:16 | |
-!- blackburn1 [~blackburn@188.122.224.102] has joined #shogun | 18:17 | |
-!- blackburn [~blackburn@188.122.224.102] has quit [Ping timeout: 255 seconds] | 18:17 | |
-!- blackburn [~blackburn@188.122.224.102] has joined #shogun | 19:19 | |
-!- blackburn1 [~blackburn@188.122.224.102] has quit [Ping timeout: 276 seconds] | 19:20 | |
-!- blackburn [~blackburn@188.122.224.102] has quit [Ping timeout: 240 seconds] | 20:02 | |
-!- blackburn [~blackburn@188.122.224.102] has joined #shogun | 20:03 | |
-!- f-x [~user@117.192.200.14] has joined #shogun | 20:47 | |
-!- cwidmer [~quassel@connect.tuebingen.mpg.de] has joined #shogun | 21:05 | |
@sonney2k | blackburn, back | 21:14 |
---|---|---|
f-x | sonney2k: hey! is the get_feature_vector(float64_t** dst, int32_t* len, int32_t num) in DotFeatures used anywhere? | 21:21 |
f-x | or is it always supposed to be reimplemented in a derived class? | 21:21 |
f-x | i'm asking since DotFeatures is not supposed to be specific to a data type (eg. float64_t) but in the definition dst is explicitly mentioned as float64_t | 21:22 |
@sonney2k | f-x, it is actually using the add operation to create the feature vector | 21:23 |
@sonney2k | and add is using float64_t* | 21:23 |
f-x | ok.. so is it possible to use DotFeatures as a standalone features class? | 21:24 |
f-x | or do you have to use one of the derived classes like SimpleFeatures<T> ? | 21:24 |
CIA-18 | shogun: Sergey Lisitsyn master * r24b8f99 / (4 files in 2 dirs): Added CIsomap - http://bit.ly/jvk3pQ | 21:25 |
@sonney2k | f-x, dotfeatures itself needs some derived feature class (in which the add etc operations are defined) | 21:25 |
CIA-18 | shogun: Soeren Sonnenburg master * rd761ec2 / examples/undocumented/ruby_modular/classifier_libsvm_minimal_modular.rb : Merge branch 'master' of git://github.com/serialhex/shogun - http://bit.ly/lX40eY | 21:27 |
f-x | hmm.. add_to_dense_vec hasn't been defined in DotFeatures | 21:27 |
@sonney2k | f-x, note that get_feature_vector of dot features cannot be fast as it uses add to compute the vector | 21:28 |
@sonney2k | f-x, exactly | 21:28 |
f-x | sonney2k: so is the get_feature_vector() of dotfeatures used anywhere? i see it has been redefined in simplefeatures | 21:30 |
@sonney2k | f-x, I think every sensible feature class redefines it for speed reasons | 21:30 |
f-x | yeah, it seems so | 21:31 |
f-x | and btw i mailed you about a possible inheritance method | 21:31 |
f-x | but for that StreamingFeatures will have to be templated | 21:31 |
f-x | and i guess some derivations in DotFeatures and SimpleFeatures etc will have to be made virtual | 21:32 |
@sonney2k | f-x, I've seen your mail but didn't have time yet to read it in detail will do this evening | 21:34 |
f-x | ok sure | 21:34 |
f-x | just wanted to know if there were any glaring problems with that | 21:34 |
f-x | i'm trying to change stuff according to that right now, will see if things work out | 21:35 |
-!- cwidmer [~quassel@connect.tuebingen.mpg.de] has quit [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.] | 21:38 | |
blackburn | sonney2k: here | 21:39 |
blackburn | sonney2k: Chris will send you my 'objectives' for mid-term | 21:39 |
blackburn | we just had a discussion about goals and so on | 21:39 |
@sonney2k | blackburn, very good | 21:40 |
blackburn | sonney2k: about my issues with distances - I think I have solved it | 21:40 |
@sonney2k | blackburn, even better :D | 21:40 |
blackburn | sonney2k: in the version you've pushed isomap is duplicated to classicmds | 21:41 |
blackburn | sonney2k: need your opinion | 21:41 |
@sonney2k | f-x, btw did you finish the getline thing I mean using some buffer? | 21:41 |
f-x | sonney2k: that problem isn't sorted yet | 21:41 |
f-x | about the file pointer not being immediately after the \n | 21:41 |
f-x | after every call to getline | 21:41 |
f-x | fgets() seems a more direct route now | 21:42 |
f-x | i haven't done it yet | 21:42 |
blackburn | I have to use geodesic distance (just shortest paths between all objects) and now it is just computed inside Isomap routines - shall I create that specific distance? | 21:42 |
f-x | so is it okay that the file pointer is at some unexpected position after a getline? | 21:43 |
@sonney2k | f-x, if you make it a function of ascii file - sure | 21:44 |
@sonney2k | I mean ascii file is line based and so it will only ever call getline repeatedly | 21:44 |
f-x | okay, i think i'll just call it something other than 'getline' as that would confuse users | 21:45 |
@sonney2k | ok | 21:45 |
@sonney2k | blackburn, wow! just got chris' email | 21:45 |
@sonney2k | blackburn, lots of things you intend to finish | 21:45 |
@sonney2k | blackburn, when is your last exam? | 21:45 |
blackburn | 21 | 21:46 |
blackburn | sonney2k: why lots? :) | 21:46 |
@bettyboo | he | 21:46 |
blackburn | LLE, classic MDS and Isomap is ready | 21:46 |
blackburn | only LMDS to be implemented | 21:46 |
@sonney2k | blackburn, great - that almost leaves you with 1 month after your exams to recover and code | 21:46 |
blackburn | sonney2k: yeap, right after my exams serious work would be started :D | 21:47 |
blackburn | sonney2k: is it ok to delete and replace features in apply_to_feature_matrix? | 21:50 |
@sonney2k | blackburn, you can totally modify feature matrix (and reduce its size) - but that's it | 22:02 |
blackburn | sonney2k: well I have two ways: | 22:03 |
blackburn | method apply_to_distance returning CSimpleFeatures* | 22:03 |
blackburn | or method apply_to_distance using some args to return matrix | 22:04 |
blackburn | the only problem: is it possible to 'typemap' this arg list: (CDistance, float64_t*, int32_t, int32_t)? | 22:04 |
blackburn | I mean in python it should be: preprocessor.apply_to_distance(dist) and returning matrix or features | 22:05 |
blackburn | ah I think I have an idea | 22:06 |
blackburn | CSimpleFeatures<float64_t>* apply_to_distance(CDistance* distance); for modular | 22:06 |
blackburn | and | 22:06 |
blackburn | void apply_to_distance(CDistance* distance, float64_t* dst, int32_t rows, int32_t cols); for internals | 22:07 |
@sonney2k | blackburn, what is wrong with SGMatrix? | 22:07 |
@sonney2k | f-x, I read your email now | 22:07 |
blackburn | sonney2k: you mean why I don't use SGMatrix? | 22:07 |
@sonney2k | blackburn, instead of float64_t* dst, int32_t rows, int32_t cols | 22:08 |
blackburn | sonney2k: it is not a solution anyway but yes, I'll refactor it to SGMatrix a bit later | 22:08 |
f-x | sonney2k: any suggestions? | 22:08 |
@sonney2k | blackburn, why not? then it is typemapped | 22:09 |
blackburn | seems trying to explain problem helps to understand ways to solve :) | 22:09 |
blackburn | okay | 22:09 |
blackburn | thank | 22:09 |
blackburn | s | 22:09 |
@sonney2k | f-x, I really would like to avoid multiple inheritance | 22:09 |
f-x | sonney2k: that would be ideal... but how else do we make streaming simplefeatures compatible with dotfeatures and simplefeatures? | 22:11 |
@sonney2k | f-x, I don't mind streaming features to become a templated class - however I would split it up into say StreamingStringFeatures, StreamingSimpleFeatures, StreamingSparseFeatures | 22:11 |
f-x | yes, that has to be done | 22:11 |
f-x | StreamingSimpleFeatures would be further templated | 22:12 |
@sonney2k | (and each of them templated) | 22:12 |
@sonney2k | yes | 22:12 |
f-x | so you mean to say StreamingFeatures<T> is okay? | 22:12 |
@sonney2k | no | 22:12 |
@sonney2k | StreamingSimpleFeatures<T> | 22:12 |
@sonney2k | is ok | 22:12 |
f-x | StreamingSimpleFeatures<T>: public StreamingFeatures | 22:13 |
@sonney2k | I am not sure what StreamingFeatures<T> should be | 22:13 |
f-x | like this? | 22:13 |
f-x | the template in StreamingFeatures is for the buffer to know what kind of data it is storing | 22:13 |
f-x | while allocating memory, i've hardcoded new float64_t[size] into the buffer/parser code | 22:13 |
f-x | i was thinking that should be replaced with new T[size] to make it more general | 22:14 |
@sonney2k | f-x, but you cannot derive from general templated classes | 22:14 |
blackburn | sonney2k: is it ok to you to use RowMajor (not ColumnMajor) in algos? | 22:14 |
@sonney2k | blackburn, why would one? | 22:14 |
f-x | sonney2k: hmm... i thought you just had to put the method definitions in the .h file and it was fine | 22:14 |
blackburn | sonney2k: don't know, it is implemented this way now in LLE and MDS | 22:15 |
f-x | can you give me an example of something which you think won't work? | 22:15 |
f-x | so that i can test it | 22:15 |
blackburn | I could change it if it is not ok :) | 22:15 |
@sonney2k | f-x, StreamingSimpleFeatures<T>: public StreamingFeatures<T> | 22:15 |
f-x | sonney2k: so you're sure that won't work? | 22:16 |
@sonney2k | blackburn, I mean all matrices and everything in shogun are column major (like lapack likes it) | 22:16 |
@sonney2k | it makes only sense to deviate from that if it is faster in your setup | 22:16 |
f-x | right now the basic 'example' class is defined with float64_t* fv | 22:16 |
f-x | and i'd really like it if it was more generalized | 22:17 |
blackburn | sonney2k: okay, I will change it | 22:17 |
blackburn | most of matrices anyway are symmetrical | 22:17 |
f-x | so the templating begins all the way from example<T>, buffer<T>, parser<T>, streamingfeatures<T> ... | 22:17 |
@sonney2k | blackburn, if they are symmetric it doesn't really matter right :) | 22:18 |
@sonney2k | f-x, well you have this thing http://womble.decadent.org.uk/c++/template-faq.html#base-lookup | 22:18 |
f-x | yes, i'd have to use all those this->member | 22:18 |
f-x | or "using x::y" | 22:18 |
f-x | declarations | 22:18 |
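The base-lookup issue linked above can be illustrated with a minimal sketch. The class names here (`StreamingBase`, `StreamingSimpleA`, `StreamingSimpleB`) and the member `current_len` are hypothetical and are not shogun classes; the point is only that a member inherited from a templated base has to be reached via `this->` or a `using` declaration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a templated streaming base class.
template <class T>
class StreamingBase
{
public:
    StreamingBase() : current_len(0) {}
    void set_len(std::size_t len) { current_len = len; }
protected:
    std::size_t current_len;  // member that lives in a dependent base
};

// Variant 1: qualify every access with this->
template <class T>
class StreamingSimpleA : public StreamingBase<T>
{
public:
    std::size_t get_len() const
    {
        // A plain `current_len` would not compile here, because unqualified
        // names are not looked up in a dependent base class.
        return this->current_len;
    }
};

// Variant 2: one using-declaration per inherited name
template <class T>
class StreamingSimpleB : public StreamingBase<T>
{
    using StreamingBase<T>::current_len;
public:
    std::size_t get_len() const { return current_len; }
};

// Small self-check that both variants see the same inherited member.
inline void check_variants()
{
    StreamingSimpleA<double> a;
    StreamingSimpleB<double> b;
    a.set_len(3);
    b.set_len(3);
    assert(a.get_len() == b.get_len());
}
```

Both variants work; the cost is that every inherited name needs one of the two treatments, which is the messiness being weighed in the discussion.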
blackburn | sonney2k: exactly but I have to change some things for non-symmetrical. will do it a bit later | 22:19 |
f-x | so it is possible, but it may be messy | 22:20 |
@sonney2k | messy == avoid | 22:21 |
blackburn | sonney2k: reported to ML | 22:22 |
f-x | but the parser/buffer are not linked with any features class.. how do they know what kind of features they're expected to read/store? | 22:23 |
@sonney2k | f-x, there is one benefit when deriving from CFeatures - it could use the preprocessors | 22:23 |
f-x | sonney2k: yes.. but it will derive from CFeatures when it derives from say CDotFeatures | 22:24 |
@sonney2k | f-x, you can definitely construct a parser<T> as member variable | 22:24 |
@sonney2k | so just using CStreamingSimpleFeatures<T> works just fine | 22:25 |
@sonney2k | f-x, I am not sure that deriving from e.g. SimpleFeatures etc is a good idea | 22:26 |
@sonney2k | I mean these feature objects assume that the number of examples is known | 22:26 |
@sonney2k | and so what would happen if you apply them to an algorithm? | 22:27 |
f-x | agreed.. it's probably easier to write the operations from scratch | 22:27 |
f-x | the algorithm would have to be changed a lot to consider the 'streaming' modifications | 22:27 |
f-x | (if we apply SimpleFeatures to the algorithm) | 22:27 |
@sonney2k | I would derive all the templated CStreaming*Features<T> from CFeatures | 22:27 |
f-x | and write the functions separately? | 22:28 |
@sonney2k | underneath use your templated buffers/parsers (here we should really check that we don't duplicate too much of what is available in the CFile thing - I would rather add code to that than have a new separate interface) | 22:28 |
f-x | i was thinking of somehow integrating with SimpleFeatures, StringFeatures etc and using the functions already present, but that's more problematic | 22:28 |
@sonney2k | however you will then have to convert all the online methods to use that interface | 22:29 |
f-x | yes, the methods based on say CDotFeatures won't work | 22:30 |
f-x | if we derive directly from CFeatures | 22:30 |
@sonney2k | and I still don't see how one could do better than CStreamingDotFeatures and define all the op's in there again... | 22:30 |
f-x | the amount of work is probably the same even if we derive it from DotFeatures... anyhow most of the functions have to be redefined | 22:31 |
f-x | the only advantage was that the present algorithms would work (or at least accept as input parameters) with these features | 22:31 |
@sonney2k | f-x, the problem I see is that e.g. LibSVM can work with dotfeatures | 22:31 |
@sonney2k | but not with any kind of streaming features | 22:31 |
@sonney2k | so most of the algorithms won't work... | 22:32 |
f-x | unless we redefine them to work with streaming features | 22:33 |
@sonney2k | the only online methods we have in shogun are perceptron, sgd, liblinear, larank | 22:33 |
@sonney2k | so modifying these to work with either CDotFeatures / CStreamingDotFeatures sounds much easier to me | 22:33 |
@sonney2k | f-x, but for libsvm we don't even know any efficient online algorithm | 22:34 |
f-x | so LibSVM isn't supposed to work with streaming features at all? | 22:34 |
f-x | okay.. good | 22:34 |
f-x | :D | 22:34 |
@sonney2k | f-x, I guess someone would first have to write a paper about it how to do it properly | 22:34 |
@sonney2k | I am afraid though that this is not so easy | 22:35 |
@sonney2k | kernel methods are not really suited for online learning | 22:35 |
f-x | so for the time being our targets are perceptron, sgd, liblinear, larank and vw | 22:35 |
@sonney2k | these are all linear methods (ok larank might be able to use kernels) | 22:36 |
f-x | i'm hoping the structure we're adopting now will be compatible with vw | 22:36 |
@sonney2k | why not? | 22:37 |
@sonney2k | in the end vw needs some vectorial input | 22:37 |
f-x | i guess most methods just follow the input()->train()->predict() sequence | 22:38 |
f-x | you're right.. but i'll need john to help me out on exactly what should go into vw's "train" and "predict" functions | 22:38 |
f-x | anyway, that comes a bit later | 22:38 |
f-x | sonney2k: could we just finalize the inheritance? (just so i have some concrete record of it) | 22:39 |
@sonney2k | to avoid any misunderstanding - could you please summarize what you would do now | 22:39 |
@sonney2k | exactly :D | 22:39 |
f-x | :) | 22:39 |
f-x | read my mind | 22:40 |
f-x | err.. StreamingFeatures derives from CFeatures first | 22:40 |
f-x | correct? | 22:40 |
@sonney2k | yes | 22:40 |
@sonney2k | well but only if you need streamingfeatures | 22:41 |
@sonney2k | you could as well directly start with CStreamingStringFeatures<T> : public CFeatures | 22:41 |
f-x | exactly what i was thinking.. | 22:41 |
f-x | plus it's difficult to have a StreamingFeatures without a type specified | 22:42 |
@sonney2k | I would introduce a new feature property STREAMING | 22:42 |
@sonney2k | currently we only have FP_DOT as property | 22:43 |
f-x | hmm.. right | 22:43 |
@sonney2k | f-x, well it could be used to limit your algorithm to streamingfeatures | 22:43 |
f-x | make my own property? | 22:43 |
f-x | okay.. STREAMING | 22:44 |
@sonney2k | but on the other hand you could as well require CFeatures and check that the streaming property is set | 22:44 |
@sonney2k | f-x, libshogun/features/FeatureTypes.h add it to EFeatureProperty | 22:44 |
@sonney2k | then call set_property(FP_DOT); in constructor | 22:45 |
@sonney2k | that's it | 22:45 |
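A rough sketch of the property idea follows, under stated assumptions: `FP_STREAMING` is the proposed (not yet existing) flag, the numeric values are placeholders, and `FeaturesSketch`, `StreamingSimpleSketch` and `accepts_for_online_training` are hypothetical names used only for illustration. The sketch assumes the streaming class sets the new streaming flag in its constructor.

```cpp
#include <cassert>

// Hypothetical property flags; FP_STREAMING is the proposed addition and
// the numeric values are placeholders chosen so the flags can be OR-ed.
enum EFeatureProperty
{
    FP_NONE = 0,
    FP_DOT = 1,
    FP_STREAMING = 2
};

// Hypothetical stand-in for the common features base class.
class FeaturesSketch
{
public:
    FeaturesSketch() : properties(FP_NONE) {}
    virtual ~FeaturesSketch() {}
    void set_property(EFeatureProperty p) { properties |= p; }
    bool has_property(EFeatureProperty p) const { return (properties & p) != 0; }
private:
    unsigned properties;  // bit mask of EFeatureProperty values
};

// A streaming feature class announces itself in its constructor.
class StreamingSimpleSketch : public FeaturesSketch
{
public:
    StreamingSimpleSketch() { set_property(FP_STREAMING); }
};

// An online learner can take the generic base type and still refuse
// features that do not carry the streaming property.
inline bool accepts_for_online_training(const FeaturesSketch& f)
{
    return f.has_property(FP_STREAMING);
}
```

This covers both options mentioned in the chat: an algorithm can require a streaming type directly, or accept the base type and check the flag at run time.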
f-x | cool :) | 22:45 |
f-x | so now i'm allowed to change the member functions and stuff right? | 22:45 |
f-x | (not expecting to maintain compatibility with the current SimpleFeatures<T> since the algorithm must be changed too) | 22:45 |
@sonney2k | yeah whatever streaming features fits most | 22:46 |
f-x | good, since it was a knotty problem with all that loopy inheritance | 22:46 |
@sonney2k | f-x, if we really want SimpleFeatures<T> to use StreamingFeatures one could add a function to CSimpleFeatures<T>::obtain_from_streaming(CStreamingSimpleFeatures<T>* f) | 22:46 |
@sonney2k | this would then load all the features into memory and done | 22:47 |
@sonney2k | then all algo's would work again | 22:47 |
f-x | good idea | 22:47 |
@sonney2k | actually a good way to test things | 22:47 |
f-x | so all online->batch would be possible | 22:47 |
@sonney2k | should give same results :D | 22:47 |
f-x | and a reverse batch->online wouldn't be difficult | 22:47 |
@sonney2k | yeah | 22:47 |
f-x | looks interesting :) | 22:48 |
@sonney2k | f-x, true even that | 22:48 |
@sonney2k | so the only thing I am not too sure of now is whether CStreamingDotFeatures should derive from CFeatures or not | 22:49 |
@sonney2k | I mean should CStreamingSimpleFeatures derive from CStreamingDotFeatures? | 22:50 |
@sonney2k | (that is how it is done currently for non-streaming features) | 22:50 |
f-x | It wouldn't be a problem, i think | 22:50 |
@sonney2k | blackburn, thanks - looks good | 22:51 |
f-x | i guess if we're using more features that use the dot operation, it might help to have a CStreamingDotFeatures class | 22:51 |
f-x | as long as the inheritance is linear, things look fine | 22:52 |
@sonney2k | f-x, I mean I would directly copy the interface of CDotFeatures | 22:52 |
@sonney2k | because most of the online methods are linear ones and do benefit from dotfeatures quite a bit | 22:52 |
f-x | copy into CStreamingDotFeatures? | 22:52 |
@sonney2k | yeah - if necessary adjust API | 22:53 |
@sonney2k | and then when you modify e.g. SVMSGD you don't have much to do | 22:53 |
@sonney2k | because the same functions are there | 22:53 |
f-x | SVMSGD, online version, would be a different function though, right? | 22:55 |
f-x | with most of the code same | 22:55 |
f-x | so the caller uses something like SVMSGD_Online() whatever for the online version and SVMSGD() for the batch version | 22:55 |
@sonney2k | f-x, I would modify SVMSGD to accept either CDotFeatures or CStreamingDotFeatures | 22:57 |
f-x | okay, overload? | 22:57 |
@sonney2k | I am not sure yet - templated or overloaded - definitely no if else mess | 22:58 |
@sonney2k | I guess overloaded | 22:59 |
f-x | overloaded is safer, i agree | 22:59 |
@sonney2k | and encapsulate the functions that do add_dense_vec etc | 22:59 |
f-x | in CStreamingSimpleFeatures | 23:01 |
@sonney2k | I mean if you use templates then you could plug in CDotFeatures or CStreamingDotFeatures | 23:01 |
@sonney2k | and the rest stays the same | 23:01 |
@sonney2k | if not you will have to encapsulate all the functions that use the DotFeatures | 23:02 |
f-x | i'm not sure if i'm totally clear with this.. even if i use templates, i'd still have to modify the train() method | 23:04 |
f-x | basically train() should train using one example, and repeated calls to train() would be required to train for the whole dataset | 23:05 |
f-x | or a new function "train_one_example()" could be made and train() itself makes repeated calls to train_one_example() until all examples are used up | 23:05 |
@sonney2k | f-x, true | 23:09 |
@sonney2k | even the DotFeatures require an example index | 23:10 |
@sonney2k | so the online algorithm is very different | 23:10 |
f-x | the index should be removed, or should always be set to zero.. in effect, functioning as a dummy | 23:11 |
@sonney2k | I mean you need to call the get_next_vector things only | 23:11 |
@sonney2k | (in the online variant of sgd) | 23:12 |
@sonney2k | so StreamingDotFeatures would not have any index | 23:12 |
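The train() / train_one_example() split described above could be sketched as follows. Everything here is an assumption made for illustration: `Example`, `VectorStream`, `get_next_example` and `OnlineSGDSketch` are hypothetical names, and the update rule is a plain hinge-loss subgradient step, not the actual SVMSGD code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example type: dense feature vector plus a +1/-1 label.
struct Example
{
    std::vector<double> x;
    double y;
};

// Hypothetical streaming source: no example index, only "give me the next one".
class VectorStream
{
public:
    explicit VectorStream(const std::vector<Example>& data) : data_(data), pos_(0) {}
    // Returns false once the stream is exhausted.
    bool get_next_example(Example& out)
    {
        if (pos_ >= data_.size()) return false;
        out = data_[pos_++];
        return true;
    }
private:
    std::vector<Example> data_;
    std::size_t pos_;
};

class OnlineSGDSketch
{
public:
    OnlineSGDSketch(std::size_t dim, double learning_rate)
        : w_(dim, 0.0), eta_(learning_rate) {}

    double predict(const std::vector<double>& x) const
    {
        assert(x.size() == w_.size());
        double s = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) s += w_[i] * x[i];
        return s;
    }

    // One hinge-loss subgradient step on a single example.
    void train_one_example(const Example& ex)
    {
        if (ex.y * predict(ex.x) < 1.0)
            for (std::size_t i = 0; i < w_.size(); ++i)
                w_[i] += eta_ * ex.y * ex.x[i];
    }

    // Drains the stream one example at a time; returns how many were seen.
    std::size_t train(VectorStream& stream)
    {
        std::size_t seen = 0;
        Example ex;
        while (stream.get_next_example(ex))
        {
            train_one_example(ex);
            ++seen;
        }
        return seen;
    }

    const std::vector<double>& weights() const { return w_; }
private:
    std::vector<double> w_;
    double eta_;
};
```

Note that nothing in the learner refers to an example index, which matches the point that the streaming variant only ever asks for the next vector.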
f-x | so basically this means modifying the algorithm will not be a search-and-replace exercise :/ | 23:13 |
f-x | but still it looks pretty doable | 23:14 |
f-x | i guess this should be good enough for me to implement the SGD at least :) | 23:17 |
f-x | thanks a lot, sonney2k.. things seem a lot clearer now | 23:17 |
@sonney2k | f-x, so what do you think is reasonable for midterm? | 23:17 |
f-x | what do you have in mind, sonney2k? :) | 23:18 |
@sonney2k | I mean the long term plan for gsoc would probably be all the streaming feature types and converting all the algorithms | 23:18 |
f-x | + vw | 23:18 |
@sonney2k | f-x, I am not sure what John would want you to do on the vw side | 23:18 |
@sonney2k | yeah | 23:18 |
f-x | i mean vw for shogun | 23:18 |
@sonney2k | yes | 23:19 |
f-x | what's the most important feature type according to you? | 23:19 |
f-x | SimpleFeatures? | 23:19 |
@sonney2k | I mean you have 2.5 months full time - so it is possible I think | 23:19 |
f-x | it's possible, i agree.. unless some big structural change comes up later and everything requires a whole rewrite... that's all i'm worried about | 23:20 |
@sonney2k | I would say that with ascii based files SimpleFeatures and SparseFeatures are really easy to do | 23:20 |
@sonney2k | f-x, I don't see that coming though | 23:20 |
@sonney2k | (famous last words...) | 23:20 |
f-x | :D | 23:20 |
f-x | it's a relief, nonetheless | 23:20 |
@sonney2k | So I think you should aim for getting simple and sparse streaming features to work together with dotfeatures | 23:21 |
@sonney2k | and then SGD or so on top of it | 23:21 |
@sonney2k | this month | 23:22 |
f-x | sounds pretty good.. | 23:22 |
@sonney2k | then I would rather change more of the online algorithms (say liblinear) | 23:22 |
f-x | i haven't seen much of the sparse features code, but shouldn't be that much of a problem, not that the structure is resolved | 23:22 |
f-x | *now that the | 23:22 |
@sonney2k | f-x, I mean in the end look at DataType.h | 23:23 |
@sonney2k | all you need to do is return a SGSparseVector<T> | 23:23 |
@sonney2k | it is basically index, value | 23:23 |
@sonney2k | and all the dotfeatures code can be generalized from what you find in simplefeatures / sparsefeatures | 23:24 |
@sonney2k | I would create some static functions in there that you could use in the streaming* variants | 23:24 |
@sonney2k | so it should be very doable in 1-2 weeks for some simple ascii format | 23:24 |
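As an illustration of the "index, value" layout and of shared static helpers, here is a minimal sketch. `SparseEntrySketch`, `SparseVectorSketch` and `SparseDotOps` are hypothetical names, not the actual SGSparseVector<T> definition; the helper names only echo the dense-dot and add-to-dense-vector operations mentioned in the chat.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sparse entry: one (index, value) pair.
struct SparseEntrySketch
{
    std::size_t index;
    double value;
};

typedef std::vector<SparseEntrySketch> SparseVectorSketch;

// Static helpers that batch and streaming feature classes could both call.
struct SparseDotOps
{
    // Dot product between a sparse vector and a dense weight vector.
    static double dense_dot(const SparseVectorSketch& sv,
                            const std::vector<double>& w)
    {
        double sum = 0.0;
        for (std::size_t i = 0; i < sv.size(); ++i)
        {
            assert(sv[i].index < w.size());
            sum += sv[i].value * w[sv[i].index];
        }
        return sum;
    }

    // w += alpha * sv, touching only the non-zero positions of sv.
    static void add_to_dense_vec(double alpha, const SparseVectorSketch& sv,
                                 std::vector<double>& w)
    {
        for (std::size_t i = 0; i < sv.size(); ++i)
        {
            assert(sv[i].index < w.size());
            w[sv[i].index] += alpha * sv[i].value;
        }
    }
};
```

Because the helpers take a single vector and no example index, they can be reused unchanged by a class that hands out one streamed example at a time.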
f-x | true, true | 23:25 |
@sonney2k | f-x, the problem I see with all this i/o business is that shogun already has the CFile* interface | 23:26 |
@sonney2k | and it would be best if you could utilize this somehow | 23:26 |
f-x | or incorporate any new functions i make into CFile | 23:26 |
@sonney2k | or derived files | 23:26 |
@sonney2k | yes | 23:26 |
f-x | what exactly do you want me to use in CFile? | 23:27 |
f-x | CStreamingFile derives from it | 23:27 |
f-x | and the built-in functions read the whole file | 23:27 |
@sonney2k | f-x, just the API - it is perfectly fine to have StreamingFile | 23:29 |
@sonney2k | and incrementally return the next vector when one calls get_float32_vector etc | 23:29 |
f-x | yeah.. that makes sense | 23:30 |
f-x | right now i use get_real_vector in parser.h which does the same i think | 23:30 |
@sonney2k | f-x, I guess you should rename this then to CStreamingAsciiFile and implement the API functions | 23:31 |
@sonney2k | and use these functions in your StreamingFeature classes | 23:31 |
f-x | will do. since 'StreamingFile' is actually a modification of AsciiFile. | 23:32 |
@sonney2k | I guess the buffering is also only necessary for ascii files? | 23:32 |
f-x | wouldn't buffering be beneficial for binary files too? | 23:33 |
f-x | i mean the parsing step would be simplified, but the example objects would be read and stored in the buffer in a similar fashion | 23:33 |
@sonney2k | f-x, I am losing track of all the buffers | 23:34 |
f-x | let's call this one the "ring" :) | 23:34 |
@sonney2k | I mean for ascii we have an input buffer that just buffers what is in the raw file right? | 23:34 |
f-x | call it the I/O buffer | 23:34 |
@sonney2k | and then we have a buffer for the examples that stores the parsed data right? | 23:35 |
f-x | called a 'ring' | 23:35 |
f-x | yes | 23:35 |
@sonney2k | heh | 23:35 |
@sonney2k | so the ring is always necessary agreed | 23:35 |
@sonney2k | the i/o buffer too? | 23:35 |
f-x | the I/O buffer isn't implemented yet for AsciiFile | 23:35 |
@sonney2k | I don't know | 23:35 |
f-x | no I don't think the IO buffer would be necessary | 23:35 |
@sonney2k | f-x, yeah but it is the getline thing | 23:35 |
f-x | since we can directly read one example at a time | 23:35 |
f-x | yes, the getline thing is a simple substitute | 23:36 |
f-x | it would be much faster if we read as much data as possible into the IO buffer, and then use getline on that | 23:36 |
@sonney2k | f-x, I meant when you do getline with fread (without seeking) | 23:36 |
@sonney2k | exactly | 23:36 |
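The idea of reading large chunks into an I/O buffer and running getline on that buffer, with no seeking, might look roughly like this. `BufferedLineReader` is a hypothetical helper written for illustration; it is not taken from shogun or VW, and the chunk size is an arbitrary parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: reads big chunks with fread() and hands out lines
// from memory, so the file position never has to be corrected by seeking.
class BufferedLineReader
{
public:
    BufferedLineReader(std::FILE* f, std::size_t chunk_size)
        : file_(f), buf_(chunk_size), start_(0), end_(0)
    {
        assert(f != NULL && chunk_size > 0);
    }

    // Stores the next line (without '\n') in `line`; returns false at EOF.
    bool get_line(std::string& line)
    {
        line.clear();
        bool got_data = false;
        for (;;)
        {
            if (start_ == end_)
            {
                // Buffer drained: refill it with one large read.
                end_ = std::fread(&buf_[0], 1, buf_.size(), file_);
                start_ = 0;
                if (end_ == 0) return got_data;  // EOF; last line may lack '\n'
            }
            got_data = true;
            while (start_ < end_)
            {
                char c = buf_[start_++];
                if (c == '\n') return true;
                line.push_back(c);
            }
        }
    }
private:
    std::FILE* file_;
    std::vector<char> buf_;   // the I/O buffer holding raw file bytes
    std::size_t start_, end_; // unread region of buf_ is [start_, end_)
};
```

A line that straddles two chunks is handled by the outer loop, which keeps appending to `line` across refills, so the chunk size affects speed but not the result.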
f-x | so IO buffer must be copied from VW and made compatible | 23:37 |
@sonney2k | for binary it could only be that reading examples of low dimensionality would benefit from a buffer | 23:38 |
@sonney2k | otherwise fread is called very often... | 23:38 |
f-x | exactly | 23:38 |
f-x | i need to have a deeper look though, to be able to make a final statement on this | 23:38 |
@sonney2k | not sure if that is really a problem though | 23:39 |
f-x | but most likely the buffer could be needed -- but only if it is speedy enough | 23:39 |
f-x | no, not a major problem | 23:39 |
@sonney2k | I mean I don't have such small examples in practice | 23:39 |
f-x | oh.. ok. good | 23:39 |
f-x | anyway it wouldn't hurt to assume for now that the buffer may be needed for binary too.. | 23:40 |
@sonney2k | so plan would be 1. put the fread based getline in asciifile 2. do the StreamingAsciiFile and use this getline 3. convert StreamingFeatures to be templated CStreamingSimpleFeatures and derived from CFeatures and make it use the StreamingFile API | 23:41 |
@sonney2k | 4. get OnlineSGD to work | 23:42 |
@sonney2k | f-x, does that make sense? | 23:42 |
f-x | sonney2k: it does, it does | 23:42 |
@sonney2k | and for the buffering you borrow code from vw | 23:43 |
f-x | exactly what i was about to ask | 23:43 |
@sonney2k | it would be great to have a CVowpalWabbit soon in shogun though | 23:43 |
@sonney2k | I guess | 23:43 |
f-x | yeah.. that name has a nice ring to it :) | 23:43 |
@sonney2k | I don't know if it is realistic for midterm | 23:43 |
@sonney2k | I mean you would totally have to port the algorithm to use shogun's dotfeatures | 23:44 |
f-x | CVowpalWabbit? i guess a lot of discussion would be required to port that properly | 23:44 |
@sonney2k | so that is sth you can only do with john giving you lots of hints | 23:44 |
f-x | definitely | 23:44 |
blackburn | hey what VW does? :D | 23:45 |
f-x | there are tons of optimizations in vw that may be difficult to fit into our structure | 23:45 |
@sonney2k | f-x, in the end VW implements SGD with john specific tricks I guess | 23:45 |
blackburn | classifying? | 23:45 |
f-x | prediction | 23:45 |
@sonney2k | blackburn, online SVm - like SGD | 23:45 |
blackburn | ah I see | 23:45 |
blackburn | I just dimreduction redneck so don't know much about these svms :D | 23:46 |
f-x | blackburn: redneck? by now you're an old veteran! | 23:47 |
blackburn | hehe | 23:47 |
@bettyboo | :) | 23:47 |
blackburn | I mean I'm dummy in SVMs :) | 23:47 |
@sonney2k | f-x, I would suggest you try to do 1-3 next week | 23:48 |
@sonney2k | please please small patches whenever you have sth finished | 23:48 |
@sonney2k | otherwise I won't be able to review these easily | 23:48 |
f-x | how big is 'something' supposed to be? | 23:48 |
@sonney2k | (bug chunks are hard to digest...) | 23:48 |
f-x | "bug chunks" :P | 23:48 |
@sonney2k | big :D | 23:49 |
@sonney2k | f-x, I mean if you do fread based getline in asciifile -> pull request | 23:49 |
@sonney2k | if you write StreamingAsciiFile -> pull request | 23:49 |
@sonney2k | and so on | 23:50 |
@sonney2k | small logical units | 23:50 |
f-x | okay.. got it | 23:50 |
blackburn | I've invented a game: we pull code with bugs and Soeren is guessing where it is | 23:50 |
@sonney2k | blackburn, in the end I can only lose :`-( | 23:50 |
f-x | blackburn: yeah, it's 5 vs. 1 | 23:51 |
@sonney2k | f-x, so let's discuss again end of next week - and I hope to see plenty of pull requests | 23:51 |
blackburn | sonney2k: but you are 80 lvl coder :D you have chances | 23:51 |
@sonney2k | if you have questions in the meantime ask per mail or chat | 23:51 |
f-x | sonney2k: sure.. a pull request soon. | 23:52 |
f-x | i guess i'll go to bed now.. thanks for the insight, sonney2k | 23:52 |
@sonney2k | f-x, thanks for this nice discussion and I hope you will have some fun doing these things | 23:52 |
* sonney2k wonders why f-x always writes similar sentences :D | 23:53 | |
f-x | sonney2k: i'm sure i will :) now things seem much simpler | 23:53 |
f-x | :D | 23:53 |
f-x | bbye, sonney2k and blackburn | 23:53 |
blackburn | f-x: are you robot? what is an integral of cot(x)? :D | 23:54 |
blackburn | bye :) | 23:54 |
f-x | blackburn: don't give me bad dreams.. i'm trying to go to bed now :) | 23:54 |
@bettyboo | ;) | 23:54 |
@sonney2k | :) | 23:54 |
@bettyboo | ^_^ | 23:54 |
-!- f-x [~user@117.192.200.14] has quit [Quit: ERC Version 5.3 (IRC client for Emacs)] | 23:55 | |
blackburn | sonney2k: how are your family? | 23:56 |
* sonney2k feels exhausted | 23:56 | |
@sonney2k | everyone asleep | 23:56 |
@sonney2k | healthy and growing ... | 23:56 |
blackburn | nice :) | 23:56 |
@sonney2k | went swimming / to the zoo | 23:56 |
@sonney2k | was a lot of fun I can tell | 23:56 |
blackburn | I have seen you live near lake? | 23:57 |
@sonney2k | my daughter was pretty excited about a pelican trying to chase her... | 23:57 |
blackburn | hehe | 23:57 |
@sonney2k | (these are running around freely in the zoo 'tierpark') | 23:58 |
blackburn | you are happy: have nice places here :) | 23:58 |
blackburn | http://maps.yandex.ru/-/CBQOqN98 - place where I live :D | 23:59 |
--- Log closed Sun Jun 12 00:00:16 2011 |