--- Log opened Sat Dec 06 00:00:39 2014 | ||
-!- rujmeister [3200beca@gateway/web/freenode/ip.50.0.190.202] has joined #shogun | 00:23 | |
rujmeister | hello. what path should I add to environment variables so that Windows can find the cmake file? I'm trying to install shogun with the python interface. Here are the instructions from the install file: $ mkdir build && cd build $ cmake -DPythonModular=ON .. | 00:26 |
---|---|---|
-!- Floatingman [~Floatingm@c-68-52-34-232.hsd1.tn.comcast.net] has quit [Read error: Connection reset by peer] | 01:37 | |
-!- Floatingman [~Floatingm@c-68-52-34-232.hsd1.tn.comcast.net] has joined #shogun | 01:37 | |
-!- sanuj [~sanuj@117.196.246.167] has joined #shogun | 03:45 | |
-!- sanuj [~sanuj@117.196.246.167] has quit [Client Quit] | 03:46 | |
-!- rajul [~rajul@59.89.129.177] has joined #shogun | 06:51 | |
-!- vin-ivar [~vinit@122.170.93.140] has joined #shogun | 07:34 | |
-!- vin-ivar [~vinit@122.170.93.140] has left #shogun [] | 07:35 | |
-!- rujmeister [3200beca@gateway/web/freenode/ip.50.0.190.202] has quit [Ping timeout: 246 seconds] | 07:45 | |
-!- sanuj [~sanuj@117.220.60.238] has joined #shogun | 07:48 | |
-!- sanuj [~sanuj@117.220.60.238] has quit [Ping timeout: 272 seconds] | 07:55 | |
-!- sanuj [~sanuj@117.220.56.119] has joined #shogun | 08:08 | |
-!- rajul [~rajul@59.89.129.177] has quit [Ping timeout: 244 seconds] | 09:24 | |
-!- rajul [~rajul@59.89.128.245] has joined #shogun | 09:33 | |
-!- sanuj [~sanuj@117.220.56.119] has quit [Ping timeout: 244 seconds] | 09:47 | |
-!- sanuj [~sanuj@117.220.56.119] has joined #shogun | 10:24 | |
-!- sanuj [~sanuj@117.220.56.119] has quit [Ping timeout: 256 seconds] | 10:59 | |
-!- rajul [~rajul@59.89.128.245] has quit [Ping timeout: 264 seconds] | 11:31 | |
-!- besser82 [~besser82@2a02:8108:8840:1e00:f2de:f1ff:fe89:42d4] has joined #shogun | 11:39 | |
-!- besser82 [~besser82@2a02:8108:8840:1e00:f2de:f1ff:fe89:42d4] has quit [Changing host] | 11:39 | |
-!- besser82 [~besser82@fedora/besser82] has joined #shogun | 11:39 | |
-!- mode/#shogun [+o besser82] by ChanServ | 11:39 | |
-!- rajul [~rajul@117.199.159.14] has joined #shogun | 11:44 | |
-!- besser82 [~besser82@fedora/besser82] has quit [Ping timeout: 260 seconds] | 12:00 | |
-!- sanuj [~sanuj@117.220.56.119] has joined #shogun | 13:21 | |
-!- sanuj [~sanuj@117.220.56.119] has quit [Ping timeout: 256 seconds] | 14:21 | |
-!- sanuj [~sanuj@117.196.238.254] has joined #shogun | 14:34 | |
-!- sanuj [~sanuj@117.196.238.254] has quit [Ping timeout: 255 seconds] | 15:41 | |
-!- sanuj [~sanuj@117.196.248.88] has joined #shogun | 15:54 | |
-!- sanuj [~sanuj@117.196.248.88] has quit [Ping timeout: 245 seconds] | 16:57 | |
-!- sanuj [~sanuj@117.196.244.238] has joined #shogun | 17:09 | |
-!- pickle27 [~pickle27@192-0-136-118.cpe.teksavvy.com] has joined #shogun | 17:21 | |
-!- sanuj [~sanuj@117.196.244.238] has quit [Ping timeout: 265 seconds] | 17:27 | |
-!- sanuj [~sanuj@117.196.237.221] has joined #shogun | 17:39 | |
-!- vin-ivar [~vinit@122.170.93.140] has joined #shogun | 17:45 | |
-!- vin-ivar [~vinit@122.170.93.140] has quit [Quit: WeeChat 1.0.1] | 17:59 | |
-!- vin-ivar [~vinit@122.170.93.140] has joined #shogun | 18:00 | |
-!- vin-ivar [~vinit@122.170.93.140] has quit [Client Quit] | 18:04 | |
-!- vin-ivar [~vinit@122.170.93.140] has joined #shogun | 18:04 | |
-!- sanuj [~sanuj@117.196.237.221] has quit [Quit: Leaving] | 18:21 | |
-!- vin-ivar [~vinit@122.170.93.140] has left #shogun ["WeeChat 1.0.1"] | 18:57 | |
-!- rajul [~rajul@117.199.159.14] has quit [Ping timeout: 264 seconds] | 19:05 | |
-!- kcm_ [~kcm@122.177.143.226] has quit [Ping timeout: 258 seconds] | 19:13 | |
-!- Floatingman [~Floatingm@c-68-52-34-232.hsd1.tn.comcast.net] has quit [Remote host closed the connection] | 20:06 | |
-!- vin-ivar [~vinit@122.170.93.140] has joined #shogun | 20:09 | |
-!- Floatingman [~Floatingm@c-68-52-34-232.hsd1.tn.comcast.net] has joined #shogun | 20:10 | |
-!- vin-ivar [~vinit@122.170.93.140] has quit [Ping timeout: 265 seconds] | 20:33 | |
-!- vin-ivar [~vinit@122.170.93.140] has joined #shogun | 20:43 | |
-!- vin-ivar [~vinit@122.170.93.140] has quit [Quit: WeeChat 1.0.1] | 21:02 | |
-!- vin-ivar [~vinit@122.170.93.140] has joined #shogun | 21:05 | |
-!- vin-ivar [~vinit@122.170.93.140] has left #shogun ["WeeChat 1.0.1"] | 21:11 | |
-!- Larmona [6bbc181a@gateway/web/freenode/ip.107.188.24.26] has joined #shogun | 21:33 | |
Larmona | Hello, is anyone here? I'm sorry to bother but I had a question on the CommStringKernel, in particular the significance of the gap and start parameters... there seems to be no documentation on the website for what these do, and I'm having a hard time inferring the significance from directly looking at the C++ code | 21:35 |
@lisitsyn | Larmona: hey | 22:03 |
@lisitsyn | maybe I can try to help you | 22:03 |
Larmona | yay! | 22:06 |
Larmona | do you follow my question or should I explain more? | 22:06 |
@lisitsyn | Larmona: I'd have to look up the code | 22:07 |
Larmona | http://www.shogun-toolbox.org/doc/en/3.0.0/Alphabet_8cpp_source.html | 22:07 |
Larmona | lisitsyn: it's at line 836 | 22:07 |
@lisitsyn | Larmona: wow that's really legacy code :D | 22:09 |
Larmona | oh is it old? I'm using shogun 3.2, but the documentation on the website pointed me to that. is there maybe another source with more up-to-date? | 22:10 |
@lisitsyn | Larmona: no actually no | 22:10 |
@lisitsyn | I mean it is written like years years before ;) | 22:10 |
@lisitsyn | okay so actually from what I can see | 22:10 |
@lisitsyn | start is here just to shift that obs array | 22:10 |
@lisitsyn | but that's pretty straightforward | 22:11 |
@lisitsyn | as for gap I am looking at it | 22:11 |
Larmona | hmmm as in to start from the ith index of the sequence where start = i? that's odd, because in all the example code for CommString it initializes start to be order - 1, but why would that be the default? it seems odd to me to ignore the first (order-1) members of the sequence | 22:13 |
@lisitsyn | I have to be honest I have no idea what this code is about :D let me check the example | 22:14 |
@lisitsyn | I guess you're working with some bio sequences | 22:14 |
Larmona | I'm working with text documents and trying to classify them using SSK | 22:15 |
@lisitsyn | ah | 22:15 |
@lisitsyn | good to know someone is not using neural nets still :D | 22:16 |
Larmona | haha | 22:16 |
@lisitsyn | Larmona: could you please guide me to the example you are referring to? | 22:17 |
@lisitsyn | is it in cpp examples? can't find it yet | 22:17 |
Larmona | k give me a sec i'll get it | 22:17 |
Larmona | so I'm currently using R modular and this is the example i've been referencing | 22:19 |
Larmona | http://www.shogun-toolbox.org/doc/en/3.0.0/r_modular_examples.html | 22:19 |
Larmona | ../examples/documented/r_modular/kernel_comm_ulong_string_modular.R | 22:19 |
Larmona | ^^^ thats the title | 22:19 |
@lisitsyn | ok cool | 22:19 |
Larmona | the python and other examples for this kernel look identical | 22:19 |
@lisitsyn | obtain_from_char(...) | 22:20 |
@lisitsyn | Larmona: right? | 22:20 |
Larmona | right | 22:20 |
Larmona | which itself calls translate_from_single_order | 22:21 |
@lisitsyn | Larmona: okay just a simple guess I have yet is that such start=order-1 thing makes it consistent with the order | 22:21 |
@lisitsyn | I mean you'd have to get 'order' elements before | 22:22 |
@lisitsyn | but for first elements it is not possible | 22:22 |
@lisitsyn | just a guess yet | 22:22 |
Larmona | lisitsyn: I'm not quite sure I follow. why would you not be able to search for 'order'-grams that begin in the first 'order' elements ? | 22:23 |
@lisitsyn | Larmona: I mean it would be like indices of -1, -2 etc | 22:25 |
Larmona | ah ok I see | 22:25 |
@lisitsyn | Larmona: but that's just a guess yet | 22:25 |
Larmona | so moving start to, say, order + 1, would be if you wanted to ignore first two elements of sequence? | 22:26 |
@lisitsyn | Larmona: yeah probably | 22:26 |
Larmona | gotcha | 22:26 |
@lisitsyn | Larmona: I am not sure if this is true though :) | 22:27 |
Larmona | lisitsyn: won't hold you to it | 22:27 |
@lisitsyn | Larmona: as I can see start is always start+gap | 22:29 |
@lisitsyn | so start is just to enable you to use your string from some position | 22:29 |
@lisitsyn | but it always adds the gap | 22:29 |
@lisitsyn | and then it shifts it back | 22:29 |
@lisitsyn | oh that's really complex should be documented | 22:30 |
Larmona | hmmm... not quite sure I follow in that what would be the purpose of modifying gap=0? in the context of feature mapping? | 22:30 |
@lisitsyn | 840 const int32_t start_gap=(p_order-gap)/2; | 22:32 |
@lisitsyn | 841 const int32_t end_gap=start_gap+gap; | 22:32 |
@lisitsyn | Larmona: ^ according to that thing it skips the first gap elements and doesn't handle the last gap elements | 22:32 |
Larmona | so sort of like start, but trimming sequence at both ends? | 22:33 |
@lisitsyn | Larmona: looks like | 22:33 |
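For readers following along, here is a minimal Python sketch of the arithmetic in the two quoted lines (840 and 841) only. The function name `gap_bounds` and the range check are assumptions made for illustration, not part of Shogun. How the rest of the C++ function uses these two values is not shown in the log, so the sketch makes no claim about it.

```python
from typing import Tuple


def gap_bounds(p_order: int, gap: int) -> Tuple[int, int]:
    """Reproduce the two quoted lines:
        start_gap = (p_order - gap) / 2   (C++ integer division)
        end_gap   = start_gap + gap
    Hypothetical helper for illustration; not a Shogun function.
    """
    # Assumption: restrict to 0 <= gap <= p_order so that Python's floor
    # division matches C++ truncating division on a non-negative value.
    if gap < 0 or gap > p_order:
        raise ValueError("expected 0 <= gap <= p_order")
    start_gap = (p_order - gap) // 2
    end_gap = start_gap + gap
    return start_gap, end_gap


if __name__ == "__main__":
    for order, gap in [(3, 0), (3, 1), (4, 2)]:
        print(order, gap, gap_bounds(order, gap))
```

With gap=0 the two values coincide, so whatever range they delimit is empty in the default case.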
Larmona | gotcha | 22:33 |
Larmona | well that is very helpful | 22:33 |
Larmona | initially i thought/hoped it was maybe a parameter to handle gaps in sequences (i.e. non-contiguous sequences) | 22:33 |
@lisitsyn | Larmona: it looks like it handles all elements without gaps | 22:34 |
Larmona | yeah | 22:34 |
Larmona | there's no kernel that looks for non-contiguous sequences in shogun is there? | 22:34 |
@lisitsyn | Larmona: can't remember any - you mean you have some positions you want to skip? | 22:35 |
Larmona | no as in there are published kernels that consider matching non-contiguous subsequences, which then penalizes them according to their length | 22:36 |
Larmona | so for example if the subsequence I'm looking for is CAT | 22:36 |
Larmona | and the string in question is CART | 22:36 |
Larmona | it counts as a match by skipping R but then penalizes the similarity measure by some parameter | 22:37 |
Larmona | The idea is it's basically a more robust SSK kernel because if there are either misspellings or something like that it could still catch the similarity | 22:37 |
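The CAT/CART example can be sketched with a brute-force version of the gap-weighted subsequence kernel as it is commonly defined (Lodhi et al.): every length-n subsequence, contiguous or not, is a feature, weighted by a decay factor raised to the span it covers in the string. The names `subsequence_features`, `gappy_kernel`, and `lam` are illustrative assumptions. This is plain Python, not Shogun's API, and it enumerates all index combinations, so it is only practical for short strings.

```python
from collections import defaultdict
from itertools import combinations


def subsequence_features(s: str, n: int, lam: float) -> dict:
    """Map each length-n subsequence u of s to the sum of lam**span over
    all index tuples that spell u, where span = last index - first index + 1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    feats = defaultdict(float)
    for idx in combinations(range(len(s)), n):
        u = "".join(s[i] for i in idx)
        span = idx[-1] - idx[0] + 1  # a gap inside the match lengthens the span
        feats[u] += lam ** span
    return feats


def gappy_kernel(s: str, t: str, n: int, lam: float) -> float:
    """Unnormalized kernel: inner product of the two feature maps."""
    fs = subsequence_features(s, n, lam)
    ft = subsequence_features(t, n, lam)
    return sum(v * ft[u] for u, v in fs.items() if u in ft)


if __name__ == "__main__":
    lam = 0.5
    # "CAT" occurs in "CART" only by skipping R (span 4 instead of 3),
    # so the match contributes lam**3 * lam**4 rather than lam**3 * lam**3.
    print(gappy_kernel("CAT", "CART", 3, lam))
    print(gappy_kernel("CAT", "CAT", 3, lam))
```

The decay factor `lam` is the penalty parameter mentioned above: the skipped R makes the match in CART span four characters instead of three, which costs one extra factor of `lam`.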
@lisitsyn | ahh | 22:38 |
@lisitsyn | let me check | 22:39 |
Larmona | there's actually a package in R that implements it called kernlab... but it performs terribly and crashed my R session every time | 22:40 |
@lisitsyn | Larmona: I am pretty sure some kernel should do this job, checking | 22:42 |
Larmona | lisitsyn:thanks | 22:42 |
@lisitsyn | Larmona: oh I think I know a list with proper description | 22:43 |
@lisitsyn | Larmona: it is sonney2k's thesis :D | 22:44 |
@lisitsyn | Larmona: http://sonnenburgs.de/soeren/publications/Son08.pdf page 15 | 22:44 |
Larmona | yes I think you are right I have read quite a few of his papers | 22:45 |
Larmona | actually the kernel im referencing is from '03 but the mismatch one does get at the spirit of it | 22:47 |
@lisitsyn | Larmona: these kernels are like specialized for these AGCT guys | 22:47 |
Larmona | ah I see | 22:48 |
@lisitsyn | I have no idea whether there is some simpler approach for regular text | 22:48 |
@lisitsyn | Larmona: but I remember some recommendation I was given about WD kernel | 22:49 |
@lisitsyn | so maybe you could give it a try | 22:49 |
Larmona | yeah WD seemed to touch on the idea because it allows for shifts I think | 22:50 |
@lisitsyn | though it is going to catch char by char similarity | 22:50 |
@lisitsyn | I mean it won't be like bag of words because if your shift is big you'd get not really informative kernel | 22:50 |
@lisitsyn | in other words sounds like it is very 'positional' | 22:51 |
Larmona | but nonetheless this was very helpful! you wouldn't happen to be familiar with any other powerful string kernel libraries out there would you? from my research shogun seemed to be the best by a margin | 22:51 |
Larmona | yeah positional would not be helpful for my classification problem | 22:51 |
@lisitsyn | Larmona: I guess spectrum kernel would be better for regular text | 22:53 |
Larmona | here is the kernel I was referencing fyi | 22:53 |
Larmona | http://www.infoautoclassification.org/public/articles/Lodhi-et.-al._Text-Classification-using-String-Kernels.pdf | 22:53 |
@lisitsyn | as for your question about other libs - I know basically nothing :) | 22:54 |
Larmona | haha alrighty | 22:54 |
@lisitsyn | Larmona: I'd suggest you to give spectrum kernel a try | 22:55 |
@lisitsyn | it just counts number of matching n-grams | 22:55 |
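As a rough illustration of "counts number of matching n-grams", here is a minimal spectrum kernel in plain Python. The function names are assumptions for the sketch, and it is not Shogun's implementation, which works on its own string feature types and may apply normalization.

```python
from collections import Counter


def ngram_counts(s: str, n: int) -> Counter:
    """Count every contiguous substring of length n in s."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def spectrum_kernel(s: str, t: str, n: int) -> int:
    """Unnormalized spectrum kernel: for each n-gram shared by s and t,
    multiply its occurrence counts and sum the products."""
    cs, ct = ngram_counts(s, n), ngram_counts(t, n)
    return sum(count * ct[gram] for gram, count in cs.items() if gram in ct)


if __name__ == "__main__":
    print(spectrum_kernel("CAT", "CART", 2))  # only "CA" is shared
    print(spectrum_kernel("CAT", "CART", 3))  # no contiguous 3-gram is shared
```

Because only contiguous n-grams are counted, CAT and CART share nothing at n=3, which is the limitation the gappy kernel discussed earlier is meant to address.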
Larmona | yeah thats what I've been using | 22:56 |
Larmona | well thank you very much for the help lisitsyn! I really appreciate it. I'm off to grind on this project now. Have a good day! | 22:56 |
@lisitsyn | I see | 22:56 |
@lisitsyn | you are welcome :) | 22:56 |
Larmona | ciao | 22:57 |
-!- Larmona [6bbc181a@gateway/web/freenode/ip.107.188.24.26] has quit [Quit: Page closed] | 22:57 | |
-!- rujmeister [3200beca@gateway/web/freenode/ip.50.0.190.202] has joined #shogun | 23:53 | |
--- Log closed Sun Dec 07 00:00:40 2014 |