IRC logs of #shogun for Saturday, 2014-12-06

--- Log opened Sat Dec 06 00:00:39 2014
00:23 -!- rujmeister [3200beca@gateway/web/freenode/ip.50.0.190.202] has joined #shogun
00:26 <rujmeister> hello. what path should I add to environment variables so that Windows can find the cmake file? I'm trying to install shogun with the python interface. Here are the instructions from the install file: $ mkdir build && cd build   $ cmake -DPythonModular=ON ..
01:37 -!- Floatingman [~Floatingm@c-68-52-34-232.hsd1.tn.comcast.net] has quit [Read error: Connection reset by peer]
01:37 -!- Floatingman [~Floatingm@c-68-52-34-232.hsd1.tn.comcast.net] has joined #shogun
03:45 -!- sanuj [~sanuj@117.196.246.167] has joined #shogun
03:46 -!- sanuj [~sanuj@117.196.246.167] has quit [Client Quit]
06:51 -!- rajul [~rajul@59.89.129.177] has joined #shogun
07:34 -!- vin-ivar [~vinit@122.170.93.140] has joined #shogun
07:35 -!- vin-ivar [~vinit@122.170.93.140] has left #shogun []
07:45 -!- rujmeister [3200beca@gateway/web/freenode/ip.50.0.190.202] has quit [Ping timeout: 246 seconds]
07:48 -!- sanuj [~sanuj@117.220.60.238] has joined #shogun
07:55 -!- sanuj [~sanuj@117.220.60.238] has quit [Ping timeout: 272 seconds]
08:08 -!- sanuj [~sanuj@117.220.56.119] has joined #shogun
09:24 -!- rajul [~rajul@59.89.129.177] has quit [Ping timeout: 244 seconds]
09:33 -!- rajul [~rajul@59.89.128.245] has joined #shogun
09:47 -!- sanuj [~sanuj@117.220.56.119] has quit [Ping timeout: 244 seconds]
10:24 -!- sanuj [~sanuj@117.220.56.119] has joined #shogun
10:59 -!- sanuj [~sanuj@117.220.56.119] has quit [Ping timeout: 256 seconds]
11:31 -!- rajul [~rajul@59.89.128.245] has quit [Ping timeout: 264 seconds]
11:39 -!- besser82 [~besser82@2a02:8108:8840:1e00:f2de:f1ff:fe89:42d4] has joined #shogun
11:39 -!- besser82 [~besser82@2a02:8108:8840:1e00:f2de:f1ff:fe89:42d4] has quit [Changing host]
11:39 -!- besser82 [~besser82@fedora/besser82] has joined #shogun
11:39 -!- mode/#shogun [+o besser82] by ChanServ
11:44 -!- rajul [~rajul@117.199.159.14] has joined #shogun
12:00 -!- besser82 [~besser82@fedora/besser82] has quit [Ping timeout: 260 seconds]
13:21 -!- sanuj [~sanuj@117.220.56.119] has joined #shogun
14:21 -!- sanuj [~sanuj@117.220.56.119] has quit [Ping timeout: 256 seconds]
14:34 -!- sanuj [~sanuj@117.196.238.254] has joined #shogun
15:41 -!- sanuj [~sanuj@117.196.238.254] has quit [Ping timeout: 255 seconds]
15:54 -!- sanuj [~sanuj@117.196.248.88] has joined #shogun
16:57 -!- sanuj [~sanuj@117.196.248.88] has quit [Ping timeout: 245 seconds]
17:09 -!- sanuj [~sanuj@117.196.244.238] has joined #shogun
17:21 -!- pickle27 [~pickle27@192-0-136-118.cpe.teksavvy.com] has joined #shogun
17:27 -!- sanuj [~sanuj@117.196.244.238] has quit [Ping timeout: 265 seconds]
17:39 -!- sanuj [~sanuj@117.196.237.221] has joined #shogun
17:45 -!- vin-ivar [~vinit@122.170.93.140] has joined #shogun
17:59 -!- vin-ivar [~vinit@122.170.93.140] has quit [Quit: WeeChat 1.0.1]
18:00 -!- vin-ivar [~vinit@122.170.93.140] has joined #shogun
18:04 -!- vin-ivar [~vinit@122.170.93.140] has quit [Client Quit]
18:04 -!- vin-ivar [~vinit@122.170.93.140] has joined #shogun
18:21 -!- sanuj [~sanuj@117.196.237.221] has quit [Quit: Leaving]
18:57 -!- vin-ivar [~vinit@122.170.93.140] has left #shogun ["WeeChat 1.0.1"]
19:05 -!- rajul [~rajul@117.199.159.14] has quit [Ping timeout: 264 seconds]
19:13 -!- kcm_ [~kcm@122.177.143.226] has quit [Ping timeout: 258 seconds]
20:06 -!- Floatingman [~Floatingm@c-68-52-34-232.hsd1.tn.comcast.net] has quit [Remote host closed the connection]
20:09 -!- vin-ivar [~vinit@122.170.93.140] has joined #shogun
20:10 -!- Floatingman [~Floatingm@c-68-52-34-232.hsd1.tn.comcast.net] has joined #shogun
20:33 -!- vin-ivar [~vinit@122.170.93.140] has quit [Ping timeout: 265 seconds]
20:43 -!- vin-ivar [~vinit@122.170.93.140] has joined #shogun
21:02 -!- vin-ivar [~vinit@122.170.93.140] has quit [Quit: WeeChat 1.0.1]
21:05 -!- vin-ivar [~vinit@122.170.93.140] has joined #shogun
21:11 -!- vin-ivar [~vinit@122.170.93.140] has left #shogun ["WeeChat 1.0.1"]
21:33 -!- Larmona [6bbc181a@gateway/web/freenode/ip.107.188.24.26] has joined #shogun
21:35 <Larmona> Hello, is anyone here? I'm sorry to bother, but I had a question on the CommStringKernel, in particular the significance of the gap and start parameters... there seems to be no documentation on the website for what these do, and I'm having a hard time inferring the significance from looking directly at the C++ code
22:03 <@lisitsyn> Larmona: hey
22:03 <@lisitsyn> maybe I can try to help you
22:06 <Larmona> yay!
22:06 <Larmona> do you follow my question or should I explain more?
22:07 <@lisitsyn> Larmona: I'd have to look up the code
22:07 <Larmona> http://www.shogun-toolbox.org/doc/en/3.0.0/Alphabet_8cpp_source.html
22:07 <Larmona> lisitsyn: it's at line 836
22:09 <@lisitsyn> Larmona: wow that's really legacy code :D
22:10 <Larmona> oh is it old? I'm using shogun 3.2, but the documentation on the website pointed me to that. is there maybe another, more up-to-date source?
22:10 <@lisitsyn> Larmona: no actually no
22:10 <@lisitsyn> I mean it was written years and years ago ;)
22:10 <@lisitsyn> okay so actually from what I can see
22:10 <@lisitsyn> start is here just to shift that obs array
22:11 <@lisitsyn> but that's pretty straightforward
22:11 <@lisitsyn> as for gap I am looking at it
22:13 <Larmona> hmmm as in to start from the ith index of the sequence where start = i? that's odd, because all the example code for CommString initializes start to order - 1, but why would that be the default? it seems odd to me to ignore the first (order-1) members of the sequence
22:14 <@lisitsyn> I have to be honest I have no idea what this code is about :D let me check the example
22:14 <@lisitsyn> I guess you're working with some bio sequences
22:15 <Larmona> I'm working with text documents and trying to classify them using SSK
22:15 <@lisitsyn> ah
22:16 <@lisitsyn> good to know someone is still not using neural nets :D
22:16 <Larmona> haha
22:17 <@lisitsyn> Larmona: could you please point me to the example you are referring to?
22:17 <@lisitsyn> is it in cpp examples? can't find it yet
22:17 <Larmona> k give me a sec i'll get it
22:19 <Larmona> so I'm currently using R modular and this is the example I've been referencing
22:19 <Larmona> http://www.shogun-toolbox.org/doc/en/3.0.0/r_modular_examples.html
22:19 <Larmona> ../examples/documented/r_modular/kernel_comm_ulong_string_modular.R
22:19 <Larmona> ^^^ that's the title
22:19 <@lisitsyn> ok cool
22:19 <Larmona> the python and other examples for this kernel look identical
22:20 <@lisitsyn> obtain_from_char(...)
22:20 <@lisitsyn> Larmona: right?
22:20 <Larmona> right
22:21 <Larmona> which itself calls translate_from_single_order
22:21 <@lisitsyn> Larmona: okay, just a simple guess I have so far is that the start=order-1 thing makes it consistent with the order
22:22 <@lisitsyn> I mean you'd have to get 'order' elements before
22:22 <@lisitsyn> but for first elements it is not possible
22:22 <@lisitsyn> just a guess for now
22:23 <Larmona> lisitsyn: I'm not quite sure I follow. why would you not be able to search for 'order'-grams that begin in the first 'order' elements?
22:25 <@lisitsyn> Larmona: I mean it would be like indices of -1, -2 etc
22:25 <Larmona> ah ok I see
22:25 <@lisitsyn> Larmona: but that's just a guess for now
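A rough Python sketch of lisitsyn's guess, not Shogun's actual code: if each position stores the order-gram that ends there, the first position with a complete window is order-1, because any earlier position would need negative indices. The helper name `ngrams_ending_at` is hypothetical.

```python
def ngrams_ending_at(seq, order):
    """Map each end index i to the order-gram seq[i-order+1 : i+1].

    Hypothetical helper illustrating the guess from the chat; it does not
    reproduce Shogun's translate_from_single_order.
    """
    first_complete = order - 1  # earlier positions would need indices < 0
    return {i: seq[i - order + 1:i + 1] for i in range(first_complete, len(seq))}


print(ngrams_ending_at("ACGTA", 3))
```

Under this reading no characters are ignored: the first order-1 characters still appear inside the first stored window.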
22:26 <Larmona> so moving start to, say, order + 1, would be if you wanted to ignore the first two elements of the sequence?
22:26 <@lisitsyn> Larmona: yeah probably
22:26 <Larmona> gotcha
22:27 <@lisitsyn> Larmona: I am not sure if this is true though :)
22:27 <Larmona> lisitsyn: won't hold you to it
22:29 <@lisitsyn> Larmona: as I can see start is always start+gap
22:29 <@lisitsyn> so start is just to enable you to use your string from some position
22:29 <@lisitsyn> but it always adds the gap
22:29 <@lisitsyn> and then it shifts it back
22:30 <@lisitsyn> oh that's really complex, should be documented
22:30 <Larmona> hmmm... not quite sure I follow: what would be the purpose of modifying gap=0, in the context of feature mapping?
22:32 <@lisitsyn>   840     const int32_t start_gap=(p_order-gap)/2;
22:32 <@lisitsyn>   841     const int32_t end_gap=start_gap+gap;
22:32 <@lisitsyn> Larmona: ^ according to that thing it skips the first gap elements and doesn't handle the last gap elements
22:33 <Larmona> so sort of like start, but trimming sequence at both ends?
22:33 <@lisitsyn> Larmona: looks like
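To make the arithmetic in the two quoted C++ lines concrete, here is a small Python sketch that evaluates them for a few values. The function name `gap_bounds` and the check that gap lies between 0 and p_order are assumptions for illustration; how Shogun then uses the two bounds is not shown in this log.

```python
def gap_bounds(p_order, gap):
    """Mirror the two quoted lines: start_gap=(p_order-gap)/2, end_gap=start_gap+gap."""
    if not 0 <= gap <= p_order:
        # Assumed restriction, chosen so that integer division behaves as in C++.
        raise ValueError("expected 0 <= gap <= p_order")
    start_gap = (p_order - gap) // 2  # matches C++ int division for non-negative values
    end_gap = start_gap + gap
    return start_gap, end_gap


for p_order, gap in [(3, 0), (4, 2), (5, 2)]:
    print(p_order, gap, gap_bounds(p_order, gap))
```

With gap=0 both bounds coincide, so the interval between start_gap and end_gap is empty.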
22:33 <Larmona> gotcha
22:33 <Larmona> well that is very helpful
22:33 <Larmona> initially I thought/hoped it was maybe a parameter to handle gaps in sequences (i.e. non-contiguous sequences)
22:34 <@lisitsyn> Larmona: it looks like it handles all elements without gaps
22:34 <Larmona> yeah
22:34 <Larmona> there's no kernel that looks for non-contiguous sequences in shogun, is there?
22:35 <@lisitsyn> Larmona: can't remember any - you mean you have some positions you want to skip?
22:36 <Larmona> no, as in there are published kernels that consider matching non-contiguous subsequences, which then penalize them according to their length
22:36 <Larmona> so for example if the subsequence I'm looking for is CAT
22:36 <Larmona> and the string in question is CART
22:37 <Larmona> it counts as a match by skipping R but then penalizes the similarity measure by some parameter
22:37 <Larmona> The idea is it's basically a more robust SSK kernel because if there are misspellings or something like that it could still catch the similarity
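The CAT/CART example can be sketched in Python as a brute-force gap-weighted subsequence kernel. This is a minimal illustration in the spirit of what Larmona describes, not the Shogun or kernlab implementation; the function name and the decay parameter `lam` are assumptions, and the enumeration is only practical for very short strings.

```python
from itertools import combinations


def gap_weighted_kernel(s, t, n, lam):
    """Brute-force gap-weighted subsequence kernel for short strings.

    Every length-n subsequence (contiguous or not) is weighted by
    lam ** span, where span is the stretch of text it covers.
    """
    def weighted_subsequences(text):
        weights = {}
        for idx in combinations(range(len(text)), n):
            sub = "".join(text[i] for i in idx)
            span = idx[-1] - idx[0] + 1  # includes any skipped characters
            weights[sub] = weights.get(sub, 0.0) + lam ** span
        return weights

    ws, wt = weighted_subsequences(s), weighted_subsequences(t)
    # Inner product over the subsequences that both strings share.
    return sum(w * wt[sub] for sub, w in ws.items() if sub in wt)


print(gap_weighted_kernel("CAT", "CART", 3, 0.5))  # CAT spans 3 in CAT, 4 in CART
print(gap_weighted_kernel("CAT", "CAT", 3, 0.5))
```

The match through the skipped R still counts, but it is weighted by lam**7 instead of the lam**6 an exact match gets, which is the penalty described above.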
22:38 <@lisitsyn> ahh
22:39 <@lisitsyn> let me check
22:40 <Larmona> there's actually a package in R that implements it called kernlab... but it performs terribly and crashed my R session every time
22:42 <@lisitsyn> Larmona: I am pretty sure some kernel should do this job, checking
22:42 <Larmona> lisitsyn: thanks
22:43 <@lisitsyn> Larmona: oh I think I know a list with proper description
22:44 <@lisitsyn> Larmona: it is sonney2k's thesis :D
22:44 <@lisitsyn> Larmona: http://sonnenburgs.de/soeren/publications/Son08.pdf page 15
22:45 <Larmona> yes I think you are right, I have read quite a few of his papers
22:47 <Larmona> actually the kernel I'm referencing is from '03 but the mismatch one does get at the spirit of it
22:47 <@lisitsyn> Larmona: these kernels are like specialized for these AGCT guys
22:48 <Larmona> ah I see
22:48 <@lisitsyn> I have no idea whether there is some simpler approach for regular text
22:49 <@lisitsyn> Larmona: but I remember some recommendation I was given about WD kernel
22:49 <@lisitsyn> so maybe you could give it a try
22:50 <Larmona> yeah WD seemed to touch on the idea because it allows for shifts I think
22:50 <@lisitsyn> though it is going to catch char by char similarity
22:50 <@lisitsyn> I mean it won't be like bag of words because if your shift is big you'd get a not really informative kernel
22:51 <@lisitsyn> in other words sounds like it is very 'positional'
22:51 <Larmona> but nonetheless this was very helpful! you wouldn't happen to be familiar with any other powerful string kernel libraries out there, would you? from my research shogun seemed to be the best by a margin
22:51 <Larmona> yeah positional would not be helpful for my classification problem
22:53 <@lisitsyn> Larmona: I guess spectrum kernel would be better for regular text
22:53 <Larmona> here is the kernel I was referencing fyi
22:53 <Larmona> http://www.infoautoclassification.org/public/articles/Lodhi-et.-al._Text-Classification-using-String-Kernels.pdf
22:53 <@lisitsyn> as for your question about other libs - I know basically nothing :)
22:54 <Larmona> haha alrighty
22:55 <@lisitsyn> Larmona: I'd suggest you give the spectrum kernel a try
22:55 <@lisitsyn> it just counts the number of matching n-grams
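A minimal Python sketch of "counting matching n-grams", assuming the usual spectrum-kernel definition as an inner product of contiguous n-gram counts. The function name is hypothetical and this is not Shogun's API.

```python
from collections import Counter


def spectrum_kernel(s, t, n):
    """Inner product of the contiguous n-gram count vectors of s and t."""
    counts_s = Counter(s[i:i + n] for i in range(len(s) - n + 1))
    counts_t = Counter(t[i:i + n] for i in range(len(t) - n + 1))
    return sum(c * counts_t[gram] for gram, c in counts_s.items())


print(spectrum_kernel("CAT", "CART", 2))  # only "CA" is shared
print(spectrum_kernel("CAT", "CART", 3))  # no shared 3-gram
```

Because only contiguous n-grams count, CAT and CART share nothing at n=3, which is the limitation Larmona raised with the CAT/CART example.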
22:56 <Larmona> yeah that's what I've been using
22:56 <Larmona> well thank you very much for the help lisitsyn! I really appreciate it. I'm off to grind on this project now. Have a good day!
22:56 <@lisitsyn> I see
22:56 <@lisitsyn> you are welcome :)
22:57 <Larmona> ciao
22:57 -!- Larmona [6bbc181a@gateway/web/freenode/ip.107.188.24.26] has quit [Quit: Page closed]
23:53 -!- rujmeister [3200beca@gateway/web/freenode/ip.50.0.190.202] has joined #shogun
--- Log closed Sun Dec 07 00:00:40 2014
