SHOGUN
4.2.0

Go to the source code of this file.
Classes  
class  CSubsequenceStringKernel 
class SubsequenceStringKernel that implements String Subsequence Kernel (SSK) discussed by Lodhi et. al.[1]. A subsequence is any ordered sequence of \(n\) characters occurring in the text, though not necessarily contiguous. More formally, string \(u\) is a subsequence of string \(s\), iff there exists indices \(\mathbf{i}=(i_{1},\dots,i_{u})\), with \(1\le i_{1} \le \cdots \le i_{u} \le s\), such that \(u_{j}=s_{i_{j}}\) for \(j=1,\dots,u\), written as \(u=s[\mathbf{i}]\). The feature mapping \(\phi\) in this scenario is given by \[ \phi_{u}(s)=\sum_{\mathbf{i}:u=s[\mathbf{i}]}\lambda^{l(\mathbf{i})} \] for some \(lambda\le 1\), where \(l(\mathbf{i})\) is the length of the subsequence in \(s\), given by \(i_{u}i_{1}+1\). The kernel here is an inner product in the feature space generated by all subsequences of length \(n\). \[ K_{n}(s,t)=\sum_{u\in\Sigma^{n}}\langle \phi_{u}(s), \phi_{u}(t)\rangle = \sum_{u\in\Sigma^{n}}\sum_{\mathbf{i}:u=s[\mathbf{i}]} \sum_{\mathbf{j}:u=t[\mathbf{j}]}\lambda^{l(\mathbf{i})+l(\mathbf{j})} \] Since the subsequences are weighted by the exponentially decaying factor \(\lambda\) of their full length in the text, more weight is given to those occurrences that are nearly contiguous. A direct computation is infeasible since the dimension of the feature space grows exponentially with \(n\). The paper describes an efficient computation approach using a dynamic programming technique. More...  