This class offers access to the Oligo Kernel introduced by Meinicke et al. in 2004.
The class provides functions that preprocess the data so that the kernel can be computed faster. The kernel function is then kernelOligoFast or kernelOligo.
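As a point of reference, the kernel can be evaluated directly from two strings. Every pair of identical k-mers, one starting at position i in x and one starting at position j in y, contributes a Gaussian weight in the distance i - j. The sketch below assumes that weight is exp(-(i - j)^2 / width), with width = 2*sigma^2 as documented for the constructor, and it omits any constant prefactor. The function name oligo_kernel_naive is hypothetical and is not part of the class. It needs O(|x| * |y|) k-mer comparisons, which is the cost the preprocessing is meant to avoid.

```cpp
#include <cmath>
#include <cstddef>
#include <string>

// Hypothetical reference implementation (not part of COligoStringKernel):
// sums exp(-(i - j)^2 / width) over all position pairs (i, j) where the
// k-mer starting at i in x equals the k-mer starting at j in y.
// Assumes width = 2 * sigma^2 and drops any constant scaling factor.
double oligo_kernel_naive(const std::string& x, const std::string& y,
                          std::size_t k, double width)
{
    double result = 0.0;
    if (k == 0 || x.size() < k || y.size() < k)
        return result;
    for (std::size_t i = 0; i + k <= x.size(); ++i)
    {
        for (std::size_t j = 0; j + k <= y.size(); ++j)
        {
            // Compare the k characters at x[i..] with those at y[j..].
            if (x.compare(i, k, y, j, k) == 0)
            {
                const double d = static_cast<double>(i) - static_cast<double>(j);
                result += std::exp(-d * d / width);
            }
        }
    }
    return result;
}
```

For example, with k = 2 the strings "ACG" and "CG" share only the k-mer CG, at positions 1 and 0, so the result is exp(-1/width).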
The implementation still needs a significant speedup: it should work, but in its current form it is probably suitable only for small-scale academic problems.
It uses CSqrtDiagKernelNormalizer, because the unnormalized kernel appears to be strongly diagonally dominant.
Definition at line 41 of file OligoStringKernel.h.
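CSqrtDiagKernelNormalizer is understood here as rescaling each kernel value to k(x,y) / sqrt(k(x,x) * k(y,y)), which sets every diagonal entry to 1 and so counteracts diagonal dominance. The sketch below applies that formula to a precomputed Gram matrix. The helper sqrt_diag_normalize is hypothetical and not part of the Shogun API; the actual normalizer class may organize the computation differently (for example, per kernel evaluation rather than on a full matrix).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: normalizes a square Gram matrix K in place so that
// K'(i, j) = K(i, j) / sqrt(K(i, i) * K(j, j)); afterwards every diagonal
// entry is 1. Assumes all diagonal entries are strictly positive.
void sqrt_diag_normalize(std::vector<std::vector<double>>& K)
{
    const std::size_t n = K.size();
    // Cache sqrt of the diagonal before any entry is modified.
    std::vector<double> sqrt_diag(n);
    for (std::size_t i = 0; i < n; ++i)
        sqrt_diag[i] = std::sqrt(K[i][i]);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            K[i][j] /= sqrt_diag[i] * sqrt_diag[j];
}
```

With K = {{4, 1}, {1, 9}}, the off-diagonal entries become 1 / (2 * 3) = 1/6 while both diagonal entries become 1.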
Public Member Functions
  COligoStringKernel ()
  COligoStringKernel (int32_t cache_size, int32_t k, float64_t width)
  virtual ~COligoStringKernel ()
  virtual bool init (CFeatures *l, CFeatures *r)
  virtual EKernelType get_kernel_type ()
  virtual const char * get_name () const
  virtual float64_t compute (int32_t x, int32_t y)
  virtual void cleanup ()
Protected Member Functions
  float64_t kernelOligoFast (const std::vector< std::pair< int32_t, float64_t > > &x, const std::vector< std::pair< int32_t, float64_t > > &y, int32_t max_distance=-1)
    returns the value of the oligo kernel for sequences 'x' and 'y'
Static Protected Member Functions
  static void encodeOligo (const std::string &sequence, uint32_t k_mer_length, const std::string &allowed_characters, std::vector< std::pair< int32_t, float64_t > > &values)
    encodes the signals of the sequence
  static void getSequences (const std::vector< std::string > &sequences, uint32_t k_mer_length, const std::string &allowed_characters, std::vector< std::vector< std::pair< int32_t, float64_t > > > &encoded_sequences)
    encodes all sequences with the encodeOligo function and stores them in 'encoded_sequences'
Protected Attributes
  int32_t k
  float64_t width
  float64_t * gauss_table
  int32_t gauss_table_len
COligoStringKernel ()
Default constructor.
Definition at line 24 of file OligoStringKernel.cpp.
COligoStringKernel (int32_t cache_size, int32_t k, float64_t width)
Constructor.
Parameters:
  cache_size  cache size for the kernel
  k  k-mer length
  width  kernel width, equivalent to 2*sigma^2
Definition at line 30 of file OligoStringKernel.cpp.
virtual ~COligoStringKernel ()
Destructor.
Definition at line 39 of file OligoStringKernel.cpp.
virtual void cleanup ()
Clean up the kernel.
Reimplemented from CKernel.
Definition at line 44 of file OligoStringKernel.cpp.
virtual float64_t compute (int32_t x, int32_t y)
Compute the kernel function for features a and b, where idx_a and idx_b denote the indices of the feature vectors in the corresponding feature objects.
Abstract base method.
Parameters:
  x  index a
  y  index b
Implements CKernel.
Definition at line 233 of file OligoStringKernel.cpp.
static void encodeOligo (const std::string &sequence, uint32_t k_mer_length, const std::string &allowed_characters, std::vector< std::pair< int32_t, float64_t > > &values) [protected]
Encodes the signals of the sequence.
This function stores the oligo function signals in 'values'.
The 'k_mer_length' and the 'allowed_characters' determine which signals are used. Every pair contains the position of the signal and a numerical value reflecting the signal. The numerical value encodes the k-mer as a number to base n = |allowed_characters|, where each character is replaced by its index in 'allowed_characters' (for ACGT: A = 0, C = 1, G = 2, T = 3). Example: the value of the k-mer CG for the allowed characters ACGT is 1 * n^1 + 2 * n^0 = 1 * 4 + 2 = 6.
Definition at line 67 of file OligoStringKernel.cpp.
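The base-n encoding described above can be computed for all k-mers of a sequence in a single pass with a rolling update: drop the leading digit, shift by one place, and append the next character. The sketch below is a hypothetical stand-in for encodeOligo, not its source. It assumes 0-based positions (the class may number positions differently) and assumes that a sequence containing a character outside 'allowed' yields no signals.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of the encoding described for encodeOligo: every k-mer
// is read as a number in base n = |allowed|, with each character mapped to
// its index in 'allowed'. Positions are 0-based (an assumption). Returns an
// empty vector if k is 0, the sequence is shorter than k, or it contains a
// character that is not in 'allowed'.
std::vector<std::pair<int32_t, double>> encode_kmers(
    const std::string& sequence, uint32_t k, const std::string& allowed)
{
    std::vector<std::pair<int32_t, double>> values;
    const double n = static_cast<double>(allowed.size());
    if (k == 0 || sequence.size() < k)
        return values;
    // Map every character to its digit (its index in 'allowed').
    std::vector<double> digits;
    digits.reserve(sequence.size());
    for (char c : sequence)
    {
        const std::string::size_type idx = allowed.find(c);
        if (idx == std::string::npos)
            return {};  // illegal character: no signals
        digits.push_back(static_cast<double>(idx));
    }
    // Place value n^(k-1) of the leading digit.
    double top = 1.0;
    for (uint32_t i = 1; i < k; ++i)
        top *= n;
    // Value of the first k-mer.
    double value = 0.0;
    for (uint32_t i = 0; i < k; ++i)
        value = value * n + digits[i];
    values.emplace_back(0, value);
    // Rolling update: remove the leading digit, shift, append the next one.
    for (std::size_t start = 1; start + k <= digits.size(); ++start)
    {
        value = (value - digits[start - 1] * top) * n + digits[start + k - 1];
        values.emplace_back(static_cast<int32_t>(start), value);
    }
    return values;
}
```

For "ACGT" with k = 2 and alphabet ACGT this yields (0, 1), (1, 6), (2, 11) for AC, CG and GT; the CG value matches the worked example.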
virtual EKernelType get_kernel_type ()
Return the type of this kernel.
Implements CStringKernel< char >.
Definition at line 69 of file OligoStringKernel.h.
virtual const char* get_name () const
Return the kernel's name.
Reimplemented from CStringKernel< char >.
Definition at line 75 of file OligoStringKernel.h.
static void getSequences (const std::vector< std::string > &sequences, uint32_t k_mer_length, const std::string &allowed_characters, std::vector< std::vector< std::pair< int32_t, float64_t > > > &encoded_sequences) [protected]
encodes all sequences with the encodeOligo function and stores them in 'encoded_sequences'
This function encodes the sequences of 'sequences' via the function encodeOligo.
Definition at line 125 of file OligoStringKernel.cpp.
virtual bool init (CFeatures *l, CFeatures *r)
Initialize the kernel.
Parameters:
  l  features of the left-hand side
  r  features of the right-hand side
Reimplemented from CStringKernel< char >.
Definition at line 53 of file OligoStringKernel.cpp.
float64_t kernelOligoFast (const std::vector< std::pair< int32_t, float64_t > > &x, const std::vector< std::pair< int32_t, float64_t > > &y, int32_t max_distance = -1) [protected]
returns the value of the oligo kernel for sequences 'x' and 'y'
This function computes the value of the oligo kernel, which was introduced by Meinicke et al. in 2004. 'x' and 'y' must be encoded by encodeOligo, and the exponential cache (gauss_table) must first be constructed by getExpFunctionCache.
'max_distance' can be used to speed up the computation further by restricting the maximum distance between a k-mer at position i in sequence 'x' and a k-mer at position j in sequence 'y': if |i - j| > max_distance, the pair is not added to the kernel value. This approximation is switched off by default (max_distance < 0).
Definition at line 153 of file OligoStringKernel.cpp.
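One way to exploit the (position, value) encoding is to sort both encodings by k-mer value and walk them in step, so that only runs of identical k-mers are compared pairwise. The sketch below illustrates that idea together with the max_distance cutoff; it is a hypothetical illustration, not the kernelOligoFast source. The name oligo_kernel_sorted is invented, and the 'gauss' argument stands in for gauss_table, assumed to hold the weight for every position distance that can occur.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <utility>
#include <vector>

using Encoded = std::vector<std::pair<int32_t, double>>;

// Hypothetical sketch: x and y are taken by value so they can be sorted by
// k-mer value. gauss[d] is the weight for position distance d and must have
// more entries than the largest position in x or y.
// max_distance < 0 disables the distance cutoff.
double oligo_kernel_sorted(Encoded x, Encoded y,
                           const std::vector<double>& gauss,
                           int32_t max_distance = -1)
{
    auto by_value = [](const std::pair<int32_t, double>& a,
                       const std::pair<int32_t, double>& b) {
        return a.second < b.second;
    };
    std::sort(x.begin(), x.end(), by_value);
    std::sort(y.begin(), y.end(), by_value);

    double result = 0.0;
    std::size_t i = 0, j = 0;
    while (i < x.size() && j < y.size())
    {
        // Advance whichever side has the smaller k-mer value.
        if (x[i].second < y[j].second) { ++i; continue; }
        if (y[j].second < x[i].second) { ++j; continue; }
        // Find the run of equal k-mer values on both sides.
        const double v = x[i].second;
        std::size_t i_end = i, j_end = j;
        while (i_end < x.size() && x[i_end].second == v) ++i_end;
        while (j_end < y.size() && y[j_end].second == v) ++j_end;
        // Add the weight of every matching pair within the cutoff.
        for (std::size_t a = i; a < i_end; ++a)
            for (std::size_t b = j; b < j_end; ++b)
            {
                const int32_t d = std::abs(x[a].first - y[b].first);
                if (max_distance < 0 || d <= max_distance)
                    result += gauss[static_cast<std::size_t>(d)];
            }
        i = i_end;
        j = j_end;
    }
    return result;
}
```

The cost is dominated by the two sorts plus the number of matching pairs, instead of all |x| * |y| position pairs; a positive max_distance prunes distant matches further.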
float64_t* gauss_table [protected]
Cache of exp function values (constructed by getExpFunctionCache).
Definition at line 162 of file OligoStringKernel.h.
int32_t gauss_table_len [protected]
Length of the gauss table.
Definition at line 164 of file OligoStringKernel.h.
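A table like gauss_table replaces repeated calls to exp() with a lookup indexed by position distance. The sketch below is a hypothetical stand-in for getExpFunctionCache; it assumes the weight for distance d is exp(-d^2 / width), with width = 2*sigma^2 as documented for the constructor, and returns a std::vector instead of filling a raw float64_t array.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for getExpFunctionCache: precompute the weight for
// every position distance d in [0, len) so the kernel loop only needs a
// table lookup. Assumes weight(d) = exp(-d^2 / width), width = 2 * sigma^2.
std::vector<double> make_gauss_table(std::size_t len, double width)
{
    std::vector<double> table(len);
    for (std::size_t d = 0; d < len; ++d)
    {
        const double dd = static_cast<double>(d);
        table[d] = std::exp(-dd * dd / width);
    }
    return table;
}
```

For sigma = 2, width = 2 * 2^2 = 8, so make_gauss_table(5, 8.0) gives weights 1, exp(-1/8), exp(-1/2), exp(-9/8), exp(-2). The table length (gauss_table_len in the class) must cover the largest distance that can occur between two k-mer positions.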
int32_t k [protected]
k-mer length.
Definition at line 158 of file OligoStringKernel.h.
float64_t width [protected]
Width of the kernel (equivalent to 2*sigma^2).
Definition at line 160 of file OligoStringKernel.h.