
CWeightedCommWordStringKernel Class Reference


Detailed Description

The WeightedCommWordString kernel may be used to compute the weighted spectrum kernel (i.e. a spectrum kernel for 1-mers up to K-mers, where each k-mer length is weighted by some coefficient $\beta_k$) from strings that have been mapped into unsigned 16-bit integers.

These 16-bit integers correspond to k-mers. To be applicable in this kernel they need to be sorted (e.g. via the SortWordString pre-processor).
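As an illustration of the mapping described above, the following is a minimal sketch that packs the overlapping k-mers of a DNA string into sorted 16-bit words. The helper names `dna_code` and `sorted_kmer_words` and the letter coding A=0, C=1, G=2, T=3 are assumptions made for this example; they are not part of the toolbox.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: map one DNA letter to a 2-bit code (assumed coding).
inline uint16_t dna_code(char c)
{
    switch (c)
    {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  assert(false && "unexpected letter"); return 0;
    }
}

// Hypothetical helper: pack every overlapping k-mer of seq into one 16-bit
// word and return the words in ascending order.
// Requires 1 <= k <= 8 so that 2*k bits fit into 16 bits.
std::vector<uint16_t> sorted_kmer_words(const std::string& seq, int k)
{
    assert(k >= 1 && k <= 8);
    std::vector<uint16_t> words;
    if (seq.size() < static_cast<std::size_t>(k))
        return words;

    const uint32_t mask = (1u << (2 * k)) - 1u;  // keeps the last k letters
    uint32_t w = 0;
    for (std::size_t i = 0; i < seq.size(); ++i)
    {
        // Shift in the next letter and drop the letter that left the window.
        w = ((w << 2) | dna_code(seq[i])) & mask;
        if (i + 1 >= static_cast<std::size_t>(k))
            words.push_back(static_cast<uint16_t>(w));
    }
    std::sort(words.begin(), words.end());
    return words;
}
```

For example, under this coding `sorted_kmer_words("ACGT", 2)` yields the words for AC, CG and GT in ascending numeric order.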

It basically uses the algorithm in the unix "comm" command (hence the name) to compute:

\[ k({\bf x},{\bf x'})= \sum_{k=1}^K\beta_k\Phi_k({\bf x})\cdot \Phi_k({\bf x'}) \]

where $\Phi_k$ maps a sequence ${\bf x}$ that consists of letters in $\Sigma$ to a feature vector of size $|\Sigma|^k$. In this feature vector each entry denotes how often the k-mer appears in that ${\bf x}$.
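The "comm"-style evaluation of this sum can be sketched as follows. This is a simplified illustration, not the class's actual implementation: it assumes each order k has its own pre-sorted word list, and the function names `comm_dot` and `weighted_comm_kernel` are hypothetical. It also ignores the use_sign option and any normalization.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Dot product of two k-mer count vectors, computed from sorted word lists by
// advancing two cursors in lockstep, the way comm(1) walks two sorted files.
double comm_dot(const std::vector<uint16_t>& a, const std::vector<uint16_t>& b)
{
    double sum = 0.0;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size())
    {
        if (a[i] < b[j])
            ++i;                       // word only in a: contributes nothing
        else if (a[i] > b[j])
            ++j;                       // word only in b: contributes nothing
        else
        {
            // Common word: count its run length on both sides.
            const uint16_t sym = a[i];
            std::size_t ca = 0, cb = 0;
            while (i < a.size() && a[i] == sym) { ++i; ++ca; }
            while (j < b.size() && b[j] == sym) { ++j; ++cb; }
            sum += static_cast<double>(ca) * static_cast<double>(cb);
        }
    }
    return sum;
}

// Weighted sum over orders. xs[k-1] and ys[k-1] hold the sorted words of
// order k for the two sequences; beta[k-1] is the coefficient for order k.
double weighted_comm_kernel(const std::vector<std::vector<uint16_t>>& xs,
                            const std::vector<std::vector<uint16_t>>& ys,
                            const std::vector<double>& beta)
{
    assert(xs.size() == beta.size() && ys.size() == beta.size());
    double value = 0.0;
    for (std::size_t d = 0; d < beta.size(); ++d)
        value += beta[d] * comm_dot(xs[d], ys[d]);
    return value;
}
```

Because both inputs are sorted, each order costs time linear in the two sequence lengths, and the feature vector of size $|\Sigma|^k$ is never materialized.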

Note that this representation is especially tuned to small alphabets (like the 2-bit alphabet DNA), for which it enables spectrum kernels of order 8.
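The order-8 limit follows from the word size. With 2 bits per letter a k-mer occupies $2k$ bits, so the largest k-mer that fits into one unsigned 16-bit integer satisfies

\[ 2k \le 16 \;\Rightarrow\; k \le 8, \qquad |\Sigma|^8 = 4^8 = 2^{16} = 65536 \]

that is, the 65536 possible 8-mers over a 4-letter alphabet use every value of a 16-bit word exactly once.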

For this kernel the linadd speedups are quite efficiently implemented using direct maps.
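The idea behind a direct map can be sketched as below. This is a hedged illustration under simplifying assumptions (a single k-mer order, no use_sign, no normalization); the type `DirectMapNormal` and its member names are hypothetical and only loosely mirror add_to_normal and compute_optimized.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the kernel's "normal" vector: one slot per
// possible 16-bit word, indexed directly by the word value.
struct DirectMapNormal
{
    std::vector<double> table = std::vector<double>(1u << 16, 0.0);

    // Fold one training example into the table: every occurrence of a word
    // adds `weight` to that word's slot.
    void add(const std::vector<uint16_t>& words, double weight)
    {
        for (uint16_t w : words)
            table[w] += weight;
    }

    // Score a test example with one table lookup per word occurrence. The
    // cost does not depend on how many training examples were folded in.
    double score(const std::vector<uint16_t>& words) const
    {
        double s = 0.0;
        for (uint16_t w : words)
            s += table[w];
        return s;
    }
};
```

After folding in examples $x_i$ with weights $\alpha_i$, `score` returns $\sum_i \alpha_i\, \Phi_k({\bf x}_i)\cdot\Phi_k({\bf x})$ for the chosen order, which is why a table of 65536 entries suffices for 16-bit words.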

Definition at line 50 of file WeightedCommWordStringKernel.h.

Inheritance diagram for CWeightedCommWordStringKernel: [inheritance graph not shown]


Public Member Functions

 CWeightedCommWordStringKernel ()
 CWeightedCommWordStringKernel (int32_t size, bool use_sign)
 CWeightedCommWordStringKernel (CStringFeatures< uint16_t > *l, CStringFeatures< uint16_t > *r, bool use_sign=false, int32_t size=10)
virtual ~CWeightedCommWordStringKernel ()
virtual bool init (CFeatures *l, CFeatures *r)
virtual void cleanup ()
virtual float64_t compute_optimized (int32_t idx)
virtual void add_to_normal (int32_t idx, float64_t weight)
void merge_normal ()
bool set_wd_weights ()
bool set_weights (float64_t *w, int32_t d)
virtual EKernelType get_kernel_type ()
virtual const char * get_name () const
virtual EFeatureType get_feature_type ()
virtual float64_t * compute_scoring (int32_t max_degree, int32_t &num_feat, int32_t &num_sym, float64_t *target, int32_t num_suppvec, int32_t *IDX, float64_t *alphas, bool do_init=true)

Protected Member Functions

virtual float64_t compute_helper (int32_t idx_a, int32_t idx_b, bool do_sort)

Protected Attributes

int32_t degree
float64_t * weights

Constructor & Destructor Documentation

CWeightedCommWordStringKernel (  )

default constructor

Definition at line 18 of file WeightedCommWordStringKernel.cpp.

CWeightedCommWordStringKernel ( int32_t  size,
bool  use_sign 
)

constructor

Parameters:
size cache size
use_sign if sign shall be used

Definition at line 24 of file WeightedCommWordStringKernel.cpp.

CWeightedCommWordStringKernel ( CStringFeatures< uint16_t > *  l,
CStringFeatures< uint16_t > *  r,
bool  use_sign = false,
int32_t  size = 10 
)

constructor

Parameters:
l features of left-hand side
r features of right-hand side
use_sign if sign shall be used
size cache size

Definition at line 32 of file WeightedCommWordStringKernel.cpp.

~CWeightedCommWordStringKernel (  )  [virtual]

Definition at line 43 of file WeightedCommWordStringKernel.cpp.


Member Function Documentation

void add_to_normal ( int32_t  idx,
float64_t  weight 
) [virtual]

add to normal

Parameters:
idx where to add
weight what to add

Reimplemented from CCommWordStringKernel.

Definition at line 191 of file WeightedCommWordStringKernel.cpp.

void cleanup (  )  [virtual]

clean up kernel

Reimplemented from CCommWordStringKernel.

Definition at line 59 of file WeightedCommWordStringKernel.cpp.

float64_t compute_helper ( int32_t  idx_a,
int32_t  idx_b,
bool  do_sort 
) [protected, virtual]

helper for compute

Parameters:
idx_a index a
idx_b index b
do_sort if sorting shall be performed

Reimplemented from CCommWordStringKernel.

Definition at line 96 of file WeightedCommWordStringKernel.cpp.

float64_t compute_optimized ( int32_t  idx  )  [virtual]

compute optimized

Parameters:
idx index to compute
Returns:
optimized value at given index

Reimplemented from CCommWordStringKernel.

Definition at line 253 of file WeightedCommWordStringKernel.cpp.

float64_t * compute_scoring ( int32_t  max_degree,
int32_t &  num_feat,
int32_t &  num_sym,
float64_t *  target,
int32_t  num_suppvec,
int32_t *  IDX,
float64_t *  alphas,
bool  do_init = true 
) [virtual]

compute scoring

Parameters:
max_degree maximum degree
num_feat number of features
num_sym number of symbols
target target
num_suppvec number of support vectors
IDX IDX
alphas alphas
do_init if initialization shall be performed
Returns:
computed score

Reimplemented from CCommWordStringKernel.

Definition at line 288 of file WeightedCommWordStringKernel.cpp.

virtual EFeatureType get_feature_type (  )  [virtual]

return feature type the kernel can deal with

Returns:
feature type WORD

Reimplemented from CCommWordStringKernel.

Definition at line 134 of file WeightedCommWordStringKernel.h.

virtual EKernelType get_kernel_type (  )  [virtual]

return what type of kernel we are

Returns:
kernel type WEIGHTEDCOMMWORDSTRING

Reimplemented from CCommWordStringKernel.

Definition at line 122 of file WeightedCommWordStringKernel.h.

virtual const char* get_name ( void   )  const [virtual]

return the kernel's name

Returns:
name WeightedCommWordString

Reimplemented from CCommWordStringKernel.

Definition at line 128 of file WeightedCommWordStringKernel.h.

bool init ( CFeatures *  l,
CFeatures *  r 
) [virtual]

initialize kernel

Parameters:
l features of left-hand side
r features of right-hand side
Returns:
if initializing was successful

Reimplemented from CCommWordStringKernel.

Definition at line 48 of file WeightedCommWordStringKernel.cpp.

void merge_normal (  ) 

merge normal

Definition at line 221 of file WeightedCommWordStringKernel.cpp.

bool set_wd_weights (  ) 

set weighted degree weights

Returns:
if setting was successful

Definition at line 67 of file WeightedCommWordStringKernel.cpp.
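This page does not state which values set_wd_weights assigns. As a hedged sketch only, the conventional weighted-degree scheme uses $\beta_k = 2(d-k+1)/(d(d+1))$ for $k = 1 \ldots d$, which favours short k-mers and sums to 1. The function name `wd_weights` is hypothetical, and whether the class stores these values directly or a transformed version is an open assumption here.

```cpp
#include <cassert>
#include <vector>

// Conventional weighted-degree weights (an assumption, not taken from this
// page): beta_k = 2 (d - k + 1) / (d (d + 1)) for k = 1..d.
std::vector<double> wd_weights(int degree)
{
    assert(degree >= 1);
    std::vector<double> beta(degree);
    const double norm = degree * (degree + 1) / 2.0;  // sum of 1..degree
    for (int k = 1; k <= degree; ++k)
        beta[k - 1] = (degree - k + 1) / norm;        // linearly decreasing in k
    return beta;
}
```

Weights computed this way could then be passed to set_weights together with the matching degree.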

bool set_weights ( float64_t *  w,
int32_t  d 
)

set custom weights (swig compatible)

Parameters:
w weights
d degree (must match number of weights)
Returns:
if setting was successful

Definition at line 85 of file WeightedCommWordStringKernel.cpp.


Member Data Documentation

int32_t degree [protected]

degree

Definition at line 168 of file WeightedCommWordStringKernel.h.

float64_t* weights [protected]

weights for each of the subkernels of degree 1...d

Definition at line 171 of file WeightedCommWordStringKernel.h.


The documentation for this class was generated from the following files:
WeightedCommWordStringKernel.h
WeightedCommWordStringKernel.cpp

SHOGUN Machine Learning Toolbox - Documentation