
CKMeans Class Reference

Detailed Description

KMeans clustering: partitions the data into k (a priori specified) clusters.

It minimizes

\[ \sum_{i=1}^k\sum_{j\in S_i} \|x_j-\mu_i\|^2 \]

where $\mu_i$ are the cluster centers and $S_i,\;i=1,\dots,k$ are the index sets of the clusters.
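To make the objective concrete, here is a minimal standalone C++ sketch (not Shogun code) that evaluates it for a given assignment of points to clusters. It assumes the squared Euclidean norm; the alias Point and the functions squared_distance and kmeans_objective are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense coordinate vector for one point or one center.
using Point = std::vector<double>;

// Squared Euclidean distance ||a - b||^2.
double squared_distance(const Point& a, const Point& b)
{
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d)
    {
        const double diff = a[d] - b[d];
        s += diff * diff;
    }
    return s;
}

// Objective sum_i sum_{j in S_i} ||x_j - mu_i||^2.
// assignment[j] == i means that point x_j belongs to cluster S_i.
double kmeans_objective(const std::vector<Point>& points,
                        const std::vector<Point>& centers,
                        const std::vector<std::size_t>& assignment)
{
    assert(points.size() == assignment.size());
    double total = 0.0;
    for (std::size_t j = 0; j < points.size(); ++j)
    {
        assert(assignment[j] < centers.size());
        total += squared_distance(points[j], centers[assignment[j]]);
    }
    return total;
}
```

For example, the points (0, 0) and (2, 0) assigned to a center at (1, 0) contribute 1 + 1 = 2 to the objective.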

Beware that the algorithm is only guaranteed to reach a local optimum; the result depends on the initial cluster centers.
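The local-optimum caveat is easy to reproduce. The standalone C++ sketch below implements the standard Lloyd iteration (alternating a nearest-center assignment step and a mean update step) on 1-D data; it is an illustration, not Shogun's clustknb, and lloyd_1d and LloydResult are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Outcome of one Lloyd run on 1-D data.
struct LloydResult
{
    std::vector<double> centers;          // final centers mu_i
    std::vector<std::size_t> assignment;  // assignment[j] = cluster of x_j
    double objective;                     // sum of squared distances to centers
};

// Squared distance between two scalars.
static double sq(double a, double b) { return (a - b) * (a - b); }

// Lloyd's algorithm on 1-D points, started from the given initial centers.
// Stops when no point changes cluster or after max_iter iterations.
LloydResult lloyd_1d(const std::vector<double>& x,
                     std::vector<double> centers, int max_iter)
{
    assert(!centers.empty() && max_iter > 0);
    const std::size_t k = centers.size();
    // k marks "not assigned yet", so the first pass always counts as a change.
    std::vector<std::size_t> assignment(x.size(), k);
    for (int it = 0; it < max_iter; ++it)
    {
        // Assignment step: each point goes to its nearest center
        // (ties go to the lower index).
        bool changed = false;
        for (std::size_t j = 0; j < x.size(); ++j)
        {
            std::size_t best = 0;
            for (std::size_t i = 1; i < k; ++i)
                if (sq(x[j], centers[i]) < sq(x[j], centers[best]))
                    best = i;
            if (best != assignment[j])
            {
                assignment[j] = best;
                changed = true;
            }
        }
        if (!changed)
            break; // fixed point: the centers would not move any more
        // Update step: each center becomes the mean of its points;
        // a center that lost all its points is left unchanged.
        std::vector<double> sum(k, 0.0);
        std::vector<std::size_t> count(k, 0);
        for (std::size_t j = 0; j < x.size(); ++j)
        {
            sum[assignment[j]] += x[j];
            ++count[assignment[j]];
        }
        for (std::size_t i = 0; i < k; ++i)
            if (count[i] > 0)
                centers[i] = sum[i] / static_cast<double>(count[i]);
    }
    double objective = 0.0;
    for (std::size_t j = 0; j < x.size(); ++j)
        objective += sq(x[j], centers[assignment[j]]);
    return {centers, assignment, objective};
}
```

With x = {0, 1, 10, 11, 20, 21} and k = 3, starting from the centers {0, 10, 20} yields {0.5, 10.5, 20.5} with objective 1.5, while starting from {0, 1, 15.5} stops immediately at {0, 1, 15.5} with objective 101. Both are fixed points of the iteration, but only the first is the global optimum.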


Definition at line 39 of file KMeans.h.


Public Member Functions

 CKMeans ()
 CKMeans (int32_t k, CDistance *d)
virtual ~CKMeans ()
virtual EClassifierType get_classifier_type ()
virtual bool load (FILE *srcfile)
virtual bool save (FILE *dstfile)
void set_k (int32_t p_k)
int32_t get_k ()
void set_max_iter (int32_t iter)
float64_t get_max_iter ()
SGVector< float64_t > get_radiuses ()
SGMatrix< float64_t > get_cluster_centers ()
int32_t get_dimensions ()
virtual const char * get_name () const

Protected Member Functions

void clustknb (bool use_old_mus, float64_t *mus_start)
virtual bool train_machine (CFeatures *data=NULL)
virtual void store_model_features ()

Protected Attributes

int32_t max_iter
 maximum number of iterations
int32_t k
 the k parameter in KMeans
int32_t dimensions
 number of dimensions
SGVector< float64_t > R
 radii of the clusters (size k)

Constructor & Destructor Documentation

CKMeans (  ) 

default constructor

Definition at line 29 of file KMeans.cpp.

CKMeans ( int32_t  k,
CDistance *  d 
)

Parameters:
k number of clusters
d distance used for clustering

Definition at line 35 of file KMeans.cpp.

~CKMeans (  )  [virtual]

Definition at line 43 of file KMeans.cpp.

Member Function Documentation

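void clustknb ( bool  use_old_mus,
float64_t *  mus_start 
) [protected]

Parameters:
use_old_mus if true, start from the cluster centers given in mus_start
mus_start initial cluster centers (used if use_old_mus is true)

Implementation notes: the method replaces the rhs feature vectors of the distance, sets rhs to mus_start, and updates rhs.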

Definition at line 179 of file KMeans.cpp.
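The two start modes of clustknb can be illustrated with the standalone C++ sketch below; it is not Shogun code, and initial_centers and its seed parameter are hypothetical. It assumes mus_start holds k centers of dimension dims stored one after another. For use_old_mus == false it assumes a random-subset initialization (shuffle the points, deal them into k non-empty subsets, take each subset mean); the documentation above does not say how CKMeans initializes in that case.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper returning k initial centers (k * dims values, one
// center after another). points holds n points of dimension dims, stored
// one point after another.
std::vector<double> initial_centers(const std::vector<double>& points,
                                    std::size_t dims, std::size_t k,
                                    bool use_old_mus, const double* mus_start,
                                    unsigned seed)
{
    assert(dims > 0 && k > 0 && points.size() % dims == 0);
    if (use_old_mus)
    {
        // Start from the caller-supplied centers unchanged.
        assert(mus_start != nullptr);
        return std::vector<double>(mus_start, mus_start + k * dims);
    }
    const std::size_t n = points.size() / dims;
    assert(n >= k); // every random subset must be non-empty
    // Shuffle the point indices, then deal them round-robin into k subsets.
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::mt19937 rng(seed);
    std::shuffle(idx.begin(), idx.end(), rng);
    std::vector<double> centers(k * dims, 0.0);
    std::vector<std::size_t> count(k, 0);
    for (std::size_t r = 0; r < n; ++r)
    {
        const std::size_t c = r % k;
        for (std::size_t d = 0; d < dims; ++d)
            centers[c * dims + d] += points[idx[r] * dims + d];
        ++count[c];
    }
    // Each initial center is the mean of its random subset.
    for (std::size_t c = 0; c < k; ++c)
        for (std::size_t d = 0; d < dims; ++d)
            centers[c * dims + d] /= static_cast<double>(count[c]);
    return centers;
}
```

Whatever the seed, each random subset is non-empty, so every initial center is a well-defined mean.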

virtual EClassifierType get_classifier_type (  )  [virtual]

get classifier type

Returns: classifier type KMEANS

Reimplemented from CMachine.

Definition at line 57 of file KMeans.h.

SGMatrix< float64_t > get_cluster_centers (  ) 

get centers

Returns: cluster centers, or an empty matrix if no radii are available yet (i.e. the machine has not been trained)

Definition at line 115 of file KMeans.cpp.
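A minimal sketch of reading the returned centers, assuming get_cluster_centers() returns a dimensions x k matrix with one center per column in column-major order (the usual SGMatrix layout; this is an assumption to verify against the headers of your Shogun version). ColumnMajorMatrix is a hypothetical stand-in for SGMatrix< float64_t >, not its real interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dense column-major matrix:
// element (r, c) is stored at data[r + c * num_rows].
struct ColumnMajorMatrix
{
    std::vector<double> data;
    std::size_t num_rows; // assumed: number of dimensions
    std::size_t num_cols; // assumed: number of clusters k
};

// Coordinate d of cluster center i, assuming one center per column.
double center_coordinate(const ColumnMajorMatrix& centers,
                         std::size_t d, std::size_t i)
{
    assert(d < centers.num_rows && i < centers.num_cols);
    return centers.data[d + i * centers.num_rows];
}

// An empty matrix signals that no centers are available (not trained yet).
bool has_centers(const ColumnMajorMatrix& centers)
{
    return centers.num_rows > 0 && centers.num_cols > 0;
}
```

Checking for an empty matrix before indexing avoids reading centers from an untrained machine.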

int32_t get_dimensions (  ) 

get dimensions

Returns: number of dimensions

Definition at line 127 of file KMeans.cpp.

int32_t get_k (  ) 

get k

Returns: the parameter k

Definition at line 94 of file KMeans.cpp.

float64_t get_max_iter (  ) 

get maximum number of iterations

Returns: maximum number of iterations

Definition at line 105 of file KMeans.cpp.

virtual const char* get_name (  )  const [virtual]

Returns: object name

Reimplemented from CDistanceMachine.

Definition at line 116 of file KMeans.h.

SGVector< float64_t > get_radiuses (  ) 

get the cluster radii

Returns: radii of the clusters (size k)

Definition at line 110 of file KMeans.cpp.

bool load ( FILE *  srcfile  )  [virtual]

load distance machine from file

Parameters:
srcfile file to load from

Returns: whether loading was successful

Reimplemented from CMachine.

Definition at line 73 of file KMeans.cpp.

bool save ( FILE *  dstfile  )  [virtual]

save distance machine to file

Parameters:
dstfile file to save to

Returns: whether saving was successful

Reimplemented from CMachine.

Definition at line 80 of file KMeans.cpp.

void set_k ( int32_t  p_k  ) 

set k

Parameters:
p_k new value of k

Definition at line 88 of file KMeans.cpp.

void set_max_iter ( int32_t  iter  ) 

set maximum number of iterations

Parameters:
iter new maximum number of iterations

Definition at line 99 of file KMeans.cpp.

void store_model_features (  )  [protected, virtual]

Ensures that the cluster centers are stored as the lhs features of the underlying distance.

Reimplemented from CDistanceMachine.

Definition at line 464 of file KMeans.cpp.
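Once the centers are the lhs of the distance, assigning a new (rhs) point to a cluster amounts to finding its nearest center. The standalone C++ sketch below shows that rule in isolation; it is not the CDistanceMachine code, nearest_center is a hypothetical helper, and it hard-codes squared Euclidean distance, whereas CKMeans uses whatever CDistance it was constructed with.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Index of the center nearest to one point under squared Euclidean distance.
// centers holds k centers of dimension dims, stored one after another.
std::size_t nearest_center(const std::vector<double>& centers, std::size_t dims,
                           const std::vector<double>& point)
{
    assert(dims > 0 && point.size() == dims);
    assert(!centers.empty() && centers.size() % dims == 0);
    const std::size_t k = centers.size() / dims;
    std::size_t best = 0;
    double best_dist = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < k; ++i)
    {
        double dist = 0.0;
        for (std::size_t d = 0; d < dims; ++d)
        {
            const double diff = centers[i * dims + d] - point[d];
            dist += diff * diff;
        }
        if (dist < best_dist) // strict: ties keep the lower index
        {
            best_dist = dist;
            best = i;
        }
    }
    return best;
}
```

With centers (0, 0) and (10, 10), the point (1, 2) maps to cluster 0 and the point (9, 7) to cluster 1.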

bool train_machine ( CFeatures *  data = NULL  )  [protected, virtual]

train k-means

Parameters:
data training data (can be omitted if the underlying distance has already been initialized with the training data)

Returns: whether training was successful

Reimplemented from CMachine.

Definition at line 48 of file KMeans.cpp.

Member Data Documentation

int32_t dimensions [protected]

number of dimensions

Definition at line 150 of file KMeans.h.

int32_t k [protected]

the k parameter in KMeans

Definition at line 147 of file KMeans.h.

int32_t max_iter [protected]

maximum number of iterations

Definition at line 144 of file KMeans.h.

SGVector<float64_t> R [protected]

radii of the clusters (size k)

Definition at line 153 of file KMeans.h.


SHOGUN Machine Learning Toolbox - Documentation