The class CMultidimensionalScaling performs multidimensional scaling (MDS), optionally using the landmark approximation.
The classical embedding is described on p. 261 (Section 12.1) of Borg, I., & Groenen, P. J. F. (2005). Modern multidimensional scaling: Theory and applications. Springer.
The landmark MDS approximation is described in de Silva, V., & Tenenbaum, J. B. (2004). Sparse multidimensional scaling using landmark points. Technology, pp. 1-4.
This preprocessor solves the eigenproblem with the LAPACK routine DSYEVR. If the ARPACK library is available, its routines DSAUPD/DSEUPD are used instead.
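To make the eigenproblem concrete, here is a small sketch that solves it for a tiny symmetric matrix with the cyclic Jacobi method. This is a swapped-in technique, not DSYEVR or ARPACK, and the name `jacobi_eigen` is hypothetical. Like DSYEVR, it is assumed here to report eigenvalues in ascending order.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Dense row-major matrix.
using Matrix = std::vector<std::vector<double>>;

// Cyclic Jacobi eigensolver for a small symmetric matrix. Stand-in for
// DSYEVR, not a binding to it. On return `eigenvalues` is sorted in ascending
// order and column j of `vectors` is the unit eigenvector for eigenvalues[j].
void jacobi_eigen(Matrix a, std::vector<double>& eigenvalues, Matrix& vectors,
                  int max_sweeps = 100) {
    const std::size_t n = a.size();
    vectors.assign(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        assert(a[i].size() == n && "matrix must be square");
        vectors[i][i] = 1.0;
    }
    for (int sweep = 0; sweep < max_sweeps; ++sweep) {
        double off = 0.0;  // squared norm of the strict upper triangle
        for (std::size_t p = 0; p < n; ++p)
            for (std::size_t q = p + 1; q < n; ++q) off += a[p][q] * a[p][q];
        if (off < 1e-24) break;  // effectively diagonal
        for (std::size_t p = 0; p < n; ++p) {
            for (std::size_t q = p + 1; q < n; ++q) {
                if (a[p][q] == 0.0) continue;
                // Rotation (c, s) chosen so that the new a[p][q] is zero.
                const double theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q]);
                const double t = (theta >= 0.0 ? 1.0 : -1.0) /
                                 (std::fabs(theta) + std::sqrt(theta * theta + 1.0));
                const double c = 1.0 / std::sqrt(t * t + 1.0);
                const double s = t * c;
                for (std::size_t k = 0; k < n; ++k) {  // A <- A J (columns p, q)
                    const double akp = a[k][p], akq = a[k][q];
                    a[k][p] = c * akp - s * akq;
                    a[k][q] = s * akp + c * akq;
                }
                for (std::size_t k = 0; k < n; ++k) {  // A <- J^T A (rows p, q)
                    const double apk = a[p][k], aqk = a[q][k];
                    a[p][k] = c * apk - s * aqk;
                    a[q][k] = s * apk + c * aqk;
                }
                for (std::size_t k = 0; k < n; ++k) {  // V <- V J
                    const double vkp = vectors[k][p], vkq = vectors[k][q];
                    vectors[k][p] = c * vkp - s * vkq;
                    vectors[k][q] = s * vkp + c * vkq;
                }
            }
        }
    }
    // Sort eigenpairs by ascending eigenvalue (the diagonal of the rotated A).
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&a](std::size_t i, std::size_t j) { return a[i][i] < a[j][j]; });
    eigenvalues.assign(n, 0.0);
    Matrix sorted(n, std::vector<double>(n, 0.0));
    for (std::size_t j = 0; j < n; ++j) {
        eigenvalues[j] = a[order[j]][order[j]];
        for (std::size_t k = 0; k < n; ++k) sorted[k][j] = vectors[k][order[j]];
    }
    vectors = sorted;
}
```

For anything beyond toy sizes, a LAPACK or ARPACK routine is preferable. The Jacobi loop is only meant to show what "solving the eigenproblem" returns.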
Note that the target dimension should be set to a reasonable value (using set_target_dim). If it is higher than the intrinsic dimensionality of the dataset, the 'extra' features of the output may be inconsistent (they correspond to zero or negative eigenvalues). In this case a warning is shown.
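Below is a minimal sketch of the check described above. It assumes the eigenvalues of the centred matrix are already available. `count_degenerate_dims` is a hypothetical helper, not part of Shogun, and its tolerance is an arbitrary choice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <iostream>
#include <vector>

// Hypothetical helper: counts how many of the `target_dim` largest eigenvalues
// are not positive (within `tol`) and prints a warning if any are. Such
// components carry no geometric information, so a nonzero result means the
// target dimension exceeds what the data supports.
std::size_t count_degenerate_dims(std::vector<double> eigenvalues,
                                  std::size_t target_dim, double tol = 1e-9) {
    assert(target_dim <= eigenvalues.size());
    std::sort(eigenvalues.begin(), eigenvalues.end(), std::greater<double>());
    std::size_t bad = 0;
    for (std::size_t i = 0; i < target_dim; ++i)
        if (eigenvalues[i] <= tol) ++bad;
    if (bad > 0)
        std::cerr << "warning: " << bad << " of " << target_dim
                  << " requested dimensions have non-positive eigenvalues\n";
    return bad;
}
```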
Multidimensional scaling can be applied to any given distance using the apply_to_distance method. By default, the Euclidean distance is used (its parallel instance is replaced by the preprocessor's own).
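The default Euclidean distance amounts to a plain pairwise distance matrix. In this sketch the helper name and the row-per-example layout are assumptions (Shogun stores one example per column). Any other symmetric distance matrix could be used in its place.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical helper: pairwise Euclidean distances between examples, one
// example per row. The result is symmetric with a zero diagonal.
Matrix euclidean_distance_matrix(const Matrix& examples) {
    const std::size_t n = examples.size();
    Matrix d(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            assert(examples[i].size() == examples[j].size());
            double sum = 0.0;
            for (std::size_t k = 0; k < examples[i].size(); ++k) {
                const double diff = examples[i][k] - examples[j][k];
                sum += diff * diff;
            }
            d[i][j] = d[j][i] = std::sqrt(sum);
        }
    }
    return d;
}
```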
The faster landmark approximation is parallelized using pthreads. The number of landmarks should be at least 3 for proper triangulation. For reasonable embedding accuracy, larger values (30-50% of the total number of examples) work well for most tasks.
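The rule of thumb above can be encoded in a small helper. `suggest_landmark_number` is hypothetical, and its default fraction of 0.3 is simply the lower end of the suggested 30-50% range.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: a fraction of the examples, but never fewer than the
// 3 landmarks needed for triangulation and never more than the examples.
int32_t suggest_landmark_number(int32_t num_examples, double fraction = 0.3) {
    assert(num_examples >= 3 && fraction > 0.0 && fraction <= 1.0);
    const int32_t wanted = static_cast<int32_t>(std::ceil(fraction * num_examples));
    return std::clamp<int32_t>(wanted, 3, num_examples);
}
```

For example, `suggest_landmark_number(1000, 0.5)` gives 500. Very small datasets are clamped up to the 3-landmark minimum.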
Definition at line 60 of file MultidimensionalScaling.h.
Public Member Functions | |
CMultidimensionalScaling () | |
virtual | ~CMultidimensionalScaling () |
virtual bool | init (CFeatures *features) |
virtual void | cleanup () |
virtual CSimpleFeatures < float64_t > * | apply_to_distance (CDistance *distance) |
virtual SGMatrix< float64_t > | apply_to_feature_matrix (CFeatures *features) |
virtual SGVector< float64_t > | apply_to_feature_vector (SGVector< float64_t > vector) |
virtual const char * | get_name () const |
virtual EPreprocessorType | get_type () const |
SGVector< float64_t > | get_eigenvalues () const |
void | set_landmark_number (int32_t num) |
int32_t | get_landmark_number () const |
void | set_landmark (bool landmark) |
bool | get_landmark () const |
Protected Member Functions | |
void | init () |
SGMatrix< float64_t > | classic_embedding (SGMatrix< float64_t > distance_matrix) |
SGMatrix< float64_t > | landmark_embedding (SGMatrix< float64_t > distance_matrix) |
Static Protected Member Functions | |
static void * | run_triangulation_thread (void *p) |
static SGVector< int32_t > | shuffle (int32_t count, int32_t total_count) |
Protected Attributes | |
SGVector< float64_t > | m_eigenvalues |
bool | m_landmark |
int32_t | m_landmark_number |
CMultidimensionalScaling | ( | ) |
Definition at line 60 of file MultidimensionalScaling.cpp.
~CMultidimensionalScaling | ( | ) | [virtual] |
Definition at line 84 of file MultidimensionalScaling.cpp.
CSimpleFeatures< float64_t > * apply_to_distance | ( | CDistance * | distance | ) | [virtual] |
apply preprocessor to CDistance
distance | distance to be embedded (should approximate a Euclidean distance for consistent results) |
Reimplemented in CIsomap.
Definition at line 89 of file MultidimensionalScaling.cpp.
SGMatrix< float64_t > apply_to_feature_matrix | ( | CFeatures * | features | ) | [virtual] |
apply preprocessor to feature matrix, replacing it with a feature matrix of target dimensionality
features | features whose feature matrix should be processed |
Reimplemented from CDimensionReductionPreprocessor.
Reimplemented in CIsomap.
Definition at line 112 of file MultidimensionalScaling.cpp.
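SGVector< float64_t > apply_to_feature_vector | ( | SGVector< float64_t > | vector | ) | [virtual] |
apply preprocessor to feature vector
vector | feature vector to be processed |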
Reimplemented from CDimensionReductionPreprocessor.
Reimplemented in CIsomap.
Definition at line 143 of file MultidimensionalScaling.cpp.
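SGMatrix< float64_t > classic_embedding | ( | SGMatrix< float64_t > | distance_matrix | ) | [protected] |
classical embedding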
distance_matrix | distance matrix to be used for embedding |
Definition at line 149 of file MultidimensionalScaling.cpp.
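The following compact sketch shows classical (Torgerson) scaling as described in Borg & Groenen: double-centre the squared distances and keep the leading eigenpairs. It is not Shogun's `classic_embedding`, and the function name is hypothetical. Shifted power iteration with deflation stands in for DSYEVR/ARPACK, so it is only suitable for small matrices with a clear eigengap.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Classical MDS: B = -1/2 * J * D2 * J with J = I - (1/n) 1 1^T and D2 the
// element-wise squared distances; point i gets coordinate sqrt(lambda_d) * v_d[i]
// for each of the `dim` largest eigenpairs (lambda_d, v_d) of B.
Matrix classical_mds(const Matrix& dist, std::size_t dim, int iters = 2000) {
    const std::size_t n = dist.size();
    assert(dim <= n);
    // Row means of D2 (equal to column means, since D2 is symmetric) and grand mean.
    std::vector<double> row_mean(n, 0.0);
    double grand_mean = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) row_mean[i] += dist[i][j] * dist[i][j] / n;
        grand_mean += row_mean[i] / n;
    }
    // Double centring, plus a Gershgorin shift making B + shift*I positive
    // semidefinite, so power iteration finds the largest eigenvalues of B.
    Matrix b(n, std::vector<double>(n, 0.0));
    double shift = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double radius = 0.0;
        for (std::size_t j = 0; j < n; ++j) {
            b[i][j] = -0.5 * (dist[i][j] * dist[i][j] - row_mean[i] - row_mean[j] + grand_mean);
            radius += std::fabs(b[i][j]);
        }
        shift = std::max(shift, radius);
    }
    for (std::size_t i = 0; i < n; ++i) b[i][i] += shift;

    Matrix coords(n, std::vector<double>(dim, 0.0));
    for (std::size_t d = 0; d < dim; ++d) {
        std::vector<double> v(n), w(n);
        for (std::size_t i = 0; i < n; ++i) v[i] = 1.0 + 0.01 * static_cast<double>(i);
        for (int it = 0; it < iters; ++it) {  // power iteration: v <- S v / |S v|
            double norm = 0.0;
            for (std::size_t i = 0; i < n; ++i) {
                w[i] = 0.0;
                for (std::size_t j = 0; j < n; ++j) w[i] += b[i][j] * v[j];
                norm += w[i] * w[i];
            }
            norm = std::sqrt(norm);
            if (norm == 0.0) break;
            for (std::size_t i = 0; i < n; ++i) v[i] = w[i] / norm;
        }
        double mu = 0.0;  // Rayleigh quotient v^T S v (v has unit length)
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j) mu += v[i] * b[i][j] * v[j];
        const double lambda = mu - shift;  // eigenvalue of the unshifted B
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j) b[i][j] -= mu * v[i] * v[j];  // deflate
        if (lambda > 0.0)  // non-positive eigenvalues carry no geometry: keep zeros
            for (std::size_t i = 0; i < n; ++i) coords[i][d] = std::sqrt(lambda) * v[i];
    }
    return coords;
}
```

Components whose eigenvalue is not positive are left at zero. These are the inconsistent 'extra' features mentioned in the class description.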
void cleanup | ( | ) | [virtual] |
cleanup
Reimplemented from CDimensionReductionPreprocessor.
Reimplemented in CIsomap.
Definition at line 80 of file MultidimensionalScaling.cpp.
SGVector< float64_t > get_eigenvalues | ( | ) | const |
get eigenvalues of the last embedding
Definition at line 106 of file MultidimensionalScaling.h.
bool get_landmark | ( | ) | const |
getter for landmark parameter
Definition at line 142 of file MultidimensionalScaling.h.
int32_t get_landmark_number | ( | ) | const |
get number of landmarks
Definition at line 126 of file MultidimensionalScaling.h.
virtual const char* get_name | ( | void | ) | const [virtual] |
get name
Reimplemented from CDimensionReductionPreprocessor.
Reimplemented in CIsomap.
Definition at line 98 of file MultidimensionalScaling.h.
virtual EPreprocessorType get_type | ( | ) | const [virtual] |
get type
Reimplemented from CDimensionReductionPreprocessor.
Reimplemented in CIsomap.
Definition at line 101 of file MultidimensionalScaling.h.
bool init | ( | CFeatures * | features | ) | [virtual] |
empty init
features |
Reimplemented from CDimensionReductionPreprocessor.
Reimplemented in CIsomap.
Definition at line 75 of file MultidimensionalScaling.cpp.
void init | ( | void | ) | [protected] |
default initialization
Reimplemented from CDimensionReductionPreprocessor.
Reimplemented in CIsomap.
Definition at line 66 of file MultidimensionalScaling.cpp.
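SGMatrix< float64_t > landmark_embedding | ( | SGMatrix< float64_t > | distance_matrix | ) | [protected] |
landmark embedding (approximate; accuracy depends on the m_landmark_number parameter)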
distance_matrix | distance matrix to be used for embedding |
Definition at line 270 of file MultidimensionalScaling.cpp.
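The triangulation step of landmark MDS can be sketched from the formula in de Silva & Tenenbaum, x_a = -1/2 L# (delta_a - delta_mu). The function name and argument layout are assumptions. The landmark eigenpairs are expected to come from a classical embedding of the landmarks alone, and only strictly positive eigenvalues may be used.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical helper: places one point from its squared distances `delta_a`
// to the k landmarks. landmark_vectors[d] is the d-th unit eigenvector (length
// k) of the landmarks' centred matrix, eigenvalues[d] its eigenvalue, and
// landmark_sq_dist the k x k squared distances between landmarks.
//   x_a[d] = -1/2 * sum_i (v_d[i] / sqrt(lambda_d)) * (delta_a[i] - delta_mu[i])
// where delta_mu[i] is the mean squared distance from landmark i to all landmarks.
std::vector<double> triangulate(const Matrix& landmark_vectors,
                                const std::vector<double>& eigenvalues,
                                const Matrix& landmark_sq_dist,
                                const std::vector<double>& delta_a) {
    const std::size_t k = landmark_sq_dist.size();
    assert(delta_a.size() == k && landmark_vectors.size() == eigenvalues.size());
    std::vector<double> delta_mu(k, 0.0);
    for (std::size_t i = 0; i < k; ++i)
        for (std::size_t j = 0; j < k; ++j) delta_mu[i] += landmark_sq_dist[i][j] / k;
    std::vector<double> x(eigenvalues.size(), 0.0);
    for (std::size_t d = 0; d < eigenvalues.size(); ++d) {
        assert(eigenvalues[d] > 0.0);
        double s = 0.0;
        for (std::size_t i = 0; i < k; ++i)
            s += landmark_vectors[d][i] * (delta_a[i] - delta_mu[i]);
        x[d] = -0.5 * s / std::sqrt(eigenvalues[d]);
    }
    return x;
}
```

As a check, take landmarks at 1D positions -1, 0, 1 (eigenvector (1, 0, -1)/sqrt(2), eigenvalue 2). The landmarks embed at 1, 0, -1, and a point at position 2 is placed at -2, the correct distance from every landmark in this sign-flipped frame.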
void * run_triangulation_thread | ( | void * | p | ) | [static, protected] |
run triangulation thread for landmark embedding
p | thread parameters |
Definition at line 405 of file MultidimensionalScaling.cpp.
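Because the per-point triangulations are independent, they can be split into contiguous blocks, with one block per thread. This sketch uses `std::thread` rather than raw pthreads, and `parallel_over_points` is a hypothetical name. It shows only the work partitioning, not the contents of Shogun's thread parameter struct.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: splits [0, num_points) into contiguous blocks and runs
// work(i) for every point, one block per thread. work(i) stands for
// triangulating point i and must only write outputs owned by point i, so no
// locking is needed.
void parallel_over_points(std::size_t num_points, std::size_t num_threads,
                          const std::function<void(std::size_t)>& work) {
    assert(num_threads > 0);
    num_threads = std::min(num_threads, std::max<std::size_t>(num_points, 1));
    const std::size_t block = (num_points + num_threads - 1) / num_threads;
    std::vector<std::thread> threads;
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * block;
        const std::size_t end = std::min(num_points, begin + block);
        if (begin >= end) break;  // no points left for this thread
        threads.emplace_back([begin, end, &work] {
            for (std::size_t i = begin; i < end; ++i) work(i);
        });
    }
    for (std::thread& th : threads) th.join();  // wait for every block
}
```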
void set_landmark | ( | bool | landmark | ) |
setter for landmark parameter
landmark | true if landmark embedding should be used |
Definition at line 134 of file MultidimensionalScaling.h.
void set_landmark_number | ( | int32_t | num | ) |
set number of landmarks; for a consistent embedding it should be less than the number of examples and greater than 3, since triangulation is used
num | number of landmarks to be set |
Definition at line 116 of file MultidimensionalScaling.h.
SGVector< int32_t > shuffle | ( | int32_t | count, | int32_t | total_count | ) | [static, protected] |
subroutine that selects count randomly shuffled indexes out of total_count using the Fisher-Yates (also known as Knuth) shuffle algorithm
count | number of indexes to be shuffled and returned |
total_count | total number of indexes |
Definition at line 447 of file MultidimensionalScaling.cpp.
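To draw the landmarks, a partial Fisher-Yates shuffle is enough, because only the first `count` positions have to be randomized. The name `shuffle_indexes` and the explicit `std::mt19937` argument are assumptions of this sketch, not the signature of the protected `shuffle` member.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: picks `count` distinct indexes out of [0, total_count)
// with a partial Fisher-Yates (Knuth) shuffle, i.e. O(total_count) setup
// plus only `count` swaps.
std::vector<int32_t> shuffle_indexes(int32_t count, int32_t total_count, std::mt19937& rng) {
    assert(0 <= count && count <= total_count);
    std::vector<int32_t> idx(total_count);
    std::iota(idx.begin(), idx.end(), 0);
    for (int32_t i = 0; i < count; ++i) {
        // Swap position i with a uniformly chosen position in [i, total_count).
        std::uniform_int_distribution<int32_t> pick(i, total_count - 1);
        std::swap(idx[i], idx[pick(rng)]);
    }
    idx.resize(count);  // keep only the shuffled prefix
    return idx;
}
```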
SGVector<float64_t> m_eigenvalues [protected] |
last embedding eigenvalues
Definition at line 182 of file MultidimensionalScaling.h.
bool m_landmark [protected] |
use landmark approximation?
Definition at line 185 of file MultidimensionalScaling.h.
int32_t m_landmark_number [protected] |
number of landmarks
Definition at line 188 of file MultidimensionalScaling.h.