SHOGUN  4.1.0
Methods.mainpage
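/*! \page methods Machine Learning Methods

Shogun's machine learning functionality is currently split into several parts:
feature representations, feature preprocessing, kernel representations, kernel
normalization, distance representations, classifier representations, clustering
methods, distributions, performance evaluation measures, regression methods, and
structured output learners. The following machine learning algorithms and
classes are implemented in Shogun.
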
 \section featrep_sec Feature Representations
Shogun provides several feature representations: simple features (see
CSimpleFeatures), which are standard 2-d matrices; string features (see
CStringFeatures), which are lists of strings of arbitrary length; and sparse
features (see CSparseFeatures), which represent sparse matrices.

 Each of these feature objects

 \li Simple Features (CSimpleFeatures)
 \li Strings (CStringFeatures)
 \li Sparse Features (CSparseFeatures)

 supports the following data types:

 \li bool
 \li 8bit char
 \li 8bit Byte
 \li 16bit Integer
 \li 16bit Word
 \li 32bit Integer
 \li 32bit Unsigned Integer
 \li 32bit Float matrix
 \li 64bit Float matrix
 \li 96bit Float matrix

 Other feature types are available as well. Some of them build on the three
 basic feature types above, such as CTOPFeatures (TOP kernel features used with
 CHMM), CFKFeatures (Fisher kernel features used with CHMM) and
 CRealFileFeatures (vectors read from a binary file). Note that all feature
 types are derived from CFeatures. More complex types are
 \li CAttributeFeatures - Features of attribute value pairs.
 \li CCombinedDotFeatures - Features that allow stacking of dot features.
 \li CCombinedFeatures - Features that allow stacking of arbitrary features.
 \li CDotFeatures - Features that support a certain set of operations (like multiplication with a scalar + addition to a dense vector). Examples are sparse and dense features.
 \li CDummyFeatures - Features without content; only the number of vectors is known.
 \li CExplicitSpecFeatures - Implement spectrum kernel feature space explicitly.
 \li CImplicitWeightedSpecFeatures - DotFeatures that implicitly implement weighted spectrum kernel features.
 \li CWDFeatures - DotFeatures that implicitly implement weighted degree kernel features.

In addition, labels are represented by CLabels and alphabets by CAlphabet.

 \section preproc_sec Preprocessors
 All of the feature types mentioned above can be preprocessed, e.g. by subtracting the mean or normalizing vectors to norm 1. The following preprocessors are implemented:
 \li CNormOne - Normalizes vectors to norm 1.
 \li CLogPlusOne - Adds 1 and applies log().
 \li CPCACut - Keeps eigenvectors with the highest eigenvalues.
 \li CPruneVarSubMean - Removes dimensions with little variance, subtracting the mean.
 \li CSortUlongString - Sorts vectors.
 \li CSortWordString - Sorts vectors.
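
 As a rough illustration of what the two simplest preprocessors in the list
 compute, the following standalone Python sketch normalizes a vector to norm 1
 and applies log(x + 1) componentwise. It does not call Shogun; the function
 names norm_one and log_plus_one are hypothetical, and the Euclidean norm and
 the natural logarithm are assumptions made for this example.

```python
import math

def norm_one(vec):
    # Scale a vector to Euclidean norm 1 (assumed norm for this sketch).
    # A zero vector has no direction, so it is returned unchanged.
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]

def log_plus_one(vec):
    # Add 1 to every component and take the natural logarithm.
    return [math.log(x + 1.0) for x in vec]

if __name__ == "__main__":
    print(norm_one([3.0, 4.0]))
    print(log_plus_one([0.0, 1.0, 9.0]))
```

 Both helpers return new lists and leave their input untouched, which keeps
 them easy to chain.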

 \section classifiers_sec Classifiers

A range of classifiers is implemented in Shogun. Some are standard two-class
classifiers, some are one-class classifiers, and some are multi-class
classifiers. Several of them are linear classifiers and SVMs. Among the faster
linear SVM classifiers are CSGD, CSVMOcas and CLibLinear, which can handle
millions of examples and features.

 \subsection linclassi_sec Linear Classifiers
 \li CPerceptron - standard online perceptron
 \li CLDA - Fisher's linear discriminant
 \li CLPM - linear programming machine (1-norm regularized SVM)
 \li CLPBoost - linear programming machine using boosting on the features
 \li CSVMPerf - a linear SVM with l2-regularized bias
 \li CLibLinear - a linear SVM with l2-regularized bias
 \li CSVMLin - a linear SVM with l2-regularized bias
 \li CSVMOcas - a linear SVM with l2-regularized bias
 \li CSubgradientSVM - SVM based on steepest subgradient descent
 \li CSubgradientLPM - LPM based on steepest subgradient descent


 \subsubsection svmclassi_sec Support Vector Machines (SVM)
 \li CSVMLight - A variant of SVMlight using pr_loqo as its internal solver.
 \li CLibSVM - LibSVM modified to use Shogun's kernel framework.
 \li CMPDSVM - Minimal Primal Dual SVM
 \li CGPBTSVM - Gradient Projection Technique SVM
 \li CWDSVMOcas - CSVMOcas based SVM using explicitly spanned WD-Kernel feature space
 \li CGMNPSVM - A true multiclass one vs. rest SVM
 \li CGNPPSVM - SVM solver based on the generalized nearest point problem
 \li CMCSVM - An experimental multiclass SVM
 \li CLibSVMMultiClass - LibSVM's one vs. one multiclass SVM solver
 \li CLibSVMOneClass - LibSVM's one-class SVM


 \subsection distmachine_sec Distance Machines
 \li k-Nearest Neighbor - Standard k-NN


 \section regression_sec Regression
 \subsection svr_sec Support Vector Regression (SVR)
 \li CSVRLight - SVMLight based SVR
 \li CLibSVR - LIBSVM based SVR

 \subsection other_regress Others
 \li CKRR - Kernel Ridge Regression


 \section distrib_sec Distributions
 \li CHMM - Hidden Markov Models
 \li CHistogram - Histogram
 \li CLinearHMM - Markov chains (embedded in ``Linear'' HMMs)


 \section cluster_sec Clustering
 \li CHierarchical - Agglomerative hierarchical single linkage clustering.
 \li CKMeans - k-Means Clustering


 \section mkl_sec Multiple Kernel Learning
 \li CMKLRegression for q-norm MKL with Regression
 \li CMKLOneClass for q-norm 1-class MKL
 \li CMKLClassification for q-norm 2-class MKL
 \li CGMNPMKL for 1-norm multi-class MKL


 \section kernels_sec Kernels
 \li CAUCKernel - To maximize AUC in SVM training (takes a kernel as input)
 \li CChi2Kernel - Chi^2 Kernel
 \li CCombinedKernel - Combined kernel to work with multiple kernels
 \li CCommUlongStringKernel - Spectrum kernel with spectra of up to 64 bit
 \li CCommWordStringKernel - Spectrum kernel with spectra of up to 16 bit
 \li CConstKernel - A ``kernel'' returning a constant
 \li CCustomKernel - A user supplied custom kernel
 \li CDiagKernel - A kernel with nonzero elements only on the diagonal
 \li CDistanceKernel - A transformation to transform distances into similarities
 \li CFixedDegreeStringKernel - A string kernel
 \li CGaussianKernel - The standard Gaussian kernel
 \li CGaussianShiftKernel - Gaussian kernel with shift (inspired by the Weighted Degree shift kernel)
 \li CGaussianShortRealKernel - Gaussian Kernel on 32bit Floats
 \li CHistogramWordStringKernel - A TOP kernel on Sequences
 \li CLinearByteKernel - Linear Kernel on Bytes
 \li CLinearKernel - Linear Kernel
 \li CLinearStringKernel - Linear Kernel on Strings
 \li CLinearWordKernel - Linear Kernel on Words
 \li CLocalAlignmentStringKernel - The local alignment kernel
 \li CLocalityImprovedStringKernel - The locality improved kernel
 \li CMatchWordStringKernel - Another String kernel
 \li COligoStringKernel - The oligo string kernel
 \li CPolyKernel - The polynomial kernel
 \li CPolyMatchStringKernel - Polynomial kernel on strings
 \li CPolyMatchWordStringKernel - Polynomial kernel on strings
 \li CPyramidChi2 - Pyramid chi2 kernel (from image analysis)
 \li CRegulatoryModulesStringKernel - Regulatory modules string kernel
 \li CSalzbergWordStringKernel - Salzberg features based string kernel
 \li CSigmoidKernel - Tanh sigmoidal kernel
 \li CSimpleLocalityImprovedStringKernel - A variant of the locality improved kernel
 \li CSparseGaussianKernel - Gaussian Kernel on sparse features
 \li CSparseLinearKernel - Linear Kernel on sparse features
 \li CSparsePolyKernel - Polynomial Kernel on sparse features
 \li CTensorProductPairKernel - The Tensor Product Pair Kernel (TPPK)
 \li CWeightedCommWordStringKernel - A weighted (or blended) spectrum kernel
 \li CWeightedDegreePositionStringKernel - Weighted Degree kernel with shift
 \li CWeightedDegreeStringKernel - Weighted Degree string kernel
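
 To make the idea behind the spectrum kernels in the list more concrete, the
 following standalone Python sketch computes an unnormalized, count-based
 spectrum kernel: each string is mapped to the counts of its length-k
 substrings, and the kernel value is the dot product of the two count vectors.
 It does not use Shogun; the names kmer_counts and spectrum_kernel are
 hypothetical, and options such as normalization or sign-based counting are
 deliberately left out.

```python
from collections import Counter

def kmer_counts(sequence, k):
    # Count every contiguous substring of length k.
    # Sequences shorter than k yield an empty Counter.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

def spectrum_kernel(a, b, k):
    # Dot product of the two k-mer count vectors.
    counts_a = kmer_counts(a, k)
    counts_b = kmer_counts(b, k)
    return sum(count * counts_b[kmer] for kmer, count in counts_a.items())

if __name__ == "__main__":
    print(spectrum_kernel("ACGT", "ACGA", 2))
    print(spectrum_kernel("AAAA", "AAA", 2))
```

 Storing only the k-mers that occur keeps the representation sparse, which
 matters because the full feature space grows exponentially with k.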


 \subsection kernel_normalizer Kernel Normalizers
Because some kernels are numerically unstable for certain SVMs, they need to be normalized first.

 \li CSqrtDiagKernelNormalizer - divide kernel by square root of product of diagonal
 \li CAvgDiagKernelNormalizer - divide by average diagonal value
 \li CFirstElementKernelNormalizer - divide by first kernel element k(0,0)
 \li CIdentityKernelNormalizer - no normalization
 \li CDiceKernelNormalizer - normalization inspired by the Dice coefficient
 \li CRidgeKernelNormalizer - adds a ridge on the kernel diagonal
 \li CTanimotoKernelNormalizer - Tanimoto coefficient inspired normalizer
 \li CVarianceKernelNormalizer - normalize vectors in feature space to norm 1
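
 The first two normalizers in the list can be sketched on a precomputed kernel
 matrix as follows. This is a standalone Python illustration under the
 assumption that the kernel matrix is a square list of lists with a strictly
 positive diagonal; the function names are hypothetical and the code does not
 call Shogun.

```python
import math

def sqrt_diag_normalize(K):
    # K'(i, j) = K(i, j) / sqrt(K(i, i) * K(j, j)), giving ones on the diagonal.
    n = len(K)
    diag = [K[i][i] for i in range(n)]
    return [[K[i][j] / math.sqrt(diag[i] * diag[j]) for j in range(n)]
            for i in range(n)]

def avg_diag_normalize(K):
    # Divide every entry by the mean of the diagonal entries.
    n = len(K)
    avg = sum(K[i][i] for i in range(n)) / n
    return [[K[i][j] / avg for j in range(n)] for i in range(n)]

if __name__ == "__main__":
    K = [[4.0, 2.0], [2.0, 9.0]]
    print(sqrt_diag_normalize(K))
    print(avg_diag_normalize(K))
```

 The square-root variant rescales each example individually, while the
 average-diagonal variant applies a single global scaling factor.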


 \section dist_sec Distances
Distances measure how far apart two objects are. They can be used in CDistanceMachine objects such as CKNN.
The following distances are implemented:

 \li CBrayCurtisDistance - Bray-Curtis distance
 \li CCanberraMetric - Canberra metric
 \li CChebyshewMetric - Chebyshev metric
 \li CChiSquareDistance - Chi^2 distance
 \li CCosineDistance - Cosine distance
 \li CEuclidianDistance - Euclidean distance
 \li CGeodesicMetric - Geodesic metric
 \li CHammingWordDistance - Hamming distance
 \li CJensenMetric - Jensen metric
 \li CManhattanMetric - Manhattan metric
 \li CMinkowskiMetric - Minkowski metric
 \li CTanimotoDistance - Tanimoto distance
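
 A few of the listed distances are closely related and can be written down in
 a handful of lines. The standalone Python sketch below uses the textbook
 definitions of the Manhattan, Chebyshev and Minkowski distances on plain
 lists of equal length; the function names are hypothetical and the code is
 not taken from Shogun.

```python
def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    # Largest absolute coordinate difference.
    return max(abs(x - y) for x, y in zip(a, b))

def minkowski(a, b, p):
    # p-th root of the sum of p-th powers of absolute differences.
    # p = 1 gives the Manhattan distance, p = 2 the Euclidean distance.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

if __name__ == "__main__":
    a, b = [0.0, 0.0], [3.0, 4.0]
    print(manhattan(a, b), chebyshev(a, b), minkowski(a, b, 2))
```

 As p grows, the Minkowski distance approaches the Chebyshev distance, which
 is why the three are often treated as one family.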


 \section eval_sec Evaluation
 \subsection perf_sec Performance Measures
 Performance measures assess the quality of a prediction. In Shogun they are implemented in CPerformanceMeasures. The following
 performance measures are implemented:
 \li Receiver Operating Characteristic curve (ROC)
 \li Area under the ROC curve (auROC)
 \li Area over the ROC curve (aoROC)
 \li Precision Recall Curve (PRC)
 \li Area under the PRC (auPRC)
 \li Area over the PRC (aoPRC)
 \li Detection Error Tradeoff (DET)
 \li Area under the DET (auDET)
 \li Area over the DET (aoDET)
 \li Cross Correlation coefficient (CC)
 \li Weighted Relative Accuracy (WRAcc)
 \li Balanced Error (BAL)
 \li F-Measure
 \li Accuracy
 \li Error
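
 The threshold-based measures at the end of the list all derive from the four
 confusion counts. The standalone Python sketch below uses the common textbook
 definitions for two-class labels in {+1, -1}; it is not the code of
 CPerformanceMeasures, the function names are hypothetical, and it assumes
 that both classes occur and that at least one example is predicted positive
 (otherwise a division by zero occurs).

```python
def confusion_counts(truth, predicted):
    # Count true/false positives and negatives for labels in {+1, -1}.
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    # Fraction of correctly classified examples.
    return (tp + tn) / (tp + fp + tn + fn)

def error(tp, fp, tn, fn):
    # Fraction of misclassified examples.
    return 1.0 - accuracy(tp, fp, tn, fn)

def f_measure(tp, fp, tn, fn):
    # Harmonic mean of precision and recall.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)

def balanced_error(tp, fp, tn, fn):
    # Mean of the per-class error rates.
    return 0.5 * (fn / (fn + tp) + fp / (fp + tn))

if __name__ == "__main__":
    truth = [1, 1, 1, 1, -1, -1, -1, -1, -1, -1]
    predicted = [1, 1, 1, -1, 1, 1, -1, -1, -1, -1]
    counts = confusion_counts(truth, predicted)
    print(counts, accuracy(*counts), f_measure(*counts), balanced_error(*counts))
```

 Unlike plain accuracy, the balanced error weights both classes equally, so it
 stays informative when one class is much larger than the other.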

*/
