Multi-class Error-Correcting Output Codes

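ECOC (Error-Correcting Output Codes) is a multi-class learning strategy. ECOC trains \(L\) binary classifiers, each of which separates one group of classes from another. How the \(K\) classes are split in each of the \(L\) binary problems is recorded in a \(K \times L\) matrix called the ECOC codebook, whose row \(k\) is the codeword of class \(k\). To predict the label of a sample, the outputs of the \(L\) binary classifiers are collected into a codeword, and a decoder compares it with the rows of the codebook.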
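A minimal pure-Python sketch of a codebook and its decoding, not Shogun's implementation. The \(4 \times 6\) codebook below is a hand-picked assumption for \(K = 4\) classes and \(L = 6\) binary classifiers. Row \(k\) is the codeword of class \(k\), and decoding returns the class whose codeword has the smallest Hamming distance to the vector of binary outputs.

```python
# Hypothetical codebook for K = 4 classes and L = 6 binary classifiers.
# Row k is the codeword of class k; entry (k, l) is the side (+1 or -1)
# that class k takes in binary problem l.
CODEBOOK = [
    [+1, +1, +1, -1, -1, -1],
    [+1, -1, -1, +1, +1, -1],
    [-1, +1, -1, +1, -1, +1],
    [-1, -1, +1, -1, +1, +1],
]


def hamming(a, b):
    """Number of positions in which two codewords differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(outputs, codebook=CODEBOOK):
    """Return the class whose codeword is nearest to the binary outputs."""
    distances = [hamming(outputs, row) for row in codebook]
    return distances.index(min(distances))


# The binary classifiers reproduce the codeword of class 2, except that
# classifier 0 is wrong (+1 instead of -1).
outputs = [+1, +1, -1, +1, -1, +1]
print(decode(outputs))  # 2
```

Here one of the six binary classifiers is wrong, yet the nearest codeword is still that of class 2.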

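The difference between ECOC and the OvR/OvO strategies (see the multi-class linear machine cookbook) is that an ECOC codebook is designed with redundancy: \(L\) is usually chosen larger than the number of classes \(K\), and the codewords of different classes differ in many positions. This redundancy is what makes the scheme error-correcting, since the decoder can still recover the right class when some of the binary classifiers make mistakes.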
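The error-correcting property can be quantified with a standard coding-theory fact: if any two codewords differ in at least \(d\) positions, a nearest-codeword decoder recovers the correct class whenever at most \(\lfloor (d-1)/2 \rfloor\) binary classifiers are wrong. The sketch compares a one-vs-rest codebook with a hypothetical 6-column codebook for \(K = 4\); both matrices are illustrative assumptions, not codebooks generated by Shogun.

```python
from itertools import combinations


def min_distance(codebook):
    """Smallest Hamming distance between any two class codewords."""
    return min(
        sum(x != y for x, y in zip(a, b))
        for a, b in combinations(codebook, 2)
    )


def correctable_errors(codebook):
    """Bit errors a nearest-codeword decoder is guaranteed to correct."""
    return (min_distance(codebook) - 1) // 2


# One-vs-rest for K = 4: class k is +1 only in column k, so L = K.
ovr = [[+1 if l == k else -1 for l in range(4)] for k in range(4)]

# A longer hypothetical code for K = 4 with L = 6 columns.
longer = [
    [+1, +1, +1, -1, -1, -1],
    [+1, -1, -1, +1, +1, -1],
    [-1, +1, -1, +1, -1, +1],
    [-1, -1, +1, -1, +1, +1],
]

print(min_distance(ovr), correctable_errors(ovr))        # 2 0
print(min_distance(longer), correctable_errors(longer))  # 4 1
```

One-vs-rest codewords differ in only two positions, so a single wrong binary classifier can already leave the decoder with a tie; the longer code tolerates any single error.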

There are multiple methods to build (encode) a codebook and to decode the classifier outputs against it. In this example, we show how to apply a random encoder and a Hamming distance decoder to a multi-class dataset.
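A random dense encoder can be sketched as follows: draw many random \(\pm 1\) matrices, discard those with a constant column, and keep the one whose codewords are farthest apart. The function names, the default code length \(\lceil 10 \log_2 K \rceil\), the number of candidates and the seed are assumptions for illustration; Shogun's CECOCRandomDenseEncoder has its own parameters and defaults.

```python
import math
import random
from itertools import combinations


def min_row_distance(codebook):
    """Smallest Hamming distance between two class codewords."""
    return min(
        sum(x != y for x, y in zip(a, b))
        for a, b in combinations(codebook, 2)
    )


def random_dense_codebook(num_classes, code_length=None, candidates=1000, seed=0):
    """Random dense encoder sketch (assumed defaults, not Shogun's).

    Draws candidate K x L matrices with entries in {+1, -1}, skips any
    candidate with a constant column (it would not define a binary problem),
    and keeps the candidate whose codewords are farthest apart.
    """
    if num_classes < 2:
        raise ValueError("need at least two classes")
    if code_length is None:
        code_length = math.ceil(10 * math.log2(num_classes))  # assumed default
    rng = random.Random(seed)
    best, best_dist = None, -1
    for _ in range(candidates):
        cand = [[rng.choice((+1, -1)) for _ in range(code_length)]
                for _ in range(num_classes)]
        if any(len(set(col)) == 1 for col in zip(*cand)):
            continue
        dist = min_row_distance(cand)
        if dist > best_dist:
            best, best_dist = cand, dist
    if best is None:
        raise RuntimeError("no valid candidate found; raise `candidates`")
    return best


codebook = random_dense_codebook(4)
print(len(codebook), len(codebook[0]))  # 4 20
```

Fixing the seed makes the codebook reproducible, which matters because training and prediction must use the same codebook.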

Codebooks can be either dense or sparse. In a dense codebook, every entry is \(+1\) or \(-1\), so every sample takes part in every binary problem with the label its class has in that column. In a sparse codebook, \(+1\), \(-1\) and \(0\) are allowed, where \(0\) means that samples of that class are left out of the corresponding binary problem.
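The effect of a \(0\) entry shows up when a codebook column is turned into the training set of one binary classifier. In this plain-Python sketch (hypothetical function name and data), samples whose class has \(+1\) or \(-1\) in the column keep that value as their binary label, and samples whose class has \(0\) are dropped. The example matrix happens to be the one-vs-one code for three classes.

```python
def binary_problem(labels, codebook, column):
    """Turn multiclass labels into the training set of one binary classifier.

    Samples of a class with +1 or -1 in this column keep that value as their
    binary label; samples of a class with 0 are left out of this problem.
    Returns (indices of the kept samples, their binary labels).
    """
    kept, targets = [], []
    for i, y in enumerate(labels):
        code = codebook[y][column]
        if code != 0:
            kept.append(i)
            targets.append(code)
    return kept, targets


# Hypothetical sparse codebook for K = 3 classes and L = 3 columns.
sparse = [
    [+1, +1, 0],
    [-1, 0, +1],
    [0, -1, -1],
]
labels = [0, 1, 2, 2, 1, 0]
print(binary_problem(labels, sparse, 0))  # ([0, 1, 4, 5], [1, -1, -1, 1])
```

With a dense codebook no entry is \(0\), so every sample would be kept in every binary problem.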

In this example, we use CECOCRandomDenseEncoder as the encoder; CECOCRandomSparseEncoder can be used instead to generate sparse codebooks. More encoders and decoders are available in Shogun; they implement the CECOCEncoder and CECOCDecoder interfaces and can be passed to CECOCStrategy.
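The design idea, a strategy that delegates codebook construction and decoding to two interchangeable objects, can be sketched in Python with structural typing. Shogun's actual interfaces are the classes CECOCEncoder and CECOCDecoder; the names Encoder, Decoder, OneVsRestEncoder, HammingDecoder and ECOCStrategySketch below are hypothetical and only mirror that idea for dense codebooks.

```python
from typing import List, Protocol, Sequence

Codebook = List[List[int]]


class Encoder(Protocol):
    def codebook(self, num_classes: int) -> Codebook: ...


class Decoder(Protocol):
    def decode(self, outputs: Sequence[int], codebook: Codebook) -> int: ...


class OneVsRestEncoder:
    """Hypothetical encoder: class k is +1 in column k and -1 elsewhere."""

    def codebook(self, num_classes: int) -> Codebook:
        return [[+1 if l == k else -1 for l in range(num_classes)]
                for k in range(num_classes)]


class HammingDecoder:
    """Hypothetical decoder: class whose codeword differs in fewest places."""

    def decode(self, outputs: Sequence[int], codebook: Codebook) -> int:
        distances = [sum(o != c for o, c in zip(outputs, row))
                     for row in codebook]
        return distances.index(min(distances))


class ECOCStrategySketch:
    """Hypothetical strategy that only wires the two plug-ins together."""

    def __init__(self, encoder: Encoder, decoder: Decoder) -> None:
        self.encoder = encoder
        self.decoder = decoder
        self.codebook_: Codebook = []

    def prepare(self, num_classes: int) -> Codebook:
        self.codebook_ = self.encoder.codebook(num_classes)
        return self.codebook_

    def predict(self, outputs: Sequence[int]) -> int:
        return self.decoder.decode(outputs, self.codebook_)


strategy = ECOCStrategySketch(OneVsRestEncoder(), HammingDecoder())
strategy.prepare(3)
print(strategy.predict([-1, +1, -1]))  # 1
```

Swapping in another encoder or decoder only changes the objects passed to the constructor; the strategy itself stays the same.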

See [EPR10] for a detailed introduction.

Example

Imagine we have files with training and test data. We create CDenseFeatures (here 64-bit floats, aka RealFeatures) and CMulticlassLabels as follows:

Python:
features_train = RealFeatures(f_feats_train)
features_test = RealFeatures(f_feats_test)
labels_train = MulticlassLabels(f_labels_train)
labels_test = MulticlassLabels(f_labels_test)
Octave:
features_train = RealFeatures(f_feats_train);
features_test = RealFeatures(f_feats_test);
labels_train = MulticlassLabels(f_labels_train);
labels_test = MulticlassLabels(f_labels_test);
Java:
RealFeatures features_train = new RealFeatures(f_feats_train);
RealFeatures features_test = new RealFeatures(f_feats_test);
MulticlassLabels labels_train = new MulticlassLabels(f_labels_train);
MulticlassLabels labels_test = new MulticlassLabels(f_labels_test);
Ruby:
features_train = Modshogun::RealFeatures.new f_feats_train
features_test = Modshogun::RealFeatures.new f_feats_test
labels_train = Modshogun::MulticlassLabels.new f_labels_train
labels_test = Modshogun::MulticlassLabels.new f_labels_test
R:
features_train <- RealFeatures(f_feats_train)
features_test <- RealFeatures(f_feats_test)
labels_train <- MulticlassLabels(f_labels_train)
labels_test <- MulticlassLabels(f_labels_test)
Lua:
features_train = modshogun.RealFeatures(f_feats_train)
features_test = modshogun.RealFeatures(f_feats_test)
labels_train = modshogun.MulticlassLabels(f_labels_train)
labels_test = modshogun.MulticlassLabels(f_labels_test)
C#:
RealFeatures features_train = new RealFeatures(f_feats_train);
RealFeatures features_test = new RealFeatures(f_feats_test);
MulticlassLabels labels_train = new MulticlassLabels(f_labels_train);
MulticlassLabels labels_test = new MulticlassLabels(f_labels_test);
C++:
auto features_train = some<CDenseFeatures<float64_t>>(f_feats_train);
auto features_test = some<CDenseFeatures<float64_t>>(f_feats_test);
auto labels_train = some<CMulticlassLabels>(f_labels_train);
auto labels_test = some<CMulticlassLabels>(f_labels_test);

We use CLibLinear as the base binary classifier and create an instance of it (see the linear SVM cookbook).

Python:
classifier = LibLinear()
Octave:
classifier = LibLinear();
Java:
LibLinear classifier = new LibLinear();
Ruby:
classifier = Modshogun::LibLinear.new
R:
classifier <- LibLinear()
Lua:
classifier = modshogun.LibLinear()
C#:
LibLinear classifier = new LibLinear();
C++:
auto classifier = some<CLibLinear>();

We initialize CECOCStrategy by specifying the random dense encoder (CECOCRandomDenseEncoder) and the Hamming distance decoder (CECOCHDDecoder).

Python:
encoder = ECOCRandomDenseEncoder()
decoder = ECOCHDDecoder()
rnd_dense_strategy = ECOCStrategy(encoder, decoder)
Octave:
encoder = ECOCRandomDenseEncoder();
decoder = ECOCHDDecoder();
rnd_dense_strategy = ECOCStrategy(encoder, decoder);
Java:
ECOCRandomDenseEncoder encoder = new ECOCRandomDenseEncoder();
ECOCHDDecoder decoder = new ECOCHDDecoder();
ECOCStrategy rnd_dense_strategy = new ECOCStrategy(encoder, decoder);
Ruby:
encoder = Modshogun::ECOCRandomDenseEncoder.new
decoder = Modshogun::ECOCHDDecoder.new
rnd_dense_strategy = Modshogun::ECOCStrategy.new encoder, decoder
R:
encoder <- ECOCRandomDenseEncoder()
decoder <- ECOCHDDecoder()
rnd_dense_strategy <- ECOCStrategy(encoder, decoder)
Lua:
encoder = modshogun.ECOCRandomDenseEncoder()
decoder = modshogun.ECOCHDDecoder()
rnd_dense_strategy = modshogun.ECOCStrategy(encoder, decoder)
C#:
ECOCRandomDenseEncoder encoder = new ECOCRandomDenseEncoder();
ECOCHDDecoder decoder = new ECOCHDDecoder();
ECOCStrategy rnd_dense_strategy = new ECOCStrategy(encoder, decoder);
C++:
auto encoder = some<CECOCRandomDenseEncoder>();
auto decoder = some<CECOCHDDecoder>();
auto rnd_dense_strategy = some<CECOCStrategy>(encoder, decoder);

We create an instance of the CLinearMulticlassMachine classifier by passing it the ECOC strategy, together with the training features, the binary classifier and the training labels.

Python:
mc_classifier = LinearMulticlassMachine(rnd_dense_strategy, features_train, classifier, labels_train)
Octave:
mc_classifier = LinearMulticlassMachine(rnd_dense_strategy, features_train, classifier, labels_train);
Java:
LinearMulticlassMachine mc_classifier = new LinearMulticlassMachine(rnd_dense_strategy, features_train, classifier, labels_train);
Ruby:
mc_classifier = Modshogun::LinearMulticlassMachine.new rnd_dense_strategy, features_train, classifier, labels_train
R:
mc_classifier <- LinearMulticlassMachine(rnd_dense_strategy, features_train, classifier, labels_train)
Lua:
mc_classifier = modshogun.LinearMulticlassMachine(rnd_dense_strategy, features_train, classifier, labels_train)
C#:
LinearMulticlassMachine mc_classifier = new LinearMulticlassMachine(rnd_dense_strategy, features_train, classifier, labels_train);
C++:
auto mc_classifier = some<CLinearMulticlassMachine>(rnd_dense_strategy, features_train, classifier, labels_train);
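Conceptually, training such a machine means building one binary problem per codebook column and fitting a copy of the base classifier to it. The sketch below assumes a plain perceptron as a stand-in for CLibLinear (a different linear learner), a hand-written one-vs-rest codebook and a tiny toy dataset; the function names are hypothetical and do not come from Shogun.

```python
def train_perceptron(points, targets, epochs=50):
    """Stand-in binary linear learner (a plain perceptron, not LibLinear).

    Returns weights with the bias appended as the last entry.
    """
    w = [0.0] * (len(points[0]) + 1)
    for _ in range(epochs):
        mistakes = 0
        for x, t in zip(points, targets):
            xb = list(x) + [1.0]  # append a constant 1 for the bias
            if t * sum(wi * xi for wi, xi in zip(w, xb)) <= 0:
                w = [wi + t * xi for wi, xi in zip(w, xb)]
                mistakes += 1
        if mistakes == 0:  # every training point is on its correct side
            break
    return w


def train_ecoc(features, labels, codebook, learner=train_perceptron):
    """Train one binary classifier per codebook column.

    A sample of class y gets binary label codebook[y][l] in problem l;
    samples whose entry is 0 (sparse codebooks) are left out.
    """
    machines = []
    for l in range(len(codebook[0])):
        rows = [(x, codebook[y][l]) for x, y in zip(features, labels)
                if codebook[y][l] != 0]
        machines.append(learner([x for x, _ in rows], [t for _, t in rows]))
    return machines


# Toy 2-D data for K = 3 classes and a one-vs-rest codebook (both assumed).
features = [(2, 0), (3, 0), (-2, 0), (-3, 0), (0, 2), (0, 3)]
labels = [0, 0, 1, 1, 2, 2]
codebook = [[+1, -1, -1], [-1, +1, -1], [-1, -1, +1]]
machines = train_ecoc(features, labels, codebook)
print(len(machines))  # 3
```

Each binary problem here is linearly separable, so the perceptron stops once all training points lie on their correct side.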

Then we train the machine and apply it to the test data, which here gives CMulticlassLabels.

Python:
mc_classifier.train()
labels_predict = mc_classifier.apply_multiclass(features_test)
Octave:
mc_classifier.train();
labels_predict = mc_classifier.apply_multiclass(features_test);
Java:
mc_classifier.train();
MulticlassLabels labels_predict = mc_classifier.apply_multiclass(features_test);
Ruby:
mc_classifier.train
labels_predict = mc_classifier.apply_multiclass features_test
R:
mc_classifier$train()
labels_predict <- mc_classifier$apply_multiclass(features_test)
Lua:
mc_classifier:train()
labels_predict = mc_classifier:apply_multiclass(features_test)
C#:
mc_classifier.train();
MulticlassLabels labels_predict = mc_classifier.apply_multiclass(features_test);
C++:
mc_classifier->train();
auto labels_predict = mc_classifier->apply_multiclass(features_test);
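Applying a trained ECOC machine to a sample amounts to evaluating every binary classifier, taking the sign of each output as one bit of a codeword, and decoding that codeword. This sketch assumes linear machines stored as weight lists with the bias last, and uses hand-set, hypothetical weights for a three-class one-vs-rest codebook instead of trained ones; `apply_ecoc` is not a Shogun function.

```python
def apply_ecoc(x, machines, codebook):
    """Predict the class of one sample with linear binary machines.

    Each machine is a weight list with the bias last; the sign of its score
    gives one bit of the predicted codeword, which is decoded by Hamming
    distance. Ties go to the lowest class index.
    """
    outputs = []
    for w in machines:
        score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
        outputs.append(+1 if score > 0 else -1)
    distances = [sum(o != c for o, c in zip(outputs, row)) for row in codebook]
    return distances.index(min(distances))


# Hand-set (hypothetical) weights for a one-vs-rest codebook with K = 3:
# machine 0 fires for x > 1, machine 1 for x < -1, machine 2 for y > 1.
codebook = [[+1, -1, -1], [-1, +1, -1], [-1, -1, +1]]
machines = [[1.0, 0.0, -1.0], [-1.0, 0.0, -1.0], [0.0, 1.0, -1.0]]
test_points = [(4, 0), (-4, 1), (0, 4), (3, 3)]
print([apply_ecoc(p, machines, codebook) for p in test_points])  # [0, 1, 2, 0]
```

The point (3, 3) activates two machines, so its codeword is equally close to classes 0 and 2; this sketch breaks such ties in favour of the lower class index.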

We can evaluate test performance via, e.g., CMulticlassAccuracy.

Python:
eval = MulticlassAccuracy()
accuracy = eval.evaluate(labels_predict, labels_test)
Octave:
eval = MulticlassAccuracy();
accuracy = eval.evaluate(labels_predict, labels_test);
Java:
MulticlassAccuracy eval = new MulticlassAccuracy();
double accuracy = eval.evaluate(labels_predict, labels_test);
Ruby:
eval = Modshogun::MulticlassAccuracy.new
accuracy = eval.evaluate labels_predict, labels_test
R:
eval <- MulticlassAccuracy()
accuracy <- eval$evaluate(labels_predict, labels_test)
Lua:
eval = modshogun.MulticlassAccuracy()
accuracy = eval:evaluate(labels_predict, labels_test)
C#:
MulticlassAccuracy eval = new MulticlassAccuracy();
double accuracy = eval.evaluate(labels_predict, labels_test);
C++:
auto eval = some<CMulticlassAccuracy>();
auto accuracy = eval->evaluate(labels_predict, labels_test);
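Multi-class accuracy is the fraction of test samples whose predicted label equals the true label. A plain-Python equivalent of that quantity, written as a hypothetical helper rather than the Shogun API:

```python
def multiclass_accuracy(predicted, truth):
    """Fraction of samples whose predicted label equals the true label."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


print(multiclass_accuracy([0, 1, 2, 2, 1], [0, 1, 1, 2, 1]))  # 0.8
```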

References

[EPR10] S. Escalera, O. Pujol, and P. Radeva. Error-correcting ouput codes library. Journal of Machine Learning Research, 11:661–664, 2010.