SHOGUN provides "static" interfaces to Matlab(tm), R, Python, Octave and a stand-alone command line executable. The idea behind the static interfaces is to provide a simple environment sufficient for simple experiments: for example, they allow you to train and evaluate a classifier, but not much beyond that. If you are looking for essentially unlimited extensibility (multiple objects, e.g. classifiers, potentially sharing data and interacting with each other), have a look at the Modular Interfaces instead.
In this tutorial we demonstrate how to use shogun to create a simple Gaussian-kernel based support vector machine classifier. But first things first: let's start up R, python, octave or matlab and load the shogun environment.
To start SHOGUN in python, start python and type
from sg import sg
For R, issue (from within R)
library(sg)
For octave and matlab just make sure sg is in the path (use addpath). For the cmdline interface just start the shogun executable
Now in all languages
sg('help')
and
help
in the cmdline interface will show the help screen. If this does not work, consult Installation on how to install shogun.
The rest of this tutorial assumes that the cmdline shogun executable is used (with hints on how things work in the other interfaces). The basic syntax is
<command> <option1> <option2> ...
where options are separated by spaces. For example
set_kernel GAUSSIAN REAL 10 1.2
will create a Gaussian kernel that operates on real-valued features, with a kernel cache of size 10 MB and a kernel width of 1.2. Analogously, the command for the other interfaces (python, r, ...) would look like
sg('set_kernel', 'GAUSSIAN', 'REAL', 10, 1.2)
Note that there is little difference between the interfaces: basically, only strings are quoted and arguments are comma-separated.
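To get a feeling for what this kernel computes: the width parameter is assumed here to enter the exponent's denominator directly, i.e. k(x, x') = exp(-||x - x'||^2 / width). A minimal pure-Python sketch (not using shogun itself) under that assumption:

```python
import math

def gaussian_kernel(x, y, width=1.2):
    # k(x, y) = exp(-||x - y||^2 / width); the width is assumed to
    # appear directly in the denominator, as in the GAUSSIAN kernel above
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / width)

print(gaussian_kernel([0.0, 0.0], [0.0, 0.0]))  # identical points -> 1.0
```

Identical inputs give a kernel value of 1, and the value decays towards 0 as the points move apart; a larger width makes the decay slower.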
We now use two random Gaussians as training data:
set_features TRAIN ../data/fm_train_real.dat
(For other interfaces something like
sg('set_features', 'TRAIN', [ randn(2, 100)-1, randn(2,100)+1 ])
would work).
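The matlab/octave call above stacks two clouds of 100 standard-normal points, shifted to (-1,-1) and (+1,+1), into a 2x200 matrix with one example per column. A NumPy equivalent of that toy-data construction (the seed is arbitrary) might look like:

```python
import numpy as np

rng = np.random.default_rng(0)
# two 2x100 Gaussian clouds shifted apart, one training example per column
traindat = np.hstack((rng.standard_normal((2, 100)) - 1,
                      rng.standard_normal((2, 100)) + 1))
print(traindat.shape)  # (2, 200)
```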
For training a supervised method like an SVM we need a labeling of the training data, which we set via
set_labels TRAIN ../data/label_train_twoclass.dat
(For other interfaces, e.g. matlab/octave, something like
sg('set_labels', 'TRAIN', sign(randn(1, 100)))
would work)
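Note that sign(randn(...)) merely produces some valid ±1 vector. For the two-cluster toy data built above, the natural labeling would be -1 for the first 100 columns and +1 for the last 100; a matching NumPy sketch (under that two-cluster assumption):

```python
import numpy as np

# -1 for the first cluster, +1 for the second; the length must match
# the number of training examples (200 columns in the toy data)
train_label = np.concatenate((-np.ones(100), np.ones(100)))
print(train_label.shape)  # (200,)
```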
We now create an SVM and set the SVM-C parameter to some hopefully sane value (in real applications this needs tuning, just like the kernel parameters, here the kernel width).
new_classifier LIBSVM
c 1
We then train our SVM:
train_classifier
We can now apply our classifier to unseen test data by loading some test data and classifying the examples:
set_features TEST ../data/fm_test_real.dat
out.txt = classify
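The classify step produces real-valued SVM outputs (written to out.txt here), not ±1 labels; the predicted class is the sign of each output. A small sketch of turning outputs into labels and measuring accuracy (the numbers are made up):

```python
import numpy as np

def accuracy(outputs, labels):
    # an SVM output's sign is the predicted class (+1 or -1)
    return float(np.mean(np.sign(outputs) == labels))

out = np.array([1.5, -0.2, 0.7, -2.1])   # made-up classifier outputs
truth = np.array([1, -1, -1, -1])        # made-up true labels
print(accuracy(out, truth))  # 0.75
```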
In case we want to save the classifier, so that we don't have to repeat the potentially time-consuming training, we can save and load it like this:
save_classifier libsvm.model
load_classifier libsvm.model LIBSVM
Other interfaces (python, r, ...) could use the load/save functions, but typically one manually obtains and restores the model parameters, e.g.
[b,alphas]=sg('get_classifier')
sg('set_classifier', b, alphas)
in this case.
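Since get_classifier hands back plain numbers (the bias b and the alpha/support-vector matrix), any generic persistence mechanism works for them. A hypothetical round-trip using NumPy's npz format (the file name and parameter values are made up stand-ins for sg('get_classifier') output):

```python
import numpy as np

# made-up model parameters standing in for sg('get_classifier') output
b = 0.5
alphas = np.array([[0.3, 7.0], [-0.2, 42.0]])

np.savez('libsvm_model.npz', b=b, alphas=alphas)  # save ...
loaded = np.load('libsvm_model.npz')              # ... and restore
print(float(loaded['b']))  # 0.5
```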
Finally, the complete example looks like this
% In this example a two-class support vector machine classifier is trained on a
% toy data set and the trained classifier is used to predict labels of test
% examples. As training algorithm LIBSVM is used with SVM regularization
% parameter C=1 and a Gaussian kernel of width 1.2 and 10MB of kernel cache and
% the precision parameter epsilon=1e-5.
%
% For more details on LIBSVM solver see http://www.csie.ntu.edu.tw/~cjlin/libsvm/

% LibSVM
print LibSVM

set_kernel GAUSSIAN REAL 10 1.2
set_features TRAIN ../data/fm_train_real.dat
set_labels TRAIN ../data/label_train_twoclass.dat
new_classifier LIBSVM
c 1
train_classifier

save_classifier libsvm.model
load_classifier libsvm.model LIBSVM

set_features TEST ../data/fm_test_real.dat
out.txt = classify

! rm out.txt
! rm libsvm.model
and can be found, together with many other examples, in examples/cmdline/classifier_libsvm.sg .
For users of other interfaces we show the translated example for completeness:
# In this example a two-class support vector machine classifier is trained on a
# toy data set and the trained classifier is used to predict labels of test
# examples. As training algorithm LIBSVM is used with SVM regularization
# parameter C=1 and a Gaussian kernel of width 1.2 and 10MB of kernel cache and
# the precision parameter epsilon=1e-5.
#
# For more details on LIBSVM solver see http://www.csie.ntu.edu.tw/~cjlin/libsvm/

library("sg")

size_cache <- 10
C <- 10
epsilon <- 1e-5
use_bias <- TRUE

fm_train_real <- t(as.matrix(read.table('../data/fm_train_real.dat')))
fm_test_real <- t(as.matrix(read.table('../data/fm_test_real.dat')))
label_train_twoclass <- as.real(as.matrix(read.table('../data/label_train_twoclass.dat')))

# LibSVM
print('LibSVM')

width <- 2.1

dump <- sg('set_features', 'TRAIN', fm_train_real)
dump <- sg('set_kernel', 'GAUSSIAN', 'REAL', size_cache, width)
dump <- sg('set_labels', 'TRAIN', label_train_twoclass)
dump <- sg('new_classifier', 'LIBSVM')
dump <- sg('svm_epsilon', epsilon)
dump <- sg('c', C)
dump <- sg('svm_use_bias', use_bias)
dump <- sg('train_classifier')

dump <- sg('set_features', 'TEST', fm_test_real)
result <- sg('classify')
% In this example a two-class support vector machine classifier is trained on a
% toy data set and the trained classifier is used to predict labels of test
% examples. As training algorithm LIBSVM is used with SVM regularization
% parameter C=1 and a Gaussian kernel of width 1.2 and 10MB of kernel cache and
% the precision parameter epsilon=1e-5.
%
% For more details on LIBSVM solver see http://www.csie.ntu.edu.tw/~cjlin/libsvm/

% Explicit examples on how to use the different classifiers

size_cache=10;
C=1;
use_bias=false;
epsilon=1e-5;
width=2.1;

addpath('tools');
label_train_twoclass=load_matrix('../data/label_train_twoclass.dat');
fm_train_real=load_matrix('../data/fm_train_real.dat');
fm_test_real=load_matrix('../data/fm_test_real.dat');

% LibSVM
disp('LibSVM');

sg('set_kernel', 'GAUSSIAN', 'REAL', size_cache, width);
sg('set_features', 'TRAIN', fm_train_real);
sg('set_labels', 'TRAIN', label_train_twoclass);
sg('new_classifier', 'LIBSVM');
sg('svm_epsilon', epsilon);
sg('svm_use_bias', use_bias);
sg('c', C);
sg('train_classifier');

sg('set_features', 'TEST', fm_test_real);
result=sg('classify');
# In this example a two-class support vector machine classifier is trained on a
# toy data set and the trained classifier is used to predict labels of test
# examples. As training algorithm LIBSVM is used with SVM regularization
# parameter C=1 and a Gaussian kernel of width 1.2 and 10MB of kernel cache and
# the precision parameter epsilon=1e-5.
#
# For more details on LIBSVM solver see http://www.csie.ntu.edu.tw/~cjlin/libsvm/

from tools.load import LoadMatrix
from sg import sg

lm=LoadMatrix()

traindat=lm.load_numbers('../data/fm_train_real.dat')
testdat=lm.load_numbers('../data/fm_test_real.dat')
train_label=lm.load_labels('../data/label_train_twoclass.dat')

parameter_list=[[traindat,testdat,train_label,10,2.1,1.2,1e-5,False],
                [traindat,testdat,train_label,10,2.1,1.3,1e-4,False]]

def classifier_libsvm (fm_train_real=traindat, fm_test_real=testdat,
                       label_train_twoclass=train_label,
                       size_cache=10, width=2.1, C=1.2,
                       epsilon=1e-5, use_bias=False):
    sg('set_features', 'TRAIN', fm_train_real)
    sg('set_kernel', 'GAUSSIAN', 'REAL', size_cache, width)
    sg('set_labels', 'TRAIN', label_train_twoclass)
    sg('new_classifier', 'LIBSVM')
    sg('svm_epsilon', epsilon)
    sg('c', C)
    sg('svm_use_bias', use_bias)
    sg('train_classifier')

    sg('set_features', 'TEST', fm_test_real)
    result=sg('classify')
    kernel_matrix = sg('get_kernel_matrix', 'TEST')
    return result, kernel_matrix

if __name__=='__main__':
    print 'LibSVM'
    classifier_libsvm(*parameter_list[0])