As mentioned before, SHOGUN interfaces to several programming languages and toolkits such as Matlab(tm), R, Python and Octave. The following sections give an overview of the static interface commands of SHOGUN. For the static interfaces we tried to keep the command syntax consistent across all the different languages. However, in some cases this was not possible, and we document the subtle differences in syntax and semantics in the respective toolkit. Instead of reading through all of this, we suggest having a look at the large number of examples available in the **examples** / interface directory, for example examples/R or examples/python.

**Overview of Static Interfaces & Testing the Installation**

**Interface Commands**

**Command Reference**

Since Octave is nowadays on par with Matlab, a single documentation for both interfaces is sufficient; it is based on Octave (Matlab can be used synonymously).

To start SHOGUN in Octave, start octave and check that it is correctly installed by typing (let ">" be the octave prompt)

sg('help')

inside of octave. This should show you some help text.

To start SHOGUN in Python, start python and check that it is correctly installed by typing (let ">" be the python prompt)

from sg import sg
sg('help')

inside of python. This should show you some help text.

To fire up SHOGUN in R make sure that you have SHOGUN correctly installed in R. You can check this by typing ( let ">" be the R prompt ):

> library()

inside of R. This command should list all R packages that have been installed on your system. You should have an entry like:

sg The SHOGUN Machine Learning Toolbox

After you have made sure that SHOGUN is installed correctly, you can start it via:

> library(sg)

You will see some information about the SHOGUN core (compile options etc.). After this command, R and SHOGUN are ready to receive your commands.

In general all commands in SHOGUN are issued using the function sg(...). To invoke the SHOGUN command help one types:

> sg('help')

and then a help text appears giving a short description of all commands.

These functions transfer data from the interface to shogun and back. Suppose you have a Matlab or R matrix "features" containing your training data; to register this data, you simply type:

Transfer the features to shogun

**set_features**
sg('set_features', 'TRAIN|TEST', features[, DNABINFILE|<ALPHABET>])

**add_features**
sg('add_features', 'TRAIN|TEST', features[, DNABINFILE|<ALPHABET>])

Features can be char/byte/word/int/real valued matrices, real-valued sparse matrices, or strings (lists or cell arrays of strings). When dealing with strings, an alphabet name has to be specified (DNA, RAW, ...). Use 'TRAIN' to tell SHOGUN that this is the data on which you want to train your classifier, and 'TEST' for the test data.

In contrast to **set_features**, **add_features** will create a combined feature object and append the features to it. This is useful when dealing with a set of different features (real valued and strings) and multiple kernels.

In case a single string was set using **set_features**, it can be "multiplexed" by sliding a window over it using

**from_position_list**
sg('from_position_list', 'TRAIN|TEST', winsize, shift[, skip])

or

**obtain_from_sliding_window**
sg('obtain_from_sliding_window', winsize, skip)
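As an illustration of what this windowing does, here is a plain-Python sketch (a hypothetical helper, not part of the SHOGUN API); winsize, shift and skip play the same roles as in the commands above:

```python
def sliding_windows(s, winsize, shift=1, skip=0):
    # Cut a single string into overlapping windows of length winsize,
    # moving the window by `shift` characters and skipping the first
    # `skip` positions of the string.
    return [s[i:i + winsize]
            for i in range(skip, len(s) - winsize + 1, shift)]

print(sliding_windows('ACGTAC', 4, 1))  # -> ['ACGT', 'CGTA', 'GTAC']
```

Each window then acts as one (string) example in the resulting feature set.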

Deletes the features which were assigned earlier in the current SHOGUN session.

- clean_features
sg('clean_features')

Obtain the Features from shogun

**get_features**
[features]=sg('get_features', 'TRAIN|TEST')

One proceeds similarly when assigning labels to the training data and obtaining labels from shogun. The commands

**set_labels**
sg('set_labels', 'TRAIN', trainlab)

**get_labels**
[labels]=sg('get_labels', 'TRAIN|TEST')

tell SHOGUN that the labels of the assigned training data reside in trainlab, and return the current labels, respectively (note that currently all data is **copied** into SHOGUN, so modifications to trainlab remain local to the interface).

Kernel- and distance-matrix-specific commands, used to create, obtain and set the kernel matrix.

Creating a kernel in shogun

**set_kernel**
sg('set_kernel', 'KERNELNAME', 'FEATURETYPE', CACHESIZE, PARAMETERS)

**add_kernel**
sg('add_kernel', WEIGHT, 'KERNELNAME', 'FEATURETYPE', CACHESIZE, PARAMETERS)

Here KERNELNAME is the name of the kernel one wishes to use, FEATURETYPE the type of features (e.g. REAL for standard real-valued feature vectors), CACHESIZE the size of the kernel cache in megabytes, and PARAMETERS are kernel-specific additional parameters.

The following kernels are implemented in SHOGUN:

- AUC
- Chi2
- Spectrum
- Const Kernel
- User defined CustomKernel
- Diagonal Kernel
- Kernel from Distance
- Fixed Degree StringKernel
- Gaussian

To work with a gaussian kernel on real values one issues:

sg('set_kernel', 'GAUSSIAN', 'TYPE', CACHESIZE, SIGMA)

For example:

sg('set_kernel', 'GAUSSIAN', 'REAL', 40, 1)

creates a Gaussian kernel on real values with a cache size of 40MB and a sigma value of one. Available types for the Gaussian kernel: REAL, SPARSEREAL.
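For reference, the Gaussian (RBF) kernel can be sketched in plain Python as follows. This is an illustration of the formula only, not the SHOGUN implementation; in particular, whether the squared distance is divided by the width directly or by 2*sigma^2 should be checked against the kernel documentation:

```python
import math

def gaussian_kernel(x, y, width):
    # k(x, y) = exp(-||x - y||^2 / width)
    # (one common parameterization; SHOGUN's exact use of the
    # width/sigma parameter may differ)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / width)

# Identical points give k = 1; distant points approach 0.
print(gaussian_kernel([1.0, 2.0], [1.0, 2.0], 1.0))  # -> 1.0
```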

- Gaussian Shift Kernel
- Histogram Kernel
- Linear

A linear kernel is created via:

sg('set_kernel', 'LINEAR', 'TYPE', CACHESIZE)

For example:

sg('add_kernel', 1.0, 'LINEAR', 'REAL', 50)

creates a linear kernel with a cache size of 50MB for real-valued data, with weight 1.0.

Available types for the linear kernel: BYTE, WORD, CHAR, REAL, SPARSEREAL.
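The linear kernel itself is just a dot product, and the WEIGHT passed to **add_kernel** scales each sub-kernel's contribution in the combined kernel. A plain-Python sketch (illustration only, not the SHOGUN implementation):

```python
def linear_kernel(x, y):
    # k(x, y) = x . y  (plain dot product)
    return sum(a * b for a, b in zip(x, y))

def combined_kernel(weighted_kernels, x, y):
    # A combined kernel as built by add_kernel: a weighted sum
    # of sub-kernel evaluations (illustration only).
    return sum(w * k(x, y) for w, k in weighted_kernels)
```

For example, a combined kernel with a single linear sub-kernel of weight 1.0 reduces to the plain dot product.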

- Local Alignment StringKernel
- Locality Improved StringKernel
- Polynomial Kernel

A polynomial kernel is created via:

sg('set_kernel', 'POLY', 'TYPE', CACHESIZE, DEGREE, INHOMOGENE, NORMALIZE)

For example:

sg('add_kernel', 0.1, 'POLY', 'REAL', 50, 3, 0)

adds a polynomial kernel of degree 3 with weight 0.1. Available types for the polynomial kernel: REAL, CHAR, SPARSEREAL.
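The polynomial kernel computes (x·y + c)^DEGREE, where the INHOMOGENE flag decides whether the constant c is added. A plain-Python sketch (normalization omitted; illustration only, not the SHOGUN implementation):

```python
def poly_kernel(x, y, degree, inhomogene=False):
    # k(x, y) = (x . y + c)^degree with c = 1 for the inhomogeneous
    # variant and c = 0 otherwise (normalization omitted here).
    dot = sum(a * b for a, b in zip(x, y))
    c = 1.0 if inhomogene else 0.0
    return (dot + c) ** degree
```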

- Salzberg Kernel
- Sigmoid Kernel

To work with a sigmoid kernel on real values one issues:

sg('set_kernel', 'SIGMOID', 'TYPE', CACHESIZE, GAMMA, COEFF)

For example:

sg('set_kernel', 'SIGMOID', 'REAL', 40, 0.1, 0.1)

creates a sigmoid kernel on real values with a cache size of 40MB, a gamma value of 0.1 and a coefficient of 0.1. Available types for the sigmoid kernel: REAL.
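The sigmoid kernel computes tanh(GAMMA * x·y + COEFF). A plain-Python sketch (illustration only, not the SHOGUN implementation):

```python
import math

def sigmoid_kernel(x, y, gamma, coeff):
    # k(x, y) = tanh(gamma * (x . y) + coeff)
    dot = sum(a * b for a, b in zip(x, y))
    return math.tanh(gamma * dot + coeff)
```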

- Weighted Spectrum Kernel
- Weighted Degree Kernels
- Match Kernel
- Custom Kernel

Assign a user-defined custom kernel, for which either only the upper triangle may be given (DIAG), the full matrix (FULL), or the full matrix which is then internally stored as an upper triangle (FULL2DIAG).

**set_custom_kernel**
sg('set_custom_kernel', kernelmatrix, 'DIAG|FULL|FULL2DIAG')

The get_kernel_matrix and get_distance_matrix commands return the kernel or distance matrix for the current problem.

**get_distance_matrix**
[D]=sg('get_distance_matrix', 'TRAIN|TEST')

**get_kernel_matrix**
[K]=sg('get_kernel_matrix', 'TRAIN|TEST')

The returned D and K are matrix objects.

- new_classifier Creates a new classifier (e.g. SVM instance).
- train_classifier Starts the training of the SVM on the assigned features and kernels.

The get_svm command returns some properties of an SVM, such as the Lagrange multipliers alpha, the bias b and the indices of the support vectors SV (zero-based).

**get_classifier**
[bias, alphas]=sg('get_svm')

**set_classifier**
sg('set_classifier', bias, alphas)

The first command returns a list of arguments; **set_classifier** may later be used (after creating an SVM classifier) to set the alphas and bias again.

The result of the classification of the test sample is obtained via:

**classify**
[result]=sg('classify')

where result is a vector containing the classification result for each datapoint.

**classify_example**
[result]=sg('classify_example', feature_vector_index)

obtains the output for a single example only (the index is zero-based as in Python; note that Octave, Matlab and R are 1-based).
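The classification outputs correspond to the usual SVM decision function built from the alphas and bias returned by **get_svm**. A plain-Python sketch (a hypothetical helper, not the SHOGUN API; the label y_i is typically already folded into alpha_i):

```python
def svm_output(x, support_vectors, alphas, bias, kernel):
    # f(x) = sum_i alpha_i * k(sv_i, x) + b
    # (the sign of the label y_i is assumed folded into alpha_i)
    return sum(a * kernel(sv, x)
               for a, sv in zip(alphas, support_vectors)) + bias
```

For a two-class SVM, the sign of f(x) gives the predicted class.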

- get_hmm
- set_hmm
- hmm_classify
- hmm_classify_example
- hmm_likelihood
- get_viterbi_path

- compute_poim_wd
- get_SPEC_consensus
- get_SPEC_scoring
- get_WD_consensus
- get_WD_scoring

Miscellaneous functions.

Returns the SVN version number.

**get_version**
sg('get_version')

Gives you a help text.

**help**
sg('help')

**help**
sg('help', 'CMD')

Sets a debugging log level - useful to trace errors.

- loglevel LEVEL can be one of ALL, DEBUG, WARN, ERROR
sg('loglevel', 'LEVEL')

- ALL: very verbose logging output (useful only for hunting memory leaks)
- DEBUG: verbose logging output (useful for debugging).
- WARN: less logging output (useful for error search).
- ERROR: only logging output on critical errors.

For example

> sg('loglevel', 'ALL')

switches to the most verbose logging output.

Let's get started: equipped with the above information on the basic SHOGUN commands, you are now able to create your own SHOGUN applications.

Let us discuss an example:

- registers the training samples, which reside in traindat.
sg('set_features', 'TRAIN', traindat)

- registers the training labels.
sg('set_labels', 'TRAIN', trainlab)

- creates a new gaussian kernel for reals with cache size 100Mb and width = 1.
sg('set_kernel', 'GAUSSIAN', 'REAL', 100, 1.0)

- creates a new SVM object inside the SHOGUN core.
sg('new_classifier', 'SVMLIGHT')

- sets the C value of the new SVM to 20.0.
sg('c', 20.0)

- attaches the data to the kernel, does some initialization, and then starts training on the sample.
sg('train_classifier')

- registers the test sample
sg('set_features', 'TEST', testdat)

- attaches the data to the kernel and classifies. Then gives you the classification result as a vector.
out=sg('classify')

SHOGUN Machine Learning Toolbox - Documentation