en/current/DeveloperTutorial_8mainpage_source.html

/*!

\page developer_tutorial Libshogun and Developer Tutorial


Shogun is split up into libshogun which contains all the machine learning

algorithms, libshogunui which contains a library for the 'static interfaces',

the static interfaces python, octave, matlab, r and the modular interfaces

python_modular, octave_modular and r_modular (all found in the src/

subdirectory with corresponding name). See src/INSTALL on how to install shogun.


In case one wants to extend shogun the best way is to start using its library.

This can be easily done as a number of examples in examples/libshogun document.


The simplest libshogun based program would be


\verbinclude basic_minimal.cpp


which could be compiled with g++ -lshogun minimal.cpp -o minimal and obviously

does nothing (apart from initializing and destroying a couple of global shogun

objects internally).


In case one wants to redirect shoguns output functions SG_DEBUG(), SG_INFO(),

SG_WARN(), SG_ERROR(), SG_PRINT() etc, one has to pass them to init_shogun() as

parameters like this


\verbatim

void print_message(FILE* target, const char* str)

{

    fprintf(target, "%s", str);

}


void print_warning(FILE* target, const char* str)

{

    fprintf(target, "%s", str);

}


void print_error(FILE* target, const char* str)

{

    fprintf(target, "%s", str);

}


init_shogun(&print_message, &print_warning,

            &print_error);

\endverbatim


To finally see some action one has to include the appropriate header files,

e.g. we create some features and a gaussian kernel


\verbinclude classifier_minimal_svm.cpp


Now you probably wonder why this example does not leak memory. First of all,

supplying pointers to arrays allocated with new[] will make shogun objects own

these objects and will make them take care of cleaning them up on object

destruction. Then, when creating shogun objects they keep a reference counter

internally. Whenever a shogun object is returned or supplied as an argument to

some function its reference counter is increased, for example in the example

above


\verbatim

CLibSVM* svm = new CLibSVM(10, kernel, labels);

\endverbatim


increases the reference count of kernel and labels. On destruction the

reference counter is decreased and the object is freed if the counter is <= 0.


It is therefore your duty to prevent objects from destruction if you keep a

handle to them globally <b>that you still intend to use later</b>. In the example

above accessing labels after the call to SG_UNREF(svm) will cause a

segmentation fault as the Label object was already destroyed in the SVM

destructor. You can do this by SG_REF(obj). To decrement the reference count of

an object, call SG_UNREF(obj) which will also automagically destroy it if the

counter is <= 0 and set obj=NULL only in this case.


Generally, all shogun C++ Objects are prefixed with C, e.g. CSVM and derived from

CSGObject. Since variables in the upper class hierarchy, need to be initialized

upon construction of the object, the constructor of base class needs to be

called in the constructor, e.g. CSVM calls CKernelMachine, CKernelMachine calls

CClassifier which finally calls CSGObject.


For example if you implement your own SVM called MySVM you would in the

constructor do


\verbatim

class MySVM : public CSVM

{

    MySVM( ) : CSVM()

    {


    }


    virtual ~MySVM()

    {


    }

};

\endverbatim


Also make sure that you define the destructor \b virtual.


We are now going to define our own kernel, a linear like kernel defined on

standard double precision floating point vectors. We define it as


\f$k({\bf x}, {\bf x'}) = \sum_{i=1}^D x_i \cdot x'_{D-i+1}\f$


where D is the dimensionality of the data.

To implement this kernel we need to derive a class say CReverseLinearKernel from

CSimpleKernel<float64_t> (for strings it would be CStringKernel, for sparse

features CSparseKernel).


Essentially we only need to overload the CKernel::compute() function with our

own implementation of compute. All the rest gets empty defaults. An example for

our compute() function could be


\verbatim

virtual float64_t compute(int32_t idx_a, int32_t idx_b)

{

    int32_t alen, blen;

    bool afree, bfree;


    float64_t* avec=

        ((CSimpleFeatures<float64_t>*) lhs)->get_feature_vector(idx_a, alen, afree);

    float64_t* bvec=

        ((CSimpleFeatures<float64_t>*) rhs)->get_feature_vector(idx_b, blen, bfree);


    ASSERT(alen==blen);


    float64_t result=0;

    for (int32_t i=0; i<alen; i++)

        result+=avec[i]*bvec[alen-i-1];


    ((CSimpleFeatures<float64_t>*) lhs)->free_feature_vector(avec, idx_a, afree);

    ((CSimpleFeatures<float64_t>*) rhs)->free_feature_vector(bvec, idx_b, bfree);


    return result;

}

\endverbatim


So for two indices idx_a (for vector a) and idx_b (for vector b) we obtain the

corresponding pointers to the feature vectors avec and bvec, do our two line

computation (for loop in the middle) and ``free'' the feature vectors again. It

should be noted that in most cases getting the feature vector is actually a

single memory access operation (and free_feature_vector is a nop in this case).

However, when preprocessor objects are attached to the feature object they

could potentially perform on-the-fly processing operations.


A complete, fully working example could look like this


\verbinclude kernel_revlin.cpp


As you notice only a few other functions are defined returning name of the

object, and object id and allow for loading/saving of kernel initialization

data. No magic really, the same holds when you want to incorporate a new

SVM (derive from CSVM or CLinearClassifier if it is a linear SVM) or create new

feature objects (derive from CFeatures or CSimpleFeatures, CStringFeatures or

CSparseFeatures). For the SVM you would only have to override the CSVM::train()

function, parameter settings like epsilon, C  and evaluating SVMs is done

naturally by the CSVM base class.


In case you would want to integrate this into shoguns

modular interfaces, all you have to do is to put this class in a header file

and to include the header file in the corresponding .i file (in this case

src/modular/Kernel.i). It is easiest to search for a similarly wrapped object

and just fill in the same three lines: in the %{ %} block (that is ignored by

swig - the program we use to generate the modular python/octave interface

wrappers)


\verbatim

%{

#include <shogun/kernel/ReverseLinearKernel.h>

%}

\endverbatim


then remove the C prefix (if you had one)


\verbatim

%rename(ReverseLinearKernel) CReverseLinearKernel;

\endverbatim


and finally tell swig to wrap all functions found in the header


\verbatim

%include <shogun/kernel/ReverseLinearKernel.h>

\endverbatim


In case you got your object working we will happily integrate it into shogun

provided you follow a number of basic coding conventions detailed in \subpage

devel (see FORMATTING for formatting instructions, MACROS on how to use and name

macros, TYPES on which types to use, FUNCTIONS on how functions should look like

and NAMING CONVENTIONS for the naming scheme. Note that in case you change the

API in a way that breaks ABI compatibility you need to increase the major number

of the libshogun soname (see \subpage soname ).


*/