SHOGUN  4.1.0
DeveloperTutorial.mainpage
/*!
\page developer_tutorial Libshogun and Developer Tutorial

Shogun is split up into libshogun which contains all the machine learning
algorithms, libshogunui which contains a library for the 'static interfaces',
the static interfaces python, octave, matlab, r and the modular interfaces
python_modular, octave_modular and r_modular (all found in the src/
subdirectory with corresponding name). See src/INSTALL on how to install shogun.

To extend shogun, the best way to start is to use its library directly.
A number of examples in examples/libshogun document how to do this.

The simplest libshogun-based program would be

\verbinclude basic_minimal.cpp

which, saved as minimal.cpp, can be compiled with
g++ minimal.cpp -lshogun -o minimal. It does nothing apart from initializing and
destroying a couple of global shogun objects internally.

To redirect shogun's output functions SG_DEBUG(), SG_INFO(), SG_WARN(),
SG_ERROR(), SG_PRINT() etc., pass your own print functions to init_shogun() as
parameters, like this:

\verbatim
void print_message(FILE* target, const char* str)
{
    fprintf(target, "%s", str);
}

void print_warning(FILE* target, const char* str)
{
    fprintf(target, "%s", str);
}

void print_error(FILE* target, const char* str)
{
    fprintf(target, "%s", str);
}

init_shogun(&print_message, &print_warning,
        &print_error);
\endverbatim

To finally see some action, one has to include the appropriate header files.
For example, here we create some features and a Gaussian kernel:

\verbinclude classifier_minimal_svm.cpp

Now you probably wonder why this example does not leak memory. First of all,
supplying pointers to arrays allocated with new[] makes shogun objects own
these arrays and take care of cleaning them up on object destruction. Second,
shogun objects keep a reference counter internally. Whenever a shogun object is
returned or supplied as an argument to some function, its reference counter is
increased. For instance, in the example above

\verbatim
CLibSVM* svm = new CLibSVM(10, kernel, labels);
\endverbatim

increases the reference count of kernel and labels. On destruction the
reference counter is decreased and the object is freed if the counter is <= 0.

It is therefore your duty to prevent objects from destruction if you keep a
handle to them globally <b>that you still intend to use later</b>. You can do
this by calling SG_REF(obj). In the example above, accessing labels after the
call to SG_UNREF(svm) will cause a segmentation fault, as the labels object was
already destroyed in the SVM destructor. To decrement the reference count of
an object, call SG_UNREF(obj), which will also automagically destroy it if the
counter is <= 0 and set obj=NULL only in this case.
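
The counting rule described above can be illustrated without shogun itself.
The following is a minimal sketch, not shogun's implementation: RefCounted,
Labels, Machine, release() and g_alive are hypothetical names invented for this
illustration, and release() only mimics the behaviour described for SG_UNREF
(destroy the object once the counter drops to <= 0 and set the pointer to NULL
only in that case).

```cpp
#include <cassert>
#include <cstddef>

// Number of live objects; exists only to make destruction observable.
static int g_alive = 0;

// Hypothetical reference-counted base class (NOT shogun's CSGObject).
class RefCounted
{
public:
    RefCounted() : m_refcount(0) { ++g_alive; }
    virtual ~RefCounted() { --g_alive; }

    // Increase the counter and return the new value.
    int ref() { return ++m_refcount; }

    // Decrease the counter; destroy the object once it is <= 0.
    // Returns the new counter value, or 0 if the object was destroyed.
    int unref()
    {
        int count = --m_refcount;
        if (count <= 0)
        {
            delete this;
            return 0;
        }
        return count;
    }

    int ref_count() const { return m_refcount; }

private:
    int m_refcount;
};

// Hypothetical counterpart of the described SG_UNREF behaviour:
// the pointer is reset to NULL only if the object was actually destroyed.
template <class T> void release(T*& obj)
{
    if (obj != NULL && obj->unref() == 0)
        obj = NULL;
}

class Labels : public RefCounted {};

// A machine takes a reference on the labels it is given and drops it again
// in its destructor.
class Machine : public RefCounted
{
public:
    explicit Machine(Labels* labels) : m_labels(labels)
    {
        assert(m_labels != NULL);
        m_labels->ref();
    }

    virtual ~Machine() { release(m_labels); }

private:
    Labels* m_labels;
};
```

With these definitions, a caller that takes its own reference on a Labels
object before handing it to a Machine can still use the labels after releasing
the machine. Without that extra reference the labels would be destroyed
together with the machine.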

Generally, all shogun C++ classes are prefixed with C, e.g. CSVM, and derived
from CSGObject. Since variables higher up in the class hierarchy need to be
initialized upon construction of the object, the constructor of the base class
needs to be called in the derived constructor, e.g. CSVM calls CKernelMachine,
CKernelMachine calls CClassifier, which finally calls CSGObject.

For example, if you implement your own SVM called MySVM, you would write the
constructor like this:

\verbatim
class MySVM : public CSVM
{
public:
    // Call the base class constructor so that inherited members get initialized.
    MySVM() : CSVM()
    {
    }

    virtual ~MySVM()
    {
    }
};
\endverbatim

Also make sure that you define the destructor \b virtual.
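
To see why the destructor should be virtual, consider the following
self-contained sketch. Base and Derived are hypothetical stand-ins (they are
not shogun classes), and g_derived_destroyed exists only to make the effect
observable. Because the base destructor is virtual, deleting a Derived object
through a Base pointer runs the Derived destructor first, so the buffer it owns
is released.

```cpp
#include <cassert>

// Counts how often the Derived destructor ran; for demonstration only.
static int g_derived_destroyed = 0;

// Hypothetical base class with a virtual destructor.
class Base
{
public:
    Base() {}
    virtual ~Base() {}
};

// Hypothetical derived class that owns a resource.
class Derived : public Base
{
public:
    // The base class constructor is called explicitly, as in the MySVM example.
    Derived() : Base(), m_buffer(new double[16]) {}

    virtual ~Derived()
    {
        delete[] m_buffer;
        ++g_derived_destroyed;
    }

private:
    double* m_buffer;
};

// Code that only knows the base type, e.g. a generic cleanup routine.
void destroy(Base* obj)
{
    delete obj; // dispatches to ~Derived() because ~Base() is virtual
}
```

If ~Base() were not virtual, deleting a Derived object through a Base pointer
would be undefined behaviour in C++, and in practice the Derived destructor
would typically be skipped.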

We are now going to define our own kernel, a linear-like kernel defined on
standard double precision floating point vectors. We define it as

\f$k({\bf x}, {\bf x'}) = \sum_{i=1}^D x_i \cdot x'_{D-i+1}\f$

where D is the dimensionality of the data.
To implement this kernel, we need to derive a class, say CReverseLinearKernel,
from CSimpleKernel<float64_t> (for strings it would be CStringKernel, for sparse
features CSparseKernel).
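
As a quick sanity check of the formula, the kernel can be computed on plain
vectors, independent of shogun. reverse_linear_kernel is a hypothetical helper
written for this illustration only; it assumes both vectors have the same
length D and uses 0-based indices, so the term x_i * x'_{D-i+1} becomes
x[i] * y[D-i-1].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-alone version of the reverse linear kernel:
// the dot product of x with the reversed y.
double reverse_linear_kernel(const std::vector<double>& x,
                             const std::vector<double>& y)
{
    assert(x.size() == y.size());

    const std::size_t D = x.size();
    double result = 0.0;
    for (std::size_t i = 0; i < D; i++)
        result += x[i] * y[D - i - 1];

    return result;
}
```

For x = (1, 2, 3) and x' = (4, 5, 6) this gives 1*6 + 2*5 + 3*4 = 28. Swapping
the arguments gives the same value, since the sum is symmetric in x and x'.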

Essentially, we only need to override the CKernel::compute() function with our
own implementation. All the rest gets empty defaults. An example of our
compute() function could be

\verbatim
virtual float64_t compute(int32_t idx_a, int32_t idx_b)
{
    int32_t alen, blen;
    bool afree, bfree;

    // fetch both feature vectors together with their lengths
    float64_t* avec=
        ((CSimpleFeatures<float64_t>*) lhs)->get_feature_vector(idx_a, alen, afree);
    float64_t* bvec=
        ((CSimpleFeatures<float64_t>*) rhs)->get_feature_vector(idx_b, blen, bfree);

    ASSERT(alen==blen);

    // dot product of a with the reversed b
    float64_t result=0;
    for (int32_t i=0; i<alen; i++)
        result+=avec[i]*bvec[alen-i-1];

    // hand the vectors back to the feature objects
    ((CSimpleFeatures<float64_t>*) lhs)->free_feature_vector(avec, idx_a, afree);
    ((CSimpleFeatures<float64_t>*) rhs)->free_feature_vector(bvec, idx_b, bfree);

    return result;
}
\endverbatim

So for two indices idx_a (for vector a) and idx_b (for vector b) we obtain the
corresponding pointers to the feature vectors avec and bvec, do our two-line
computation (the for loop in the middle) and "free" the feature vectors again.
It should be noted that in most cases getting the feature vector is actually a
single memory access operation (and free_feature_vector is a no-op in this
case). However, when preprocessor objects are attached to the feature object,
they could potentially perform on-the-fly processing operations.

A complete, fully working example could look like this

\verbinclude kernel_revlin.cpp

As you can see, only a few other functions are defined: they return the name of
the object and the object id, and allow for loading/saving of kernel
initialization data. No magic really. The same holds when you want to
incorporate a new SVM (derive from CSVM, or from CLinearClassifier if it is a
linear SVM) or create new feature objects (derive from CFeatures or
CSimpleFeatures, CStringFeatures or CSparseFeatures). For the SVM you would only
have to override the CSVM::train() function; parameter settings like epsilon
and C, as well as evaluating SVMs, are handled by the CSVM base class.

In case you want to integrate this into shogun's modular interfaces, all you
have to do is to put this class in a header file and to include the header file
in the corresponding .i file (in this case src/modular/Kernel.i). It is easiest
to search for a similarly wrapped object and just fill in the same three lines.
First, in the %{ %} block (which swig, the program we use to generate the
modular python/octave interface wrappers, does not parse but copies verbatim
into the generated wrapper code)

\verbatim
%{
#include <shogun/kernel/ReverseLinearKernel.h>
%}
\endverbatim

then remove the C prefix (if you had one)

\verbatim
%rename(ReverseLinearKernel) CReverseLinearKernel;
\endverbatim

and finally tell swig to wrap all functions found in the header

\verbatim
%include <shogun/kernel/ReverseLinearKernel.h>
\endverbatim

In case you got your object working, we will happily integrate it into shogun
provided you follow a number of basic coding conventions detailed in
\subpage devel (see FORMATTING for formatting instructions, MACROS on how to use
and name macros, TYPES on which types to use, FUNCTIONS on what functions should
look like and NAMING CONVENTIONS for the naming scheme). Note that in case you
change the API in a way that breaks ABI compatibility, you need to increase the
major number of the libshogun soname (see \subpage soname ).

*/

SHOGUN Machine Learning Toolbox - Documentation