Shogun - A Large Scale Machine Learning Toolbox
This is the official homepage of the SHOGUN machine learning toolbox.
|
The toolbox focuses on large scale kernel methods, especially
Support Vector Machines (SVM). It provides a generic SVM
object interfacing to several different SVM implementations, among them the
state of the art OCAS, LibSVM, SVMLight, SVMLin and GPDT. Each of the SVMs can be
combined with a variety of kernels. The toolbox not only provides efficient
implementations of the most common kernels, such as the Linear, Polynomial,
Gaussian and Sigmoid kernels, but also comes with a number of recent string
kernels, e.g. the Locality Improved, Fisher, TOP, Spectrum and
Weighted Degree (with shifts) kernels. For the latter, the efficient
LINADD optimizations are implemented. SHOGUN also offers the freedom of
working with custom pre-computed kernels. One of its key features is the
combined kernel, which is constructed as a weighted linear combination
of a number of sub-kernels, each of which need not work on the same
domain. An optimal sub-kernel weighting can be learned using
Multiple Kernel Learning.
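To illustrate the combined kernel idea, here is a minimal pure-Python sketch. It does not use the SHOGUN API: the function names, the kernel width convention and the toy data are assumptions made for illustration only. It combines a Gaussian kernel on real-valued vectors with a spectrum kernel on strings, so the two sub-kernels work on different domains.

```python
import math
from collections import Counter


def gaussian_kernel(x, y, width=1.0):
    """Gaussian kernel on real-valued vectors: exp(-||x - y||^2 / width)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / width)


def spectrum_kernel(s, t, order=2):
    """Spectrum kernel on strings: dot product of k-mer count vectors."""
    count_s = Counter(s[i:i + order] for i in range(len(s) - order + 1))
    count_t = Counter(t[i:i + order] for i in range(len(t) - order + 1))
    return float(sum(count_s[kmer] * count_t[kmer] for kmer in count_s))


def combined_kernel(sub_kernels, weights):
    """Return k(a, b) = sum_i weights[i] * sub_kernels[i](a[i], b[i]).

    Each example is a tuple holding one representation per sub-kernel.
    """
    def kernel(a, b):
        return sum(w * k(a_i, b_i)
                   for w, k, a_i, b_i in zip(weights, sub_kernels, a, b))
    return kernel


# Hypothetical toy examples: (real-valued vector, DNA string).
example_1 = ([0.0, 1.0], "ACGT")
example_2 = ([0.0, 1.0], "ACGA")

kernel = combined_kernel([gaussian_kernel, spectrum_kernel], weights=[0.5, 0.5])
value = kernel(example_1, example_2)
print(value)
```

In this sketch the weights are fixed by hand. Multiple Kernel Learning would instead learn them from data, typically under a non-negativity constraint so that the combination remains a valid kernel.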
Currently, SVM 2-class classification and regression problems can be dealt
with. However, SHOGUN also implements a number of linear methods such as Linear
Discriminant Analysis (LDA), Linear Programming Machine (LPM) and (Kernel)
Perceptrons, and features algorithms to train Hidden Markov Models.
The input feature objects can be dense, sparse or strings, of type
int/short/double/char, and can be converted into different feature types.
Chains of preprocessors (e.g. subtracting the mean) can be attached to
each feature object, allowing for on-the-fly pre-processing.
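The preprocessor chain can be pictured with the following pure-Python sketch. It is not SHOGUN code: the class and method names (`Features`, `SubtractMean`, `NormalizeLength`, `add_preprocessor`, `get_vector`) are hypothetical and chosen for illustration only. The raw data is stored untouched, and the chain is applied each time a vector is requested.

```python
class SubtractMean:
    """Preprocessor that centers each feature dimension."""

    def fit(self, vectors):
        n = len(vectors)
        self.mean = [sum(col) / n for col in zip(*vectors)]
        return self

    def apply(self, vector):
        return [v - m for v, m in zip(vector, self.mean)]


class NormalizeLength:
    """Preprocessor that rescales a vector to unit Euclidean norm."""

    def fit(self, vectors):
        return self  # nothing to estimate

    def apply(self, vector):
        norm = sum(v * v for v in vector) ** 0.5
        return [v / norm for v in vector] if norm > 0 else list(vector)


class Features:
    """Feature container that applies its preprocessor chain on access."""

    def __init__(self, vectors):
        self.vectors = [list(v) for v in vectors]
        self.chain = []

    def add_preprocessor(self, preproc):
        # Fit on the data as seen through the preprocessors already attached.
        preproc.fit([self.get_vector(i) for i in range(len(self.vectors))])
        self.chain.append(preproc)

    def get_vector(self, index):
        vector = self.vectors[index]
        for preproc in self.chain:
            vector = preproc.apply(vector)
        return vector


features = Features([[1.0, 2.0], [3.0, 6.0]])
features.add_preprocessor(SubtractMean())
features.add_preprocessor(NormalizeLength())
first = features.get_vector(0)  # centered, then scaled to unit length
print(first)
```

Applying the chain on access keeps memory usage at the size of the raw data, at the cost of recomputing the transformation for every request.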
SHOGUN is implemented in C++ and interfaces to Matlab(tm), R, Octave and Python. It is proudly released as Machine Learning Open Source Software.
|
We have successfully used this toolbox to tackle the following sequence
analysis problems: Protein Super Family Classification[6],
Splice Site Prediction, Interpreting the SVM Classifier,
Splice Form Prediction, Alternative Splicing and Promoter
Prediction. Some of them come with no fewer than 10
million training examples, others with 7 billion test examples.
Except for SVMLight,
which is (C) Torsten Joachims and follows a different licensing scheme
(cf. LICENSE.SVMLight in the tar archive), SHOGUN is licensed under the
GPL version 3 or any later version (cf. LICENSE).
|
If you use SHOGUN in your research you are kindly asked to cite the following paper:
S. Sonnenburg, G. Raetsch, C. Schaefer and B. Schoelkopf, Large Scale Multiple Kernel Learning.
Journal of Machine Learning Research, 7:1531-1565, July 2006, K. Bennett and E. P.-Hernandez, Editors.
|
SHOGUN Version 0.7.3 (libshogun 3.0, libshogunui 1.1) (updated 02.05.2009)
Older Versions
|
This release contains new features, several bugfixes and an API cleanup:
Features:
- Improve libshogun/developer tutorial.
- Implement convenience function for parallel quicksort.
- Fasta/fastq file loading for StringFeatures.
Bugfixes:
- The get_name function was undefined in Evaluation, causing the PerformanceMeasures class to be defunct.
- Work around bugs in the standard template library for math functions.
- Compiles cleanly under OSX now, thanks to James Kyle.
Cleanup and API Changes:
- Make sure that all destructors are declared virtual.
|
We use Doxygen for both user and developer documentation, which may be read online here.
Additionally, many examples can be found in the [interface]/examples
directory in the source code (where interface is one of R, octave, matlab,
python, python-modular). Note that the documentation for python-modular is the most complete, and that Python's help function will show the documentation when working interactively:
$ python
Python 2.4.4 (#2, Jan 3 2008, 13:36:28)
[GCC 4.2.3 20071123 (prerelease) (Debian 4.2.2-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from shogun.Classifier import SVM
>>> help(SVM)
class SVM(CSVM)
| Method resolution order:
| SVM
| CSVM
| CKernelMachine
| Classifier
| SGObject
| __builtin__.object
|
| Methods defined here:
|
| __init__(self, kernel, alphas, support_vectors, b)
[...]
|
Below we provide some of the examples that were used to carry out experiments for a number of publications. Note that updated versions of all of these can also be found in the source code.
|
Click on the corresponding link to see classification and regression examples for Matlab(tm), R, Octave or Python:
|
Below one finds some Bioinformatics examples (for octave and matlab) as presented at BOSC 2006:
|
Multiple Kernel Learning examples (JMLR 2006 paper "Large Scale Multiple Kernel Learning"):
|
In case you find bugs or have feature requests, file them using the SHOGUN-TRAC bug tracking system.
We coordinate development (milestones, roadmap) using trac. If you would like to browse syntax-highlighted source from svn, just have a look.
For comments, problems, questions, bug reports etc., please use the mailing list (subscription required).
In case you need to directly get in touch with us, feel free to contact
|
Want to contribute? We maintain SHOGUN's source code via SVN.
- To browse the source code of the current and previous releases use
http://svn.tuebingen.mpg.de/shogun/releases/
- To access the source code via svn use
svn checkout http://svn.tuebingen.mpg.de:/shogun/releases shogun-releases
- To get access to the most up-to-date svn trunk, contact us for read/write access. Then use
svn checkout https://svn.tuebingen.mpg.de:/shogun/trunk shogun
|
The authors gratefully acknowledge the support of DFG grant MU 987/2-1 and the PASCAL Network of Excellence.
|