

Table of Contents 

Improving accessibility to shogun 

Machine learning tasks


List of Ideas 

Improving accessibility to shogun


Machine learning tasksInterface and develop the general quadratic solver library libqpMentor: Vojtech FrancDifficulty: Medium to Difficult Requirements: Some background in optimization, programming in C++ libqp is a C library which implements algorithms for solving two instances of convex Quadratic Programming (QP) frequently used in Machine Learning (see this link for more details). libqp uses a variant of the sequential minimal optimization algorithm able to solve verylarge QP problems where the Hessian does not fit to computer memory. The task would be to make this library accessible from within the shogun framework by interfacing it via the modular interfaces and to extend libqp by implementing solvers for new QP problems and by integrating some existing solvers (like pr_loqo) already in shogun. Tasks (in parallel to interfacing libqp to shogun):
[1] V. Franc. Optimization Algorithms for Kernel Methods. PhD thesis, CMP, FEE CTU Prague. Defended in November 2005. [2] R.E. Fan, P.H. Chen, C.J. Lin. Working Set Selection Using Second Order Information for Training SVM. JMLR. vol 6. 2005. [3] V. Franc, M.Navara, V.Hlavac. Sequential CoordinateWise Algorithm for the Nonnegative Least Squares Problem. CAIP2005. [4] MOSEK. Implement algorithms for Blind Source Separation (BSS) and Independent Component Analysis (ICA) based on Approximate Joint Diagonalization (AJD) of matricesMentor: Andreas ZieheShogun comentor: Sergey Lisitsyn Difficulty: Medium Requirements: C++, numerical linear algebra, optimization (Matlab, R) This task is about implementing stateoftheart algorithms for Blind Source Separation (BSS) and Independent Component Analysis (ICA) to complement the SHOGUN toolbox. ICA is a powerful technique that can be used as a preprocessing, feature extraction and visualization method in many machine learning problems. As it is well known, ICA/BSS can be done via the approximate joint diagonalization (AJD) of matrices [1]. Thus we propose to implement efficient algorithms for AJD as a main building block for SHOGUN. Possible candidates are the algorithms JADE, FFDIAG [2], JADIAG, JEDI and UWEDGE [3]. Furthermore efficient methods to compute the matrices to be diagonalized (i.e. lagged correlation matrices, matrices from 4th order cumulant slices, ...) are needed as well. There exists a reference package containing some implementations (in R)[4]. This idea offers an opportunity to dive into the intersection of machine learning, statistics and digital signal processing and gives a chance to highly improve scientific programming skills. References: [1] Handbook of Blind Source Separation http://books.google.de/books?id=PTbj03bYH6kC [2] A. Ziehe, P. Laskov, G. Nolte and K.R. Mueller; A Fast Algorithm for Joint Diagonalization with Nonorthogonal Transformations and its Application to Blind Source Separation; Journal of Machine Learning Research (JMLR), vol 5, pages 777800, 2004 [3] Tichavsky, P. & Yeredor, A.; Fast Approximate Joint Diagonalization Incorporating Weight Matrices; IEEE Trans. Signal Process., 2009, 57, 878891 [4] JointDiag R package http://cran.rproject.org/web/packages/jointDiag Large Scale Learning & Dimension reduction using hashingMentor: Olivier Chapelle, Benoit RostykusShogun comentor: Soeren Sonnenburg Difficulty: Medium Requirements: C++, basic machine learning The socalled "hashing trick", made popular by the Vowpal Wabbit learning software, is a very convenient method to reduce the dimensionality of sparse high dimensional data derived from categorical data or bag of words representations. A naive way to implement it is to prehash all the tokens and their derived features and to use standard linear algorithms in the new representation. But it is often more efficient to compute the hashed features on the fly. For instance, in the case of quadratic features, it would require O(d^2) storage to explicitly represent an example with d token. But like in the case of the kernel trick, we do not need to explicitly compute that representation. The goal of this project is to implement this implicit representation in Shogun. As in Vowpal Wabbit, this can be applied to categorical features, quadratic features, ngrams, structured outputlearning, and potentially other uses cases. This project offers a great opportunity to learn essentials of large scale learning systems that are of high demand nowadays. Subtasks would be:
Large Scale Learning: loglinear time kernel expansion (aka Fast Food)Mentor: Quoc V. LeShogun comentor: Soeren Sonnenburg Difficulty: Medium Requirements: C++, machine learning Background: Large scale learning with kernels is hindered by the fact that kernel classifiers require computing kernel expansions of the form \( f(x)=\sum_{i=1}^N \alpha_i K(x,x_i) \). Obviously, the more nonzero coefficients \( \alpha_i \) there are the slower the kernel machine. Recently, progress has been made in drastically speeding up kernel machines by approximating the kernel feature space [1][2][3].
[1] Efficient Additive Kernels via Explicit Feature Maps (Vedaldi, 2011) [2] Random kitchen sinks https://research.microsoft.com/apps/video/dl.aspx?id=103390&l=i [3] Fast Food http://videolectures.net/nipsworkshops2012_smola_kernel Largescale learning of general structured output modelsMentor: Patrick PletscherShogun comentor: Soeren Sonnenburg Difficulty: Medium to Difficult Requirements: C++, Python (to produce usability examples and read existing code), machine learning theory (specially structured output learning) Desired: Optimization, knowledge of Shogun’s structured output framework. This project focuses on extending the structured output (SO) learning framework in Shogun. We will consider two aspects: First, we would like to support general structured output applications, such as backgroundforeground segmentation in computer vision or partofspeech tagging in natural language processing. Second, we would like to improve the Shogun SO learning framework to handle largescale structured data by means of efficient online solvers. These two aspects are described in more detail below. Parameter learning of general structured output models using structured SVMs. Using the current SO framework in Shogun, this entails extending the class StructuredModel and implementing the application dependent parts, namely the joint feature vector function, the MAP inference and the structured loss. Most of the work in this part is probably devoted to the (lossaugmented) inference problem. A couple of methods could be implemented for this, e.g. linear programming relaxations and/or graph cuts. In addition, in order to support general structured output domains, a new class of Features and StructuredLabels is likely to be needed. Largescale structured SVM learning. Shogun currently only supports learning the structured SVM using the BMRM and cutting planes algorithms. Both of these methods are batch solvers (i.e., one update step after going through the whole dataset once) and do not perform very efficiently for largescale applications. We would like to improve the speed of training structured SVMs by implementing two online learning approaches: standard stochastic gradient descent and the recent block coordinate FrankWolfe solver introduced in [LacosteJulien, Jaggi, Schmidt and Pletscher; 2013]. Furthermore, largescale applications are likely to substantially profit from the Coffin approach [Sonnenburg, Franc; 2010] which is already implemented in Shogun for training standard binary SVMs, hence we would like to extend Coffin to the structured SVM. While the programming difficulty for this project is not high, basic understanding of machine learning and optimization is mandatory. This renders the whole project difficult. Implement metric learning algorithms with applications to metagenomicsMentor: Georg ZellerShogun comentor: Sergey Lisitsyn Difficulty: Medium to Difficult Requirements: C++, optimization, machine learning The goal of this project would be to implement widely used distance metric learning methods in SHOGUN. Most importantly, the largemargin nearest neighbor (LMNN) approach by Kilian Weinberger and colleagues [1,2]. The idea is to learn a transformation of the Euclidean space such that the accuracy of the knearest neighbor (kNN) classifier is maximized. This leads to a semidefinite convex optimization problem. Although not as scalable as e.g. linear SVMs, the LMNN approach can handle multiclass problems with ~200 features and >50,000 training examples. Applications for these distance learning approaches could come from many domains, but metagenomics  a recently founded, fastpaced field in the interface between molecular biology, computational biology and microbial ecology  would be an ideal choice, because a typical data analysis heavily relies on distancebased clustering and lowrank projections such as PCA and PCoA. This idea offers an outstanding opportunity to help metagenomics researchers with providing implementation of LMNN algorithm to the community. The mentor of this project is an active researcher working in metagenomics which means the student can learn a lot on scientific programming and data analysis (and maybe even help with ongoing research tasks to give implementations a try). References: [1] Distance Metric Learning for Large Margin Nearest Neighbor Classification, by Weinberger & Saul, JMLR 2009 [2] Large Margin MultiTask Metric Learning by Parameswaran & Weinberger NIPS 2010 Implement robust classification methodsMentor: Cheng Soon OngShogun comentor: Sergey Lisitsyn Difficulty: Medium Requirements: Some background in optimization, programming in C++ There are several recent approaches to learning with uncertain data, including robust classification and regression. Instead of the standard supervised learning setting with one examplelabel pair, uncertain data has an uncertainty region around each example. This results in second order cone programming (SOCP) problems, which can be solved using standard solvers such as MOSEK or CPLEX. Tasks include:
[1] Shivaswamy, Bhattaracharya, Smola, "Second Order Cone Programming Approaches for Handling Missing and Uncertain Data", JMLR 7 (2006), 12831314 [2] Lanckriet, El Ghaoui, Bhattacharyya, Jordan, "A Robust Minimax Approach to Classification", JMLR 3 (2002) 555582 Implement multiple instance learning algorithmssMentor: Cheng Soon OngShogun comentor: Sergey Lisitsyn Difficulty: Medium Requirements: Some background in optimization, programming in C++ In many applications of supervised learning, the cost of obtaining ground truth labels is a significant bottleneck. This has led to research on weakly labeled data, among which the framework of multiple instance learning (MIL) has shown promising results. This project aims to extend shogun to enable MIL by implementing several algorithms. Tasks include:
[1] Andrews, Tsochantaridis, Hofmann, Support vector machines for multipleinstance learning. NIPS, 2002. [2] Gehler, Chapelle, Deterministic annealing for multipleinstance learning. AISTATS, 2007. [3] Krummenacher, Ong, Buhmann, Ellipsoidal Multiple Instance Learning, ICML 2013. Improve dimensionality reduction moduleMentor: Sergey LisitsynDifficulty: Medium Requirements: C++, Eigen3, numerical linear algebra, data structures Dimension reduction is the process of finding lowdimensional representations which is especially useful for visualization and compressing redundant data. It is worth to note that dimension reduction can be considered as a mature field as it has been under heavy development for decades already. Nevertheless, we believe machine learning community lacks flexible and efficient opensource implementations of developed algorithms. During the summer the student will be improving dimensionality reduction module of the Shogun toolbox and possible tasks for this idea that can be done by the student include:
Implement estimators of largescale sparse Gaussian densitiesMentor: Heiko Strathmann, Erlend Aune, Daniel SimpsonDifficulty: Difficult Requirements: C++, (sparse) numerical linear algebra (positive definiteness, trace, Cholesky decomposition, Eigenvalues, iterative methods for linear systems (conjugate gradients), preconditioning), Basic probability theory: (Gaussian distributions and manipulating them, expected value and variance), Graph colourings Useful: Eigen3, Bayesian statistics, MCMC knowledge (MetropolisHastings algorithm), LaTeX (for shoguntutorial documentation), MATLAB (to read existing code) Gaussian distributions are a key tool in a vast number of applications from computational statistics and machine learning. Applications include for example dimension reduction such as factor analysis, the nonparametric Gaussian processes for regression or classification, and timeseries models such as the Kalman filter (see [3]). We consider a subclass of Gaussian distributions whose covariance matrix is sparse and very large. Evaluating those distributions (in the logdomain) requires to compute the logdeterminant of the covariance matrix. This is usually done via taking a matrix factorisation such as the Cholesky and then using the diagonal entries. However, Cholesky factors of sparse matrices usually are not sparse themselves. From a certain size on (few hundred thousands to millions), this makes computing the logdeterminant infeasible due to computer memory constraints. This project aims at implementing statistical estimators for logdeterminants of positive definite matrices, whose empirical mean converges to the true value and which are feasible to compute on desktop computers. We plan to take an approach that is described in [1].
The project offers a great opportunity to get your hands dirty on a large number of topics used in computational statistics and many other contexts. Using unbiased estimators for likelihood functions is a very hot topic in statistics  people will be very keen in having this implemented for large Gaussians and it is likely that this code will be used. References: [1] Erlend Aune, Daniel Simpson and Jo Eidsvik, Parameter estimation in high dimensional Gaussian distributions, Preprint in Statistics, No. 5/2012, NTNU, 2012 [2] “Painless Conjugate Gradient” and related concepts: http://www.cs.cmu.edu/~quakepapers/painlessconjugategradient.pdf [3] wiki:Multivariate normal distribution [4] wiki:Conjugate gradient method [5] wiki:Preconditioning [6] Hale, N., Higham, N. J., & Trefethen, L. N. (2008). Computing \( A^{\alpha} \), \( \log(A) \), and related matrix functions by contour integrals. SIAM Journal on Numerical Analysis, 46(5), 2505–2523. Gaussian Processes for ClassificationMentor: Heiko Strathmann, Oliver StegleDifficulty: Medium Requirements: C++, Gaussian Process basics, Bayesian Inference Useful: GPMLtoolbox, Eigen3, LaTeX (for shoguntutorial documentation) In last year’s GSoC, Jacob Walker implemented a flexible framework for Gaussian process regression models. This project build upon this initial work and is about extending the existing infrastructure to allow classification with GPs. The aims are
This project offers the opportunity to get in touch with the concept of GPs from an implementation side of view. Apart from SVM, GPs are among the most widely used tools for classification. As the GPML toolbox coming with [1] has no implementation of multiclass GPs, you can expect that your code will be used by the community. The project is quite flexible in terms of how much can be done and offers a lot of room for sophisticated extensions while the basic tasks are not too hard. You can play a lot with these things (and we expect you to). Should be quite fun! References: [1] Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian Processes in Machine Learning. MIT Press. [2] Williams, C. K. I., & Barber, D. (1998). Bayesian classification with Gaussian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(12), 1342–1351. [3] Sparse Bayesian Learning and the Relevance Vector Machine Michael E. Tipping, JMLR 2001. http://jmlr.csail.mit.edu/papers/v1/tipping01a.html 
Jan. 26, 2015  SHOGUN 4.0.0  
Feb. 17, 2014  SHOGUN 3.2.0  
Jan. 6, 2014  SHOGUN 3.1.1  
Jan. 5, 2014  SHOGUN 3.1.0  
Oct. 28, 2013  SHOGUN 3.0.0  
March 17, 2013  SHOGUN 2.1.0  
Sept. 1, 2012  SHOGUN 2.0.0 