$$ P\left( Y=k | X = x \right) = \frac{P(X=x|Y=k)P(Y=k)}{P(X=x)} $$

The prediction is then made by

$$ y = \operatorname*{argmax}_{k\in\{1,\ldots,K\}}\; P(Y=k|X=x) $$

Since $P(X=x)$ is a constant factor for all $P(Y=k|X=x)$, $k=1,\ldots,K$, there is no need to compute it.
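
As a quick numerical illustration of why the normalizer can be skipped, the sketch below uses made-up priors and likelihoods for $K=3$ classes at a single input $x$ (the numbers are hypothetical and not taken from this demo). It compares the prediction from the unnormalized product $P(X=x|Y=k)P(Y=k)$ with the one from the full posterior.

```python
import numpy as np

# Hypothetical values for K = 3 classes at one fixed input x.
prior = np.array([0.5, 0.3, 0.2])          # P(Y=k)
likelihood = np.array([0.02, 0.10, 0.05])  # P(X=x | Y=k)

joint = likelihood * prior                 # numerator of Bayes' rule
evidence = joint.sum()                     # P(X=x), the same for every k
posterior = joint / evidence               # P(Y=k | X=x)

# Dividing by a positive constant does not change the argmax.
# Classes are numbered from 1, hence the + 1.
pred_unnormalized = int(np.argmax(joint)) + 1
pred_normalized = int(np.argmax(posterior)) + 1
```

Both predictions pick the same class, because the posterior is just the numerator rescaled by a positive constant.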

In `SHOGUN`, `CGaussianNaiveBayes` implements the Gaussian Naive Bayes (GNB)
algorithm. It is prefixed with "Gaussian" because the probability model
$P(X=x|Y=k)$ for each $k$ is taken to be a multivariate Gaussian distribution.
Furthermore, the dimensions of the feature vector $X$ are assumed to be
independent given the class, so each class covariance matrix is diagonal.
This *naive* independence assumption, which gives the algorithm its name,
lets us learn the model by estimating the parameters for each feature
dimension independently, so training is very fast. However, this assumption
can be very restrictive. In this demo, we show a simple 2D example with 3
linearly separable classes. The scattered points are training samples, with
colors indicating their labels. The filled areas indicate the hypothesis
learned by `CGaussianNaiveBayes`. The training samples are actually
generated from three Gaussian distributions, but since the covariance
matrices of those distributions are not diagonal (i.e. there are
*rotations*), the GNB algorithm cannot handle them properly.
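
To make the per-dimension estimation concrete, here is a minimal NumPy sketch of Gaussian Naive Bayes. It is an illustration of the idea only, not the `SHOGUN` implementation; the function names `fit_gnb` and `predict_gnb` and the toy data are assumptions introduced for this example. Features are stored one sample per column, as in the rest of this demo.

```python
import numpy as np

def fit_gnb(X, y):
    """Estimate class priors, per-dimension means and per-dimension variances.

    X has shape (n_features, n_samples); y holds one class label per sample.
    """
    classes = np.unique(y)
    priors = np.array([np.mean(y == k) for k in classes])
    means = np.array([X[:, y == k].mean(axis=1) for k in classes])
    variances = np.array([X[:, y == k].var(axis=1) for k in classes])
    return classes, priors, means, variances

def predict_gnb(params, X):
    """Return the class maximizing log P(Y=k) + sum_d log N(x_d; mean_kd, var_kd)."""
    classes, priors, means, variances = params
    diff = X.T[None, :, :] - means[:, None, :]            # shape (K, n, d)
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[:, None, :]
                      + diff ** 2 / variances[:, None, :]).sum(axis=2)
    scores = np.log(priors)[:, None] + log_lik            # shape (K, n)
    return classes[np.argmax(scores, axis=0)]

# Toy data: two well-separated, axis-aligned Gaussian blobs.
rng = np.random.RandomState(0)
X_toy = np.hstack((rng.randn(2, 50) + np.array([[-5.0], [0.0]]),
                   rng.randn(2, 50) + np.array([[5.0], [0.0]])))
y_toy = np.hstack((np.ones(50), 2 * np.ones(50)))

params = fit_gnb(X_toy, y_toy)
pred_toy = predict_gnb(params, X_toy)
```

Training is a single pass that computes means and variances, which is why it is so cheap. The price is that only axis-aligned Gaussians can be represented, since no covariance between dimensions is ever estimated.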

We first initialize the models used to generate samples for this demo:


```
%matplotlib inline
import numpy as np
import pylab as pl

np.random.seed(0)
n_train = 300

# Each 'sigma' is a rotation by -pi/4 applied after per-axis scaling by (1, 4).
# Samples are later drawn as sigma.dot(z) + mu with z ~ N(0, I).
theta = -np.pi / 4
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
transform = rotation.dot(np.diag([1, 4]))

models = [{'mu': [8, 0], 'sigma': transform},
          {'mu': [0, 0], 'sigma': transform},
          {'mu': [-8, 0], 'sigma': transform}]
```
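
The `'sigma'` entries in `models` are transform matrices rather than covariance matrices. Assuming samples are drawn as `sigma.dot(z) + mu` with `z` standard normal (which is what the helper function in this demo does), the implied class covariance is `sigma.dot(sigma.T)`. The short check below, with the illustrative names `A` and `cov`, shows that this covariance has large off-diagonal entries.

```python
import numpy as np

theta = -np.pi / 4
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
A = rotation.dot(np.diag([1, 4]))   # same matrix as each 'sigma' entry in models
cov = A.dot(A.T)                    # covariance of A.dot(z) for z ~ N(0, I)

# Correlation between the two feature dimensions.
correlation = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
```

Here `cov` works out to `[[8.5, 7.5], [7.5, 8.5]]`, so the two dimensions are strongly correlated. A diagonal-covariance model cannot represent this.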

A helper function is defined to generate samples:


```
def gen_samples(n_samples):
    # Draw n_samples points from each model; labels are 1, 2, 3.
    X_all = np.zeros((2, 0))
    Y_all = np.zeros(0)
    for i, model in enumerate(models):
        Y = np.zeros(n_samples) + i + 1
        X = np.array(model['sigma']).dot(np.random.randn(2, n_samples)) + np.array(model['mu']).reshape((2, 1))
        X_all = np.hstack((X_all, X))
        Y_all = np.hstack((Y_all, Y))
    return (X_all, Y_all)
```

Then we train the GNB model with `SHOGUN`:


```
from modshogun import GaussianNaiveBayes
from modshogun import RealFeatures
from modshogun import MulticlassLabels
X_train, Y_train = gen_samples(n_train)
machine = GaussianNaiveBayes()
machine.set_features(RealFeatures(X_train))
machine.set_labels(MulticlassLabels(Y_train))
machine.train()
```


Run classification over the whole area to generate color regions:


```
delta = 0.1
x = np.arange(-20, 20, delta)
y = np.arange(-20, 20, delta)
X,Y = np.meshgrid(x,y)
Z = machine.apply_multiclass(RealFeatures(np.vstack((X.flatten(), Y.flatten())))).get_labels()
```

Plot figure:


```
pl.figure(figsize=(8,5))
pl.contourf(X, Y, Z.reshape(X.shape), np.arange(0, len(models)+1))
pl.scatter(X_train[0,:],X_train[1,:], c=Y_train)
pl.axis('off')
pl.tight_layout()
```

This algorithm is closely related to the *Gaussian Mixture Model* (GMM) learning
algorithm. However, while GMM is an unsupervised learning algorithm, Gaussian
Naive Bayes is a supervised one. It uses the training labels to estimate the
Gaussian parameters of each class directly, thus avoiding the iterative
*Expectation Maximization* procedure used for GMMs.

The merits of GNB are that both training and prediction are very fast and that it has no hyper-parameters.

Although it is named logistic *regression*, logistic regression is actually a
classification algorithm. Similar to *Naive Bayes*, it computes the
posterior $P(Y=k|X=x)$ and makes predictions by

$$ y = \operatorname*{argmax}_{k\in\{1,\ldots,K\}}\; P(Y=k|X=x) $$

However, Naive Bayes is a *generative model*, in which the distribution of
the input variable $X$ is also modeled (by a Gaussian distribution in this
case), whereas logistic regression is a *discriminative model*, which ignores
the distribution of $X$ and models the posterior directly.
In fact, the two algorithms form a *generative-discriminative pair*.

To be specific, logistic regression uses *linear* functions in $X$ to model the
posterior probabilities:

$$
\begin{aligned}
\log\frac{P(Y=1|X=x)}{P(Y=K|X=x)} &= \beta_{10} + \beta_1^Tx \\
\log\frac{P(Y=2|X=x)}{P(Y=K|X=x)} &= \beta_{20} + \beta_2^Tx \\
&\;\;\vdots \\
\log\frac{P(Y=K-1|X=x)}{P(Y=K|X=x)} &= \beta_{(K-1)0} + \beta_{K-1}^Tx
\end{aligned}
$$
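
Solving these $K-1$ log-odds equations together with the constraint that the posteriors sum to one gives the posteriors explicitly. This is the standard derivation, stated here for reference rather than taken from the `SHOGUN` documentation:

$$ P(Y=k|X=x) = \frac{\exp\left(\beta_{k0} + \beta_k^Tx\right)}{1 + \sum_{l=1}^{K-1}\exp\left(\beta_{l0} + \beta_l^Tx\right)}, \quad k=1,\ldots,K-1 $$

$$ P(Y=K|X=x) = \frac{1}{1 + \sum_{l=1}^{K-1}\exp\left(\beta_{l0} + \beta_l^Tx\right)} $$

Class $K$ serves only as the reference class in the denominators of the log-odds. Choosing a different reference class changes the parameters but not the modeled posteriors.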

The training of a logistic regression model is carried out via *maximum
likelihood estimation* of the parameters
$\boldsymbol\beta = \{\beta_{10},\beta_1^T,\ldots,\beta_{(K-1)0},\beta_{K-1}^T\}$. There is no
closed-form solution for the estimated parameters, so they are found by iterative numerical optimization.
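
As a rough illustration of maximum likelihood training for the binary case ($K=2$), the sketch below runs plain gradient ascent on the mean log-likelihood. It is a teaching example under simplifying assumptions (fixed step size, no regularization, labels coded as 0/1) and not the solver used by LibLinear or `SHOGUN`. The name `fit_logistic` and the toy data are introduced here for illustration, and samples are stored one per row, unlike the column layout used in the demo.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Binary logistic regression by gradient ascent on the mean log-likelihood.

    X: array of shape (n_samples, n_features); y: labels in {0, 1}.
    Returns the intercept beta0 and the weight vector beta.
    """
    n, d = X.shape
    beta0, beta = 0.0, np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(beta0 + X.dot(beta))))  # P(Y=1 | x)
        # Gradient of the mean log-likelihood is the mean of (y - p) * x.
        beta0 += lr * np.mean(y - p)
        beta += lr * X.T.dot(y - p) / n
    return beta0, beta

# Toy data: two overlapping Gaussian blobs, one per class.
rng = np.random.RandomState(0)
X_lr = np.vstack((rng.randn(100, 2) - 1.0, rng.randn(100, 2) + 1.0))
y_lr = np.hstack((np.zeros(100), np.ones(100)))

beta0, beta = fit_logistic(X_lr, y_lr)
accuracy = np.mean(((beta0 + X_lr.dot(beta)) > 0) == (y_lr == 1))
```

The decision rule `beta0 + x.dot(beta) > 0` is the $K=2$ case of the log-odds model: class 1 is predicted when its log-odds against the reference class are positive.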

There is no independent implementation of logistic regression in `SHOGUN`,
but `CLibLinear` becomes a logistic regression model when
constructed with the argument `L2R_LR`. This model also includes a
regularization term given by the $\ell_2$-norm of $\boldsymbol\beta$. If sparsity in
$\boldsymbol\beta$ is needed, one can also use `L1R_LR`, which replaces
the $\ell_2$-norm regularizer with an $\ell_1$-norm regularizer.
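
For reference, the binary objectives that the original LIBLINEAR documentation gives for these two solver types can be written as follows, in LIBLINEAR's own notation: $w$ is the weight vector (playing the role of $\boldsymbol\beta$), the labels are coded as $y_i \in \{-1, +1\}$, and $C > 0$ is the cost parameter. This is a sketch of the upstream formulation; the `SHOGUN` wrapper is assumed, not verified, to optimize the same objectives.

$$ \texttt{L2R\_LR}: \quad \min_{w}\; \frac{1}{2} w^T w + C \sum_{i=1}^{n} \log\left(1 + \exp\left(-y_i\, w^T x_i\right)\right) $$

$$ \texttt{L1R\_LR}: \quad \min_{w}\; \|w\|_1 + C \sum_{i=1}^{n} \log\left(1 + \exp\left(-y_i\, w^T x_i\right)\right) $$

The sum is the negative log-likelihood of the logistic model, so a larger $C$ puts more weight on fitting the data and less on the regularizer.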

Unfortunately, while the original LibLinear implementation of logistic regression supports the multiclass case, an interface incompatibility means that LibLinear cannot yet be used directly as a multiclass machine in `SHOGUN`. An easy workaround is to use a multiclass-to-binary reduction instead. Please see the Multiclass Reduction tutorial for details.
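
To give a feel for what such a reduction does, here is a minimal one-vs-rest sketch in plain NumPy. It does not use `SHOGUN`'s multiclass machinery; the helper names `fit_binary_lr`, `fit_one_vs_rest` and `predict_one_vs_rest`, the simple gradient-ascent trainer, and the toy data are all assumptions made for this illustration. Samples are stored one per row.

```python
import numpy as np

def fit_binary_lr(X, t, lr=0.1, n_iter=500):
    """Gradient ascent for binary logistic regression; targets t in {0, 1}."""
    Xb = np.hstack((np.ones((X.shape[0], 1)), X))   # prepend a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb.dot(w)))
        w += lr * Xb.T.dot(t - p) / len(t)
    return w

def fit_one_vs_rest(X, y):
    """Train one binary model per class: class k versus all other classes."""
    classes = np.unique(y)
    W = np.array([fit_binary_lr(X, (y == k).astype(float)) for k in classes])
    return classes, W

def predict_one_vs_rest(classes, W, X):
    """Predict the class whose binary model gives the highest score."""
    Xb = np.hstack((np.ones((X.shape[0], 1)), X))
    return classes[np.argmax(Xb.dot(W.T), axis=1)]

# Toy data: three blobs placed on a triangle, so that each class can be
# separated from the other two by a single line.
rng = np.random.RandomState(0)
centers = np.array([[0.0, 6.0], [-6.0, -3.0], [6.0, -3.0]])
X_ovr = np.vstack([rng.randn(50, 2) + c for c in centers])
y_ovr = np.repeat([1.0, 2.0, 3.0], 50)

classes, W = fit_one_vs_rest(X_ovr, y_ovr)
ovr_accuracy = np.mean(predict_one_vs_rest(classes, W, X_ovr) == y_ovr)
```

One-vs-rest with linear binary models needs every class to be linearly separable from the union of the others. That holds for this triangular layout, but would not hold for a middle class lying between two others on a line.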