=======================
Gaussian Mixture Models
=======================

A Gaussian mixture model is a probabilistic model that assumes the data are generated from a finite mixture of Gaussians with unknown parameters. The model likelihood can be written as:

.. math::

    p(x|\theta) = \sum_{i=1}^{K}\pi_i \mathcal{N}(x|\mu_i, \Sigma_i)

where :math:`p(x|\theta)` is the probability density given the parameters :math:`\theta:=\{\pi_i, \mu_i, \Sigma_i\}_{i=1}^K`, :math:`K` denotes the number of mixture components, :math:`\pi_i` denotes the weight of the :math:`i`-th component (the weights are non-negative and sum to one), and :math:`\mathcal{N}` denotes a multivariate normal distribution with mean vector :math:`\mu_i` and covariance matrix :math:`\Sigma_i`.
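As a minimal illustration of the formula (plain NumPy/SciPy, not Shogun code; the weights, means and covariances below are made-up toy values), the log-density of a point under such a mixture is a log-sum-exp over the components:

.. code-block:: python

    # Illustrative sketch of the mixture density above, with toy parameters.
    import numpy as np
    from scipy.special import logsumexp
    from scipy.stats import multivariate_normal

    weights = np.array([0.4, 0.6])               # pi_i, sum to one
    means = [np.zeros(2), np.array([3.0, 3.0])]  # mu_i
    covs = [np.eye(2), 2.0 * np.eye(2)]          # Sigma_i

    def log_density(x):
        # log p(x|theta) = logsumexp_i(log pi_i + log N(x|mu_i, Sigma_i))
        terms = [np.log(w) + multivariate_normal.logpdf(x, m, c)
                 for w, m, c in zip(weights, means, covs)]
        return logsumexp(terms)

    print(log_density(np.array([1.0, 1.0])))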
The expectation-maximization (EM) algorithm is used to learn the parameters of the model by finding a local maximum of a lower bound on the likelihood. See Chapter 20 in :cite:`barber2012bayesian` for a detailed introduction.
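To make the two steps concrete, here is a sketch of a single EM update for the model above, assuming data ``X`` of shape ``(n, d)``. This illustrates the general algorithm, not Shogun's implementation; in practice the update is iterated until the log-likelihood stops improving.

.. code-block:: python

    import numpy as np
    from scipy.stats import multivariate_normal

    def em_step(X, weights, means, covs):
        n, K = X.shape[0], len(weights)
        # E-step: responsibilities r[i, k] proportional to pi_k N(x_i|mu_k, Sigma_k)
        r = np.column_stack([w * multivariate_normal.pdf(X, m, c)
                             for w, m, c in zip(weights, means, covs)])
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and covariances from the responsibilities
        nk = r.sum(axis=0)
        weights = nk / n
        means = [(r[:, k] @ X) / nk[k] for k in range(K)]
        covs = [((X - means[k]).T * r[:, k]) @ (X - means[k]) / nk[k]
                for k in range(K)]
        return weights, means, covs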
-------
Example
-------

We start by creating CDenseFeatures (here 64 bit floats aka RealFeatures) as

.. sgexample:: gmm.sg:create_features

We initialize :sgclass:`GMM`, passing the desired number of mixture components.

.. sgexample:: gmm.sg:create_gmm_instance

We provide training features to the :sgclass:`GMM` object, train it using the EM algorithm, and sample data points from the trained model.

.. sgexample:: gmm.sg:train_sample
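Sampling from a trained mixture is ancestral: pick a component with probability :math:`\pi_i`, then draw from that component's Gaussian. A minimal sketch of this idea (the helper ``sample_gmm`` is hypothetical, not part of Shogun's API):

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_gmm(weights, means, covs, n):
        # pick a component index per sample, then draw from that Gaussian
        ks = rng.choice(len(weights), size=n, p=weights)
        return np.stack([rng.multivariate_normal(means[k], covs[k]) for k in ks])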
We extract the parameters :math:`\pi_i`, :math:`\mu_i` and :math:`\Sigma_i` of any component from the trained model.

.. sgexample:: gmm.sg:extract_params

We obtain the log-likelihood of a point belonging to each cluster and of being generated by the model as a whole.

.. sgexample:: gmm.sg:cluster_output
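In terms of the formula above, the per-cluster score for a point :math:`x` is :math:`\log\pi_k + \log\mathcal{N}(x|\mu_k,\Sigma_k)`, and the model log-likelihood is their log-sum-exp. A sketch of both quantities (again plain NumPy/SciPy, not Shogun's API):

.. code-block:: python

    import numpy as np
    from scipy.special import logsumexp
    from scipy.stats import multivariate_normal

    def cluster_scores(x, weights, means, covs):
        # joint log-density of x with each component k
        per_cluster = np.array([np.log(w) + multivariate_normal.logpdf(x, m, c)
                                for w, m, c in zip(weights, means, covs)])
        total = logsumexp(per_cluster)        # log p(x|theta)
        log_posterior = per_cluster - total   # log p(k|x): cluster membership
        return per_cluster, total, log_posterior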
We can also use the split-merge expectation-maximization (SMEM) algorithm :cite:`ueda2000smem` for training.

.. sgexample:: gmm.sg:training_smem

----------
References
----------
:wiki:`Mixture_model`

:wiki:`Expectation–maximization_algorithm`

.. bibliography:: ../../references.bib
    :filter: docname in docnames