Gaussian Process Regression

A Gaussian Process (GP) extends the multivariate Gaussian distribution to infinitely many dimensions. It is specified by a mean function \(m(\mathbf{x})\) and a covariance kernel \(k(\mathbf{x},\mathbf{x}')\), where \(\mathbf{x}\in\mathcal{X}\) for some input domain \(\mathcal{X}\), and it defines a distribution over real-valued functions \(f(\cdot)\). Any finite set of function values \(\mathbf{f}=(f(\mathbf{x}_1),\dots,f(\mathbf{x}_n))\) is jointly Gaussian, with mean given by the mean function evaluated at the inputs \(\mathbf{x}_i\) and covariance matrix \(\mathbf{C}\) with entries \(\mathbf{C}_{ij}=k(\mathbf{x}_i,\mathbf{x}_j)\).
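In symbols, this finite-dimensional marginal (the defining property of a GP, written out here for reference) is

\[\mathbf{f}\sim\mathcal{N}(\mathbf{m},\mathbf{C}),\qquad \mathbf{m}_i=m(\mathbf{x}_i),\qquad \mathbf{C}_{ij}=k(\mathbf{x}_i,\mathbf{x}_j).\]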

In our regression model, we take the observed targets \(\mathbf{y}\) to be the (latent) function values \(\mathbf{f}\) corrupted by i.i.d. Gaussian noise:

\[p(y_i|f_i)=\mathcal{N}(f_i,\sigma^2)\]

Given the training data, the prediction \(y^*\) for a new input point \(\mathbf{x}^*\) is distributed as:

\[p(y^*|\mathbf{x}^*, \mathbf{y}, \mathbf{X})=\int p(y^*|f^*)\,p(f^*|\mathbf{x}^*, \mathbf{f})\,p(\mathbf{f}|\mathbf{y}, \mathbf{X})\,d\mathbf{f}\,df^*\]

This is known as the “predictive” distribution.
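For the zero mean function and Gaussian likelihood used below, this integral has a well-known closed form (see [RW06], Chapter 2). Writing \(\mathbf{K}_{ij}=k(\mathbf{x}_i,\mathbf{x}_j)\) and \((\mathbf{k}_*)_i=k(\mathbf{x}_i,\mathbf{x}^*)\), the predictive distribution is Gaussian with

\[\mathbb{E}[y^*]=\mathbf{k}_*^\top(\mathbf{K}+\sigma^2\mathbf{I})^{-1}\mathbf{y},\qquad \operatorname{Var}[y^*]=k(\mathbf{x}^*,\mathbf{x}^*)-\mathbf{k}_*^\top(\mathbf{K}+\sigma^2\mathbf{I})^{-1}\mathbf{k}_*+\sigma^2.\]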

See [RW06] for a comprehensive treatment of Gaussian Processes (Chapter 2 covers regression).

Example

Imagine we have files with training and test data. We create CDenseFeatures (here 64-bit floats, aka RealFeatures) and CRegressionLabels as:

Python:
features_train = RealFeatures(f_feats_train)
features_test = RealFeatures(f_feats_test)
labels_train = RegressionLabels(f_labels_train)
labels_test = RegressionLabels(f_labels_test)

Octave:
features_train = RealFeatures(f_feats_train);
features_test = RealFeatures(f_feats_test);
labels_train = RegressionLabels(f_labels_train);
labels_test = RegressionLabels(f_labels_test);

Java:
RealFeatures features_train = new RealFeatures(f_feats_train);
RealFeatures features_test = new RealFeatures(f_feats_test);
RegressionLabels labels_train = new RegressionLabels(f_labels_train);
RegressionLabels labels_test = new RegressionLabels(f_labels_test);

Ruby:
features_train = Modshogun::RealFeatures.new f_feats_train
features_test = Modshogun::RealFeatures.new f_feats_test
labels_train = Modshogun::RegressionLabels.new f_labels_train
labels_test = Modshogun::RegressionLabels.new f_labels_test

R:
features_train <- RealFeatures(f_feats_train)
features_test <- RealFeatures(f_feats_test)
labels_train <- RegressionLabels(f_labels_train)
labels_test <- RegressionLabels(f_labels_test)

Lua:
features_train = modshogun.RealFeatures(f_feats_train)
features_test = modshogun.RealFeatures(f_feats_test)
labels_train = modshogun.RegressionLabels(f_labels_train)
labels_test = modshogun.RegressionLabels(f_labels_test)

C#:
RealFeatures features_train = new RealFeatures(f_feats_train);
RealFeatures features_test = new RealFeatures(f_feats_test);
RegressionLabels labels_train = new RegressionLabels(f_labels_train);
RegressionLabels labels_test = new RegressionLabels(f_labels_test);

C++:
auto features_train = some<CDenseFeatures<float64_t>>(f_feats_train);
auto features_test = some<CDenseFeatures<float64_t>>(f_feats_test);
auto labels_train = some<CRegressionLabels>(f_labels_train);
auto labels_test = some<CRegressionLabels>(f_labels_test);

To fit the input (training) data \(\mathbf{X}\), we have to choose an appropriate CMeanFunction and CKernel and instantiate them. Here we use a basic CZeroMean and a CGaussianKernel with a chosen width parameter.
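For orientation, a Gaussian (RBF) kernel with width \(\tau\) can be written as follows; note that the exact parameterization is a library convention, and other definitions differ by constant factors in the denominator:

\[k(\mathbf{x},\mathbf{x}')=\exp\!\left(-\frac{\lVert\mathbf{x}-\mathbf{x}'\rVert^2}{\tau}\right).\]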

Python:
width = 1.0
kernel = GaussianKernel(features_train, features_train, width)
mean_function = ZeroMean()

Octave:
width = 1.0;
kernel = GaussianKernel(features_train, features_train, width);
mean_function = ZeroMean();

Java:
double width = 1.0;
GaussianKernel kernel = new GaussianKernel(features_train, features_train, width);
ZeroMean mean_function = new ZeroMean();

Ruby:
width = 1.0
kernel = Modshogun::GaussianKernel.new features_train, features_train, width
mean_function = Modshogun::ZeroMean.new

R:
width <- 1.0
kernel <- GaussianKernel(features_train, features_train, width)
mean_function <- ZeroMean()

Lua:
width = 1.0
kernel = modshogun.GaussianKernel(features_train, features_train, width)
mean_function = modshogun.ZeroMean()

C#:
double width = 1.0;
GaussianKernel kernel = new GaussianKernel(features_train, features_train, width);
ZeroMean mean_function = new ZeroMean();

C++:
auto width = 1.0;
auto kernel = some<CGaussianKernel>(features_train, features_train, width);
auto mean_function = some<CZeroMean>();

We need to specify an inference method to find the posterior distribution of the function values \(\mathbf{f}\). Here we choose to perform exact inference with an instance of CExactInferenceMethod and pass it the chosen kernel, the training features, the mean function, the labels, and an instance of CGaussianLikelihood, which specifies the distribution of the targets/labels as above. Finally, we create a CGaussianProcessRegression instance to be trained.
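Exact inference is tractable here because the likelihood is Gaussian: with a zero mean function, the posterior over the training function values is again Gaussian (a standard result, see [RW06]),

\[p(\mathbf{f}|\mathbf{y},\mathbf{X})=\mathcal{N}\!\left(\mathbf{K}(\mathbf{K}+\sigma^2\mathbf{I})^{-1}\mathbf{y},\;\mathbf{K}-\mathbf{K}(\mathbf{K}+\sigma^2\mathbf{I})^{-1}\mathbf{K}\right).\]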

Python:
gauss_likelihood = GaussianLikelihood()
inference_method = ExactInferenceMethod(kernel, features_train, mean_function, labels_train, gauss_likelihood)
gp_regression = GaussianProcessRegression(inference_method)

Octave:
gauss_likelihood = GaussianLikelihood();
inference_method = ExactInferenceMethod(kernel, features_train, mean_function, labels_train, gauss_likelihood);
gp_regression = GaussianProcessRegression(inference_method);

Java:
GaussianLikelihood gauss_likelihood = new GaussianLikelihood();
ExactInferenceMethod inference_method = new ExactInferenceMethod(kernel, features_train, mean_function, labels_train, gauss_likelihood);
GaussianProcessRegression gp_regression = new GaussianProcessRegression(inference_method);

Ruby:
gauss_likelihood = Modshogun::GaussianLikelihood.new
inference_method = Modshogun::ExactInferenceMethod.new kernel, features_train, mean_function, labels_train, gauss_likelihood
gp_regression = Modshogun::GaussianProcessRegression.new inference_method

R:
gauss_likelihood <- GaussianLikelihood()
inference_method <- ExactInferenceMethod(kernel, features_train, mean_function, labels_train, gauss_likelihood)
gp_regression <- GaussianProcessRegression(inference_method)

Lua:
gauss_likelihood = modshogun.GaussianLikelihood()
inference_method = modshogun.ExactInferenceMethod(kernel, features_train, mean_function, labels_train, gauss_likelihood)
gp_regression = modshogun.GaussianProcessRegression(inference_method)

C#:
GaussianLikelihood gauss_likelihood = new GaussianLikelihood();
ExactInferenceMethod inference_method = new ExactInferenceMethod(kernel, features_train, mean_function, labels_train, gauss_likelihood);
GaussianProcessRegression gp_regression = new GaussianProcessRegression(inference_method);

C++:
auto gauss_likelihood = some<CGaussianLikelihood>();
auto inference_method = some<CExactInferenceMethod>(kernel, features_train, mean_function, labels_train, gauss_likelihood);
auto gp_regression = some<CGaussianProcessRegression>(inference_method);

Then we can train the model and evaluate the predictive distribution on the test features. We get predicted CRegressionLabels, which contain the means of the predictive distributions at the test inputs.

Python:
gp_regression.train()
labels_predict = gp_regression.apply_regression(features_test)

Octave:
gp_regression.train();
labels_predict = gp_regression.apply_regression(features_test);

Java:
gp_regression.train();
RegressionLabels labels_predict = gp_regression.apply_regression(features_test);

Ruby:
gp_regression.train
labels_predict = gp_regression.apply_regression features_test

R:
gp_regression$train()
labels_predict <- gp_regression$apply_regression(features_test)

Lua:
gp_regression:train()
labels_predict = gp_regression:apply_regression(features_test)

C#:
gp_regression.train();
RegressionLabels labels_predict = gp_regression.apply_regression(features_test);

C++:
gp_regression->train();
auto labels_predict = gp_regression->apply_regression(features_test);

We can compute the predictive variances as:

Python:
variances = gp_regression.get_variance_vector(features_test)

Octave:
variances = gp_regression.get_variance_vector(features_test);

Java:
DoubleMatrix variances = gp_regression.get_variance_vector(features_test);

Ruby:
variances = gp_regression.get_variance_vector features_test

R:
variances <- gp_regression$get_variance_vector(features_test)

Lua:
variances = gp_regression:get_variance_vector(features_test)

C#:
double[] variances = gp_regression.get_variance_vector(features_test);

C++:
auto variances = gp_regression->get_variance_vector(features_test);

The prediction above is based on arbitrarily chosen hyperparameters \(\boldsymbol{\theta}\): the kernel width \(\tau\), the kernel scaling \(\gamma\), and the observation noise \(\sigma^2\). We can also learn these parameters by maximizing the marginal likelihood \(p(\mathbf{y}|\mathbf{X}, \boldsymbol{\theta})\) w.r.t. \(\boldsymbol{\theta}\). To do this, we define a CGradientModelSelection, passing it a CGradientEvaluation with its own CGradientCriterion, which together specify the gradient scheme and direction. We then follow the gradient and apply the selected \(\boldsymbol{\theta}\) back to the CGaussianProcessRegression instance.
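For the exact Gaussian model with zero mean, the quantity being optimized has the standard closed form

\[\log p(\mathbf{y}|\mathbf{X},\boldsymbol{\theta})=-\tfrac{1}{2}\,\mathbf{y}^\top\!\left(\mathbf{K}_{\boldsymbol{\theta}}+\sigma^2\mathbf{I}\right)^{-1}\mathbf{y}-\tfrac{1}{2}\log\left|\mathbf{K}_{\boldsymbol{\theta}}+\sigma^2\mathbf{I}\right|-\tfrac{n}{2}\log 2\pi,\]

where \(\mathbf{K}_{\boldsymbol{\theta}}\) is the kernel matrix under hyperparameters \(\boldsymbol{\theta}\) (the noise variance \(\sigma^2\) is itself one of the hyperparameters) and \(n\) is the number of training points; gradient-based model selection follows the gradients of this expression (equivalently, of its negative) w.r.t. \(\boldsymbol{\theta}\).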

Python:
grad_criterion = GradientCriterion()
grad = GradientEvaluation(gp_regression, features_train, labels_train, grad_criterion)
grad.set_function(inference_method)
grad_selection = GradientModelSelection(grad)
best_theta = grad_selection.select_model()
best_theta.apply_to_machine(gp_regression)

Octave:
grad_criterion = GradientCriterion();
grad = GradientEvaluation(gp_regression, features_train, labels_train, grad_criterion);
grad.set_function(inference_method);
grad_selection = GradientModelSelection(grad);
best_theta = grad_selection.select_model();
best_theta.apply_to_machine(gp_regression);

Java:
GradientCriterion grad_criterion = new GradientCriterion();
GradientEvaluation grad = new GradientEvaluation(gp_regression, features_train, labels_train, grad_criterion);
grad.set_function(inference_method);
GradientModelSelection grad_selection = new GradientModelSelection(grad);
ParameterCombination best_theta = grad_selection.select_model();
best_theta.apply_to_machine(gp_regression);

Ruby:
grad_criterion = Modshogun::GradientCriterion.new
grad = Modshogun::GradientEvaluation.new gp_regression, features_train, labels_train, grad_criterion
grad.set_function inference_method
grad_selection = Modshogun::GradientModelSelection.new grad
best_theta = grad_selection.select_model
best_theta.apply_to_machine gp_regression

R:
grad_criterion <- GradientCriterion()
grad <- GradientEvaluation(gp_regression, features_train, labels_train, grad_criterion)
grad$set_function(inference_method)
grad_selection <- GradientModelSelection(grad)
best_theta <- grad_selection$select_model()
best_theta$apply_to_machine(gp_regression)

Lua:
grad_criterion = modshogun.GradientCriterion()
grad = modshogun.GradientEvaluation(gp_regression, features_train, labels_train, grad_criterion)
grad:set_function(inference_method)
grad_selection = modshogun.GradientModelSelection(grad)
best_theta = grad_selection:select_model()
best_theta:apply_to_machine(gp_regression)

C#:
GradientCriterion grad_criterion = new GradientCriterion();
GradientEvaluation grad = new GradientEvaluation(gp_regression, features_train, labels_train, grad_criterion);
grad.set_function(inference_method);
GradientModelSelection grad_selection = new GradientModelSelection(grad);
ParameterCombination best_theta = grad_selection.select_model();
best_theta.apply_to_machine(gp_regression);

C++:
auto grad_criterion = some<CGradientCriterion>();
auto grad = some<CGradientEvaluation>(gp_regression, features_train, labels_train, grad_criterion);
grad->set_function(inference_method);
auto grad_selection = some<CGradientModelSelection>(grad);
auto best_theta = grad_selection->select_model();
best_theta->apply_to_machine(gp_regression);

Finally, we evaluate the CMeanSquaredError and the (negative log) marginal likelihood for the optimized hyperparameters.
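For reference, the mean squared error over \(n^*\) test points with predictions \(\hat{y}_i\) and targets \(y_i\) is

\[\mathrm{MSE}=\frac{1}{n^*}\sum_{i=1}^{n^*}\left(\hat{y}_i-y_i\right)^2.\]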

Python:
eval = MeanSquaredError()
mse = eval.evaluate(labels_predict, labels_test)
marg_ll = inference_method.get_negative_log_marginal_likelihood()

Octave:
eval = MeanSquaredError();
mse = eval.evaluate(labels_predict, labels_test);
marg_ll = inference_method.get_negative_log_marginal_likelihood();

Java:
MeanSquaredError eval = new MeanSquaredError();
double mse = eval.evaluate(labels_predict, labels_test);
double marg_ll = inference_method.get_negative_log_marginal_likelihood();

Ruby:
eval = Modshogun::MeanSquaredError.new
mse = eval.evaluate labels_predict, labels_test
marg_ll = inference_method.get_negative_log_marginal_likelihood

R:
eval <- MeanSquaredError()
mse <- eval$evaluate(labels_predict, labels_test)
marg_ll <- inference_method$get_negative_log_marginal_likelihood()

Lua:
eval = modshogun.MeanSquaredError()
mse = eval:evaluate(labels_predict, labels_test)
marg_ll = inference_method:get_negative_log_marginal_likelihood()

C#:
MeanSquaredError eval = new MeanSquaredError();
double mse = eval.evaluate(labels_predict, labels_test);
double marg_ll = inference_method.get_negative_log_marginal_likelihood();

C++:
auto eval = some<CMeanSquaredError>();
auto mse = eval->evaluate(labels_predict, labels_test);
auto marg_ll = inference_method->get_negative_log_marginal_likelihood();

References

Wikipedia: Gaussian_process

[RW06] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. The MIT Press, 2006.