Gaussian Process Regression

In this notebook we walk through a basic example of Gaussian process (GP) regression with Squidward. This is not a tutorial on how GP regression works or should be done; it is simply a demonstration of applying Squidward to a simple regression problem.

We’ll begin by importing the packages needed for these examples!

[1]:
# model with Squidward
from squidward.kernels import distance, kernel_base
from squidward import gpr, validation

# useful visualization functions
import gp_viz

# generate example data
import numpy as np

# plot example data
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings("ignore")

1D Regression Example

To demonstrate regression with Squidward we will use a very simple one-dimensional example. We generate a 1D training set, fit a GP to that data, and display the mean and variance of our predictions in an easy-to-interpret plot.

[2]:
# generate noisy samples for dataset
samples = 100

# train data
x_train = np.random.uniform(-15,15,samples)
noise = np.random.normal(0,350,samples)
y_train = (1-x_train)**3-(1-x_train)**2+100+noise

# test data
x_test = np.random.uniform(-15,15,samples)
noise = np.random.normal(0,350,samples)
y_test = (1-x_test)**3-(1-x_test)**2+100+noise

# generate noiseless data to plot true mean
x_true = np.linspace(-15,15,1000)
y_true = (1-x_true)**3-(1-x_true)**2+100
[3]:
# plot example dataset
plt.figure(figsize=(20,5))
plt.title('1D GP Regression Example Dataset')
plt.scatter(x_train,y_train,label='Train Data', c='c')
plt.scatter(x_test,y_test,label='Test Data', c='r')
plt.plot(x_true,y_true,label='True Mean of Data Generating Distribution', c='k')
plt.legend()
plt.show()
[Image: 1D GP Regression Example Dataset]
[4]:
# define the distance function used by the kernel
# for this example we use the radial basis function
# you can use one of the default distance functions
# supported by squidward (like RBF) or supply your own
# distance function (as long as it results in a positive
# semi-definite kernel)
d = distance.RBF(5.0,10000.0**2)
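
For reference, a custom distance function might look like the following sketch. This is hypothetical: it assumes the two arguments to distance.RBF are a lengthscale and a signal variance, and that the kernel invokes the distance function on pairs of points.

[ ]:
# hypothetical custom distance function -- assumes the kernel
# calls it on pairs of points, and that the two parameters of
# distance.RBF above are a lengthscale and a signal variance
def my_rbf(lengthscale, var):
    def d(x_i, x_j):
        # squared-exponential covariance between two points
        return var * np.exp(-np.sum((x_i - x_j)**2) / (2.0 * lengthscale**2))
    return d

# hypothetical drop-in replacement for distance.RBF(5.0, 10000.0**2)
# d = my_rbf(5.0, 10000.0**2)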
[5]:
# the kernel base class takes the distance measure of choice
# as well as the method for evaluating the kernel (the default
# is 'k1', which is analogous to the scipy.spatial.distance.cdist function as of scipy v1.2.0)
kernel = kernel_base.Kernel(d, 'k1')
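
The cdist analogy means the kernel builds a full pairwise covariance matrix: entry (i, j) applies the distance function to the i-th point of the first array and the j-th point of the second. Below is a plain NumPy illustration of that evaluation pattern (not Squidward's internals, and assuming d accepts a pair of points).

[ ]:
# illustration of cdist-style pairwise evaluation: entry (i, j)
# is d applied to the i-th and j-th points
# (not Squidward internals; assumes d accepts a pair of points)
a = np.linspace(-1, 1, 5)
b = np.linspace(-1, 1, 3)
K = np.array([[d(a_i, b_j) for b_j in b] for a_i in a])
print(K.shape)  # (5, 3)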
[7]:
# the model is instantiated with the kernel
# object as well as a likelihood variance (the equivalent
# of a white noise kernel to model data noise)
# there is also a variety of choices of matrix
# inversion method that trade off numeric stability and speed
model = gpr.GaussianProcessInversion(kernel=kernel, var_l=1050**2, inv_method='solve', show_warnings=False)
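
In the usual GP regression formulation, the likelihood variance enters as a white noise term added to the diagonal of the training covariance, so the matrix being inverted is K(X, X) + var_l · I. Larger values of var_l attribute more of the observed spread to noise and yield smoother posterior means.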
[8]:
# generate data to plot prior of model
x = np.linspace(-15,15,100)

# pull parameters of the prior distribution
mean, var = model.prior_predict(x)

# plot prior of model
plt.figure(figsize=(20,10))
plt.title("GP Prior Distribution")
gp_viz.regression.plot_1d(x,mean,var[:,0])
plt.scatter(x_train,y_train,label='Train Data', c='c')
plt.scatter(x_test,y_test,label='Test Data', c='r')
plt.legend()
plt.show()
[Image: GP Prior Distribution]
[9]:
# generate data to plot prior samples of model
x = np.linspace(-15,15,100)

# plot samples from the model prior
plt.figure(figsize=(20,10))
plt.title("Sample GP Prior Distribution")
for i in range(10):
    sample = model.prior_sample(x)
    plt.plot(x, sample, label='Prior Sample' if i == 0 else None)  # label once to avoid duplicate legend entries
plt.legend()
plt.show()
[Image: Sample GP Prior Distribution]
[10]:
# the model object is largely inspired by the
# scikit-learn interface
# simply call fit to "train" the model
model.fit(x_train,y_train)
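
Under the hood, GP regression fitting and prediction reduce to standard linear algebra. The sketch below is the textbook computation in plain NumPy, for reference only; Squidward's actual implementation (inversion method, stability safeguards) may differ.

[ ]:
# textbook GP posterior in plain NumPy, for reference only --
# Squidward's internals (inversion method, stability tricks) may differ
def rbf_cov(a, b, lengthscale=5.0, var=10000.0**2):
    # pairwise squared-exponential covariance of two 1D arrays
    sq_dists = (a[:, None] - b[None, :])**2
    return var * np.exp(-sq_dists / (2.0 * lengthscale**2))

def gp_posterior(x_star, x, y, var_l=1050**2):
    # training covariance with the likelihood (white noise) variance on the diagonal
    K = rbf_cov(x, x) + var_l * np.eye(x.shape[0])
    K_star = rbf_cov(x_star, x)
    # posterior mean: K* (K + var_l I)^-1 y
    mean = K_star @ np.linalg.solve(K, y)
    # posterior covariance: K** - K* (K + var_l I)^-1 K*^T
    cov = rbf_cov(x_star, x_star) - K_star @ np.linalg.solve(K, K_star.T)
    return mean, cov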
[11]:
# generate data to plot posterior of model
x = np.linspace(-15,15,100)

# pull the parameters of the posterior distribution
mean, var = model.posterior_predict(x)

# plot posterior of model
plt.figure(figsize=(20,10))
plt.title("GP Posterior Distribution")
gp_viz.regression.plot_1d(x,mean,var[:,0])
plt.scatter(x_train,y_train,label='Train Data', c='c')
plt.scatter(x_test,y_test,label='Test Data', c='r')
plt.legend()
plt.show()
[Image: GP Posterior Distribution]
[12]:
# do basic regression validation

mean, var = model.posterior_predict(x_train)
train_rmse = validation.rmse(mean,y_train)

mean, var = model.posterior_predict(x_test)
test_rmse = validation.rmse(mean,y_test)

print("Train RMSE: {}\nTest RMSE: {}".format(train_rmse,test_rmse))
Train RMSE: 319.70145133317754
Test RMSE: 390.31184189505524
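
As a sanity check, RMSE is the square root of the mean squared residual. Assuming validation.rmse follows that standard definition, the test number above can be reproduced directly in NumPy:

[ ]:
# RMSE by hand, assuming the standard definition
mean, var = model.posterior_predict(x_test)
residuals = np.squeeze(mean) - y_test  # squeeze guards against a column-vector mean
print(np.sqrt(np.mean(residuals**2)))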
[13]:
x = np.linspace(-15,15,100)

# plot posterior of model
plt.figure(figsize=(20,10))
plt.title("Sample GP Posterior Distribution")
for i in range(10):
    sample = model.posterior_sample(x)
    plt.plot(x, sample, label='Posterior Sample' if i == 0 else None)  # label once to avoid duplicate legend entries
plt.legend()
plt.show()
[Image: Sample GP Posterior Distribution]

Prior and Posterior Predictive

We can also draw samples from the prior and posterior predictive distributions. Examples of how to do this for the model specified above are shown below.
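
Concretely, a predictive sample is a latent function sample plus independent Gaussian likelihood noise: y* = f* + eps, with eps ~ N(0, var_l · I). That is exactly what the cells below construct with np.random.multivariate_normal.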

[48]:
x = np.linspace(-15,15,100)

plt.figure(figsize=(10, 4))
plt.title("Prior Predictive Samples")

for i in range(10):

    # draw a sample from the prior
    prior_sample = model.prior_sample(x)

    # add independent likelihood noise to the latent sample
    likelihood_noise = model.var_l * np.ones(100)
    likelihood_noise = np.diag(likelihood_noise)
    prior_predictive_sample = np.random.multivariate_normal(mean=prior_sample, cov=likelihood_noise)

    plt.scatter(x, prior_predictive_sample)

plt.show()
[Image: Prior Predictive Samples]
[47]:
x = np.linspace(-15,15,100)

plt.figure(figsize=(10, 4))
plt.title("Posterior Predictive Samples")

for i in range(10):

    # draw a sample from the posterior
    posterior_sample = model.posterior_sample(x)

    # add independent likelihood noise to the latent sample
    likelihood_noise = model.var_l * np.ones(100)
    likelihood_noise = np.diag(likelihood_noise)
    posterior_predictive_sample = np.random.multivariate_normal(mean=posterior_sample, cov=likelihood_noise)

    plt.scatter(x, posterior_predictive_sample)

plt.show()
[Image: Posterior Predictive Samples]