Gaussian Process Classification

In this notebook we will go over a basic example of how to use Squidward for Gaussian process (GP) classification. This is not a tutorial on how GP classification works or should be done; it is merely an example of how to use Squidward for a simple classification problem. Squidward implements GP classification using the one-vs-all strategy commonly used to extend logistic regression to multi-class problems. It is recommended to review the Squidward simple regression example before going through this classification example.
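For intuition, one-vs-all reduces a K-class problem to K binary problems: one model per class, each trained to separate that class from all others, with the final prediction taken as the argmax over the per-class outputs. Below is a minimal NumPy sketch of the label encoding only; it is illustrative, and Squidward's internal implementation may differ.

import numpy as np

y = np.array([0, 1, 2, 1, 0])  # multi-class labels
n_classes = 3

# one binary target vector per class: 1 for "this class", 0 for "all others"
one_vs_all = np.stack([(y == k).astype(float) for k in range(n_classes)], axis=1)
print(one_vs_all)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [1. 0. 0.]]
# one GP is fit per column; the predicted class is the argmax over columns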

We’ll begin by importing the packages needed to go through these examples!

[1]:
# model with squidward
from squidward.kernels import distance, kernel_base
from squidward import gpc, validation
import gp_viz

# generate example data
import numpy as np

# plot example data
import matplotlib.pyplot as plt
import seaborn as sns

plt.set_cmap('winter')

2D Classification Example

To demonstrate classification with Squidward we will use a very simple two-dimensional classification example. Here we generate a simple 2D train set, fit a GP to that data, and display the mean and variance of our predictions in an easy-to-interpret plot.

[2]:
# draw three Gaussian clusters, one per class
n = 100
cov = [[5,0],[0,5]]

x1 = np.random.multivariate_normal(mean=[2,10],cov=cov,size=n)
y1 = np.zeros(n)
x2 = np.random.multivariate_normal(mean=[8,10],cov=cov,size=n)
y2 = np.ones(n)
x3 = np.random.multivariate_normal(mean=[6,3],cov=cov,size=n)
y3 = np.ones(n)*2

x = np.concatenate([x1,x2,x3])
y = np.concatenate([y1,y2,y3]).reshape(-1,1)

# min-max scale each feature to the [0, 1] range
x = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))

plt.figure(figsize=(20,5))
plt.scatter(x[:,0],x[:,1],c=y[:,0],s=100)
plt.show()
[Image: ../../_images/markdown_examples_Classification_3_0.png, scatter plot of the three generated classes]

In the regression example we used the built-in radial basis function (RBF) kernel by passing the RBF distance measure to the base kernel class. Here we will create a custom kernel that is the additive combination of an RBF kernel over each dimension of the train dataset. Squidward makes it very easy to build custom kernels, as shown below.

[3]:
# custom distance function

d1 = distance.RBF(0.2,35.0**2)
d2 = distance.RBF(0.2,35.0**2)

def d(alpha, beta):
    # additive combination: one RBF term per input dimension
    return d1(alpha[0], beta[0]) + d2(alpha[1], beta[1])
[4]:
# the kernel base class takes the distance measure of choice
# as well as the method for evaluating the kernel (the default
# is 'k1', which is analogous to the scipy.spatial.distance.cdist function in v1.2.0)
kernel = kernel_base.Kernel(d, 'k1')
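The comment above is worth unpacking: a cdist-style evaluation applies the distance function to every pair of rows from two input arrays to build the kernel (Gram) matrix. The following is a naive, illustrative sketch of that idea, not Squidward's actual implementation.

import numpy as np

def naive_gram(d, a, b):
    """Apply distance function d to every pair of rows of a and b."""
    K = np.zeros((a.shape[0], b.shape[0]))
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            K[i, j] = d(a_i, b_j)
    return K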
[5]:
# the model is instantiated with the kernel
# object as well as a likelihood variance (the equivalent
# of a white noise kernel for modeling data noise)
# there is also a variety of choices for matrix
# inversion methods that trade off numeric stability and speed
model = gpc.GaussianProcess(n_classes=3, kernel=kernel, var_l=1.0**2, show_warnings=False)
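Concretely, under the standard GP formulation the likelihood variance enters as a white-noise (diagonal) term added to the training covariance. The snippet below is a minimal sketch of that idea, not necessarily Squidward's exact internals.

import numpy as np

# toy train-train kernel matrix
K = np.array([[1.0, 0.5],
              [0.5, 1.0]])
var_l = 1.0**2

# the likelihood variance acts like a white noise kernel:
# iid noise variance added along the diagonal
K_noisy = K + var_l * np.eye(K.shape[0])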
[6]:
# generate a 20x20 grid over the unit square to plot the posterior of the GP
x_test = np.mgrid[0:1:0.05,0:1:0.05].reshape(2,-1).T
[8]:
# plot samples from prior
plt.figure(figsize=(20,10))
for i in range(5):
    # jitter the grid slightly so points from different draws don't overlap
    _x = x_test + 0.1*np.random.rand(x_test.shape[0], x_test.shape[1])
    sample = model.prior_sample(_x).argmax(axis=1)
    plt.scatter(_x[:,0], _x[:,1], c=sample)
plt.show()
[Image: ../../_images/markdown_examples_Classification_9_0.png, class assignments of five draws from the GP prior]
[9]:
# the model object is largely inspired by the
# scikit-learn interface
# simply call fit to "train" the model
model.fit(x,y)
[10]:
# pull predictions from the posterior distribution
mean = model.posterior_predict(x)
predictions = mean.argmax(axis=1)
[11]:
# do basic validation

accuracy = validation.acc(predictions, y)
print("Train Accuracy: {}".format(accuracy))
Train Accuracy: 0.8966666666666666
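As a sanity check, classification accuracy is just the fraction of matching labels. Assuming validation.acc computes plain accuracy, a quick NumPy equivalent using the predictions and y arrays from the cells above would be:

manual_accuracy = (predictions == y.flatten()).mean()
print("Train Accuracy: {}".format(manual_accuracy))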
[12]:
# generate data to plot posterior of GP
x_test = np.mgrid[0:1:0.05,0:1:0.05].reshape(2,-1).T

# pull predictions from the posterior distribution
mean = model.posterior_predict(x_test)
predictions = mean.argmax(axis=1)
[13]:
# plot predictions from posterior
plt.figure(figsize=(20,10))
plt.scatter(x_test[:,0], x_test[:,1], c=predictions)
plt.show()
[Image: ../../_images/markdown_examples_Classification_14_0.png, predicted class for each point on the test grid]
[14]:
# plot the logit outputs and logit variances of the model
plt.figure(figsize=(20,5))
plt.subplot(121)
plt.title('Posterior Logits')
gp_viz.classification.plot_contour(model,(0,1,.1))
plt.subplot(122)
plt.title('Variance of Posterior Logits')
gp_viz.classification.plot_contour(model,(0,1,.1),True)
plt.show()
[Image: ../../_images/markdown_examples_Classification_15_0.png, contour plots of the posterior logits and their variances]