PyMC3 Examples - Gaussian and Linear Regression

Introduction

Probabilistic programming allows defining models similar to (directed) graphical models programmatically. Modeling in probabilistic programming is more powerful because a model can contain control structures such as loops and if-statements. Every directed graphical model can be expressed by a loop-free probabilistic program [GOR14]. Inference (including learning) relies on black-box methods such as MC (Monte Carlo) sampling or variational inference, so the programmer can focus primarily on modeling. Depending on the software package, inference is either entirely automatic or requires strong background knowledge for tuning; such tuning is especially necessary for large data sets and big models.

In this exercise PyMC3 is used, which makes use of NUTS (the No-U-Turn Sampler). This sampler "has several self-tuning strategies for adaptively setting the tunable parameters of Hamiltonian Monte Carlo, which means you usually don’t need to have specialized knowledge about how the algorithms work" (PyMC3 - Getting started).

In this exercise you will use PyMC3 to:

1) Estimate the parameters of a normal distribution.

2) Estimate the parameters of a linear regression model.

Remark: In order to detect errors in your own code, execute the notebook cells containing assert or assert_almost_equal. These statements raise an exception if the calculated result is not yet correct.

Requirements

Knowledge

Theory

All PyMC3 exercises are intended as part of the course Bayesian Learning. Therefore, work through the course up to and including the chapter Probabilistic Programming.

If you are unfamiliar with Bayesian learning, the online book Probabilistic-Programming-and-Bayesian-Methods-for-Hackers by Cameron Davidson-Pilon is an excellent source to get familiar with probabilistic programming; it also provides example code for PyMC3.

Another suitable source is GORDON, Andrew D., et al.: Probabilistic programming [GOR14] (see Literature).

PyMC3

Python Modules

import os
import scipy.stats
import scipy.special as sps
import pymc3 as pm
import pandas as pd
import numpy as np
import daft

from IPython.core.pylabtools import figsize
from IPython.display import Image
from matplotlib import pyplot as plt
from matplotlib import rc

#rc("font", family="serif", size=16)

%matplotlib inline

Exercises

Estimating Mean and Standard Deviation of Normal Distribution

One of the simplest tasks is to estimate the mean $\mu$ and the variance $\sigma^2$ of some given, normally distributed data $Y$ (we might also just assume it to be normally distributed).

$ Y \sim \mathcal N(\mu, \sigma^2) $

with
  • $\mu$: mean
  • $\sigma$: standard deviation
  • $\sigma^2$: variance

### generate observed data, just execute
### REMEMBER: we do not know mu and std
N = 10 # number of observed data points
mu = 10. # mean mu
std = 2. # np.random param "scale" is the standard deviation
Y = np.random.normal(loc=mu, scale=std, size=N)
### This is everything we know
Y
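For later comparison: the classical point estimates are simply the sample mean and the sample standard deviation of the observed data (a quick baseline, not part of the Bayesian model):

### baseline point estimates, just for comparison
print("sample mean:", Y.mean())
print("sample std: ", Y.std(ddof=1))  # ddof=1 for the unbiased variance estimate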

Task:

Build the model with:

  • The observed data should come from a Gaussian distribution $\mathcal N(\mu, \sigma^2)$.
  • Use a uniform prior for the mean in the range $[-50, 50]$.
  • Use a uniform prior for the standard deviation in the range $[0.001, 100]$.

Hint:

Use the following classes to generate the variables with PyMC3:

  • pm.Uniform(...)
  • pm.Normal(...)
### create model
gaussian_model = pm.Model()

with gaussian_model:
    ### create variables here inside the context stack, so
    ### they get attached to the model automatically
    
    pass
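If you get stuck, one possible sketch of the model body is shown below; the variable names "mu" and "sd" are chosen to match the trace extraction further below, and the priors are the ones stated in the task:

### possible sketch, just one way to do it
with gaussian_model:
    mu = pm.Uniform("mu", lower=-50, upper=50)     # uniform prior for the mean
    sd = pm.Uniform("sd", lower=0.001, upper=100)  # uniform prior for the standard deviation
    y = pm.Normal("y", mu=mu, sd=sd, observed=Y)   # likelihood of the observed data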

With the code below, PyMC3 will sample from the distribution:

with gaussian_model:
    trace = pm.sample(2000, n_init=40)

The code below generates the following picture depicting the so-called plate notation:


### if the code throws an exception, install
### the packages listed in the message if you want;
### it is not needed for the rest of the code.
pm.model_to_graphviz(gaussian_model)

The code below shows us some stats:

  • The right column shows the samples drawn in sequential order.
  • The left column is the more interesting part: it shows a smoothed histogram of the posterior distributions (our estimates) of $\mu$ and $\sigma$.
_ = pm.traceplot(trace)

We can also get the samples drawn for $\mu$ and $\sigma$ and save them into a numpy array. The parameter burn is used to skip the first 100 drawn samples, as they may not yet be representative of the posterior distribution (burn-in).

mu_trace = trace.get_values(varname="mu", burn=100)
sd_trace = trace.get_values(varname="sd", burn=100)
plt.figure(figsize=(10,4))
plt.subplot(121)
plt.hist(mu_trace, 15, histtype='step', density=True, label='posterior');
plt.legend(loc='best')
plt.xlabel(r"$\mu$")
plt.ylabel("probability")
plt.title(r"Mean of Gaussian $\mu$")

plt.subplot(122)
plt.hist(sd_trace, 15, histtype='step', density=True, label='posterior');
plt.legend(loc='best');
plt.title(r"$\sigma$")
plt.xlabel(r"$\sigma$")
plt.ylabel("probability")
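Besides the histograms, the posterior can also be summarized numerically, e.g. with the trace means and PyMC3's built-in summary table (a minimal sketch):

### numerical summary of the posterior samples
print("posterior mean of mu:", mu_trace.mean())
print("posterior mean of sd:", sd_trace.mean())
pm.summary(trace)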

Estimating Parameters of a Linear Regression Model

The task now is to estimate the regression parameters using a simple linear model:

$ y \approx ax + b $

We can restate the linear model (with noise $\epsilon$): $y = ax + b + \epsilon$

as sampling from a probability distribution

$ y \sim \mathcal N(ax + b, \sigma^2) $

Now we can use PyMC3 to estimate the parameters $a$, $b$ and $\sigma$.

E.g., assume the following priors (choose appropriate values for the '?' later in the exercises some cells below): $a \sim \mathcal N(?, ?)$, $b \sim \mathcal N(?, ?)$, $\tau \sim \mathrm{HalfNormal}(?, ?)$

### observed data. 
### Again we do not know these values except the observed data y
n = 3
a = 20
b = 4
sigma = 2.5
x = np.random.uniform(0, 1, n)
y_obs = a*x + b + np.random.normal(0, sigma, n)
data = pd.DataFrame(np.array([x, y_obs]).T, columns=['x', 'y'])
data.plot(x='x', y='y', kind='scatter', s=50)

Graphical Model

def plot_glm():
    pgm = daft.PGM([5.3, 4.05], origin=[-0.3, -0.3], aspect=1., dpi=200)
    #pgm.add_node(daft.Node("alpha", r"$\alpha$", 2.5, 3, fixed=True))
    pgm.add_node(daft.Node("sigma", r"$\sigma$", 3.5, 2.2))

    pgm.add_node(daft.Node("theta", r"$\theta$", 2.5, 2.2))
    # Data.
    pgm.add_node(daft.Node("xi", r"$\vec x^{(i)}$", 1.5, 1, fixed=True))
    pgm.add_node(daft.Node("yi", r"$y^{(i)}$", 2.5, 1, observed=True))

    pgm.add_node(daft.Node("x", r"$\vec x$", 4.5, 1, fixed=True))
    pgm.add_node(daft.Node("y", r"$y$", 3.5, 1))

    # Add in the edges.
   # pgm.add_edge("alpha", "theta")
    pgm.add_edge("theta", "yi")
    pgm.add_edge("xi", "yi")
    pgm.add_edge("sigma", "yi")

    pgm.add_edge("x", "y")
    pgm.add_edge("theta", "y")
    pgm.add_edge("sigma", "y")
    # And a plate.
    pgm.add_plate(daft.Plate([1., .4, 2, 1.1], label=r"$i = 1, \ldots, n$",
    shift=-0.1))

    pgm.render()
plot_glm()

If the code above throws an exception, install daft (e.g. pip install daft) or just proceed.


  • Inside the plate are the training data $\mathcal D = \{(\vec x^{(i)}, y^{(i)})\}$.
  • $\vec x$ is not considered a random variable.
  • $\theta = \{a, b\}$ is a latent variable.
  • $y$ is the prediction for an $\vec x$.
  • $\alpha$ are the parameters for the prior of $a$ and $b$ (the corresponding node is commented out in the code above).

Task:

  • (a) Build the model in PyMC3.
  • (b) Use PyMC3 to estimate the parameters $a$, $b$ and $\sigma$.
  • (c) Plot the distribution of the parameters.
  • (d) Optional: Plot the data with multiple regression lines (a possible sketch is given at the end of this exercise).
model = pm.Model()

with model:
    
    ### Your code here (exercises (a) and (b)):
    
    
    pass
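If you get stuck with (a) and (b), a possible sketch is shown below. Note that it places the HalfNormal prior directly on $\sigma$ instead of on the precision $\tau$; the prior parameters are assumptions, any weakly informative values will do. The variable names "a", "b" and "sigma" are reused by the optional plotting sketch further below.

### possible sketch of exercises (a) and (b)
with model:
    a_rv = pm.Normal("a", mu=0, sd=20)        # prior for the slope
    b_rv = pm.Normal("b", mu=0, sd=20)        # prior for the intercept
    sigma_rv = pm.HalfNormal("sigma", sd=10)  # prior for the noise standard deviation
    y_rv = pm.Normal("y", mu=a_rv*x + b_rv, sd=sigma_rv, observed=y_obs)
    trace = pm.sample(2000)                   # exercise (b): draw posterior samples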

If everything is correct, the diagrams for exercises (c) and (d) could look like the following:

(c) (figure: posterior distributions of the parameters $a$, $b$ and $\sigma$)

(d) (figure: the observed data together with multiple regression lines drawn from the posterior)

### discard the first 100 samples as burn-in
trace = trace[100:]
pm.model_to_graphviz(model)
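For the optional exercise (d), a minimal sketch, assuming the trace contains variables named "a" and "b" as in the model sketch above:

### plot the data with regression lines sampled from the posterior
a_trace = trace.get_values("a")
b_trace = trace.get_values("b")
xs = np.linspace(0, 1, 100)
plt.scatter(x, y_obs, s=50, label="observed data")
for i in range(0, len(a_trace), max(1, len(a_trace) // 50)):  # roughly 50 lines
    plt.plot(xs, a_trace[i]*xs + b_trace[i], c="gray", alpha=0.3)
plt.xlabel("x")
plt.ylabel("y")
plt.legend(loc='best')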

Literature
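[GOR14] GORDON, Andrew D.; HENZINGER, Thomas A.; NORI, Aditya V.; RAJAMANI, Sriram K.: Probabilistic programming. In: Future of Software Engineering (FOSE 2014). ACM, 2014.

DAVIDSON-PILON, Cameron: Bayesian Methods for Hackers: Probabilistic Programming and Bayesian Inference. Addison-Wesley, 2015.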

Licenses

Notebook License (CC-BY-SA 4.0)

The following license applies to the complete notebook, including code cells. It does however not apply to any referenced external media (e.g., images).

pymc3 Examples
by Christian Herta, Klaus Strohmenger
is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://gitlab.com/deep.TEACHING.

Code License (MIT)

The following license only applies to code cells of the notebook.

Copyright 2018 Christian Herta, Klaus Strohmenger

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Data: $ X \sim \mathcal N(\mu, \sigma^2) $

Probability Density Function: $ f(X \mid \mu, \tau) = \sqrt{\frac{\tau}{2\pi}} \exp\left\{ -\frac{\tau}{2} (X-\mu)^2 \right\} $

with
  • $\mu$: mean
  • $\sigma$: standard deviation
  • $\sigma^2$: variance
  • precision $\tau := 1/\sigma^2$
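As a quick numerical check of the precision parameterization above (a sketch; scipy.stats parameterizes the normal by the standard deviation instead):

### check the precision parameterization against scipy
mu_, sigma_ = 10., 2.
tau_ = 1. / sigma_**2  # precision
x_ = 11.5
f = np.sqrt(tau_ / (2*np.pi)) * np.exp(-tau_/2 * (x_ - mu_)**2)
assert np.isclose(f, scipy.stats.norm.pdf(x_, loc=mu_, scale=sigma_))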