# configure plotting
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
import matplotlib
matplotlib.rcParams['figure.figsize'] = (8, 5)
from matplotlib import pyplot as plt
This notebook shows how to implement a horseshoe prior model in PyMC3, a probabilistic programming framework for Python. We consider a toy dataset and fit a penalised logistic regression model using the horseshoe prior. First we introduce the logistic regression model and three formulations of the prior. Then we introduce the toy data we wish to model: it contains a sparse set of salient features which we hope the model will recover. Finally we implement the different formulations in PyMC3.
Logistic regression can be written as

$$ p(y_i = 1 \mid x_i) = \mathrm{logit}^{-1}\!\left(x_i^\top \beta + \beta_0\right). $$

Priors need to be placed on $\beta$ and $\beta_0$. The horseshoe prior on $\beta$ can be written as

$$ \beta_j \mid \lambda_j, \tau \sim \mathcal{N}\!\left(0, \tau^2 \lambda_j^2\right), \qquad \lambda_j \sim \mathrm{C}^{+}(0, 1), \qquad \tau \sim \mathrm{C}^{+}(0, 1), $$

as shown in http://proceedings.mlr.press/v5/carvalho09a/carvalho09a.pdf (Equation 1).
The half-Cauchy can be written instead as a scale mixture of inverse-gamma distributions,

$$ \lambda_j^2 \mid \epsilon_j \sim \text{Inv-Gamma}\!\left(\tfrac{1}{2}, \tfrac{1}{\epsilon_j}\right), \qquad \epsilon_j \sim \text{Inv-Gamma}\!\left(\tfrac{1}{2}, 1\right), $$

which implies $\lambda_j \sim \mathrm{C}^{+}(0, 1)$ (the global scale $\tau$ is handled the same way via an auxiliary variable $\xi$), as shown in Equation 26 of https://arxiv.org/pdf/1611.06649.pdf.
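As a quick sanity check (this snippet is an addition, not part of the model code below), we can draw from the inverse-gamma mixture with scipy and confirm that the quantiles of the implied scale match those of a standard half-Cauchy:

import numpy as np
from scipy import stats

rng = np.random.RandomState(0)
n = 200000
# eps ~ Inv-Gamma(1/2, 1), lambda^2 | eps ~ Inv-Gamma(1/2, 1/eps)
eps = stats.invgamma.rvs(a=0.5, scale=1.0, size=n, random_state=rng)
lam2 = stats.invgamma.rvs(a=0.5, scale=1.0/eps, random_state=rng)
lam_mixture = np.sqrt(lam2)
# direct half-Cauchy draws for comparison
lam_halfcauchy = stats.halfcauchy.rvs(size=n, random_state=rng)
# the distribution is heavy-tailed, so compare quantiles rather than means
print(np.percentile(lam_mixture, [25, 50, 75, 90]))
print(np.percentile(lam_halfcauchy, [25, 50, 75, 90]))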
Alternatively, the half-Cauchy scales can be written in non-centred form as

$$ \lambda_j = r_j \sqrt{\rho_j}, \qquad r_j \sim \mathcal{N}(0, 1), \qquad \rho_j \sim \text{Inv-Gamma}\!\left(\tfrac{\nu}{2}, \tfrac{\nu}{2}\right), $$

so that $\lambda_j$ follows a Student-$t$ distribution with $\nu$ degrees of freedom, whose magnitude is half-Cauchy when $\nu = 1$; the global scale $\tau$ is constructed in the same way. This is shown in the appendix of "On the Hyperprior Choice for the Global Shrinkage Parameter in the Horseshoe Prior" (Piironen and Vehtari, 2017). See also Equation 5 of https://arxiv.org/pdf/1705.10388.pdf, "Model Selection in Bayesian Neural Networks via Horseshoe Priors" (Ghosh and Doshi-Velez, 2017), and https://arxiv.org/pdf/1602.03807.pdf, "Variational Inference for Sparse and Undirected Models" (Ingraham and Marks, 2017).
This is an example of a non-centred parameterisation.
See also this blog post http://twiecki.github.io/blog/2017/02/08/bayesian-hierchical-non-centered/
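The same kind of Monte Carlo check as above (again an addition, with illustrative variable names) confirms the non-centred construction: with ν = 1, the magnitude of r√ρ matches a half-Cauchy.

import numpy as np
from scipy import stats

rng = np.random.RandomState(0)
n, ν = 200000, 1
r = rng.randn(n)
ρ = stats.invgamma.rvs(a=0.5*ν, scale=0.5*ν, size=n, random_state=rng)
lam = np.abs(r) * np.sqrt(ρ)
lam_halfcauchy = stats.halfcauchy.rvs(size=n, random_state=rng)
print(np.percentile(lam, [25, 50, 75, 90]))
print(np.percentile(lam_halfcauchy, [25, 50, 75, 90]))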
import numpy as np
from scipy.special import expit
from scipy.stats import binom
def build_toy_dataset(N, D=50, noise_std=0.1):
    # noise_std is currently unused; labels are generated deterministically below
    np.random.seed(1234)
    X = np.random.random((N, D)) - 0.5
    W = np.random.random((D, 1)) - 0.5
    b = np.random.random() - 0.5
    # make the weights sparse: only the first D//4 coefficients are non-zero
    zero_ind = np.arange(D//4, D)
    W[zero_ind, :] = 0
    y = expit(X.dot(W) + b)
    # threshold the probabilities to get deterministic labels
    # (alternatively, sample labels with y = binom.rvs(1, y))
    y[y < 0.5] = 0
    y[y >= 0.5] = 1
    y = y.flatten()
    return X, y, W
N, D = 200, 50
X, y, W = build_toy_dataset(N, D)
plt.scatter(np.arange(W.size), W.flatten());
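Before fitting anything, a quick check of the toy data (our addition): only the first D//4 weights should be non-zero, and the classes should be roughly balanced.

print('non-zero weights:', np.count_nonzero(W), 'of', W.size)
print('positive class fraction:', y.mean())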
We now implement these formulations in PyMC3. We perform inference with Markov chain Monte Carlo (MCMC), using the No-U-Turn Sampler (NUTS) and, for comparison, plain Hamiltonian Monte Carlo (HMC). For each run we plot the inferred coefficients, which can be compared against the true coefficients of the toy data.
import pymc3 as pm
import theano as T
# set up a placeholder to input data to the pymc3 model
Xᵢ = T.shared(X)
yᵢ = T.shared(y)
# define the model using formulation 1
with pm.Model() as logreg:
    λ = pm.HalfCauchy('lambda', beta=1, shape=D)
    τ = pm.HalfCauchy('tau', beta=1)
    # the horseshoe standard deviation of each coefficient is tau * lambda
    σ = pm.Deterministic('horseshoe', τ*λ)
    β = pm.Normal('beta', mu=0, sd=σ, shape=D)
    βₒ = pm.Normal('b', mu=0, sd=1)
    μ = pm.math.invlogit(Xᵢ.dot(β) + βₒ)
    yₒ = pm.Bernoulli('obs', p=μ, observed=yᵢ)
# perform the inference
SEED = 20090425
warmup = 2000
num_samples = 2000
with logreg:
    step = pm.NUTS(target_accept=.8)
    logreg_posterior = pm.sample(num_samples, step=step, njobs=2, tune=warmup, random_seed=SEED)
βₕ = logreg_posterior['beta']
βₕ = βₕ[βₕ.shape[0]//2:, :].mean(axis=0)
plt.scatter(np.arange(βₕ.size), βₕ.flatten());
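A couple of quick checks on this posterior (these lines are an addition to the original notebook): the posterior mean of the global scale tau gives a sense of the overall shrinkage, and the posterior mean of beta should be much closer to zero for the coefficients that are truly zero than for the salient ones.

print('posterior mean of tau:', logreg_posterior['tau'].mean())
print('mean |beta| over truly zero coefficients:', np.abs(βₕ[D//4:]).mean())
print('mean |beta| over truly non-zero coefficients:', np.abs(βₕ[:D//4]).mean())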
import pymc3 as pm
import theano as T
# set up a placeholder to input data to the pymc3 model
Xᵢ = T.shared(X)
yᵢ = T.shared(y)
# define the model using formulation 2 (inverse-gamma scale mixture)
with pm.Model() as logreg:
    ϵ = pm.InverseGamma('epsilon', alpha=0.5, beta=1., shape=D, testval=0.1)
    λ2 = pm.InverseGamma('lambda', alpha=0.5, beta=1./ϵ, shape=D, testval=0.1)
    ξ = pm.InverseGamma('xi', alpha=0.5, beta=1., testval=0.1)
    τ2 = pm.InverseGamma('tau', alpha=0.5, beta=1./ξ, testval=0.1)
    # lambda2 and tau2 are squared scales, so the standard deviation is their square root
    σ = pm.Deterministic('horseshoe', pm.math.sqrt(τ2*λ2))
    β = pm.Normal('beta', mu=0, sd=σ, shape=D)
    βₒ = pm.Normal('b', mu=0, sd=1)
    μ = pm.math.invlogit(Xᵢ.dot(β) + βₒ)
    yₒ = pm.Bernoulli('obs', p=μ, observed=yᵢ)
# perform the inference
SEED = 20090425
warmup = 2000
num_samples = 2000
with logreg:
    step = pm.NUTS(target_accept=.8)
    logreg_posterior = pm.sample(num_samples, step=step, njobs=2, tune=warmup, random_seed=SEED)
βₕ = logreg_posterior['beta']
βₕ = βₕ[βₕ.shape[0]//2:, :].mean(axis=0)
plt.scatter(np.arange(βₕ.size), βₕ.flatten());
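The centred parameterisations (formulations 1 and 2) can exhibit funnel-like geometry, so it is worth checking whether NUTS reported divergent transitions. This snippet is an addition and assumes a PyMC3 version that records the 'diverging' sampler statistic; adjust the name if your version differs.

divergent = logreg_posterior.get_sampler_stats('diverging')
print('number of divergent transitions:', divergent.sum())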
import pymc3 as pm
import theano as T
# set up a placeholder to input data to the pymc3 model
Xᵢ = T.shared(X)
yᵢ = T.shared(y)
# define the model using formulation 3 (non-centred parameterisation)
with pm.Model() as logreg:
    ν = 1  # 1 degree of freedom gives half-Cauchy local scales
    rₗ = pm.Normal('r_local', mu=0, sd=1., shape=D)
    ρₗ = pm.InverseGamma('rho_local', alpha=0.5*ν, beta=0.5*ν, shape=D, testval=0.1)
    rᵧ = pm.Normal('r_global', mu=0, sd=1)
    ρᵧ = pm.InverseGamma('rho_global', alpha=0.5, beta=0.5, testval=0.1)
    τ = rᵧ * pm.math.sqrt(ρᵧ)
    λ = rₗ * pm.math.sqrt(ρₗ)
    z = pm.Normal('z', mu=0, sd=1, shape=D)
    β = pm.Deterministic('beta', z*λ*τ)
    βₒ = pm.Normal('b', mu=0, sd=1)
    μ = pm.math.invlogit(Xᵢ.dot(β) + βₒ)
    yₒ = pm.Bernoulli('obs', p=μ, observed=yᵢ)
# perform the inference
SEED = 20090425
warmup = 2000
num_samples = 2000
with logreg:
    step = pm.NUTS(target_accept=.8)
    logreg_posterior = pm.sample(num_samples, step=step, njobs=2, tune=warmup, random_seed=SEED)
βₕ = logreg_posterior['beta']
βₕ = βₕ[βₕ.shape[0]//2:, :].mean(axis=0)
plt.scatter(np.arange(βₕ.size), βₕ.flatten());
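The non-centred model does not store tau as a named variable, but it can be reconstructed from the sampled components to inspect the global shrinkage (this reconstruction is our addition).

τ_samples = logreg_posterior['r_global'] * np.sqrt(logreg_posterior['rho_global'])
print('posterior mean of |tau|:', np.abs(τ_samples).mean())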
import pymc3 as pm
import theano as T
# set up a placeholder to input data to the pymc3 model
Xᵢ = T.shared(X)
yᵢ = T.shared(y)
# define the model using formulation 3 (non-centred), this time sampled with Hamiltonian Monte Carlo
with pm.Model() as logreg:
    ν = 1  # 1 degree of freedom gives half-Cauchy local scales
    rₗ = pm.Normal('r_local', mu=0, sd=1., shape=D)
    ρₗ = pm.InverseGamma('rho_local', alpha=0.5*ν, beta=0.5*ν, shape=D, testval=0.1)
    rᵧ = pm.Normal('r_global', mu=0, sd=1)
    ρᵧ = pm.InverseGamma('rho_global', alpha=0.5, beta=0.5, testval=0.1)
    τ = rᵧ * pm.math.sqrt(ρᵧ)
    λ = rₗ * pm.math.sqrt(ρₗ)
    z = pm.Normal('z', mu=0, sd=1, shape=D)
    β = pm.Deterministic('beta', z*λ*τ)
    βₒ = pm.Normal('b', mu=0, sd=1)
    μ = pm.math.invlogit(Xᵢ.dot(β) + βₒ)
    yₒ = pm.Bernoulli('obs', p=μ, observed=yᵢ)
# perform the inference
SEED = 20090425
warmup = 2000
num_samples = 2000
with logreg:
    step = pm.HamiltonianMC()
    logreg_posterior = pm.sample(num_samples, step=step, njobs=2, tune=warmup, random_seed=SEED)
βₕ = logreg_posterior['beta']
βₕ = βₕ[βₕ.shape[0]//2:, :].mean(axis=0)
plt.scatter(np.arange(βₕ.size), βₕ.flatten());
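HMC with a hand-tuned path length often mixes less well than NUTS on this posterior. A rough way to compare the runs is the effective sample size of beta across the two chains; this snippet is an addition and assumes pm.effective_n is available in your PyMC3 version (newer releases expose the same diagnostic through ArviZ instead).

ess = pm.effective_n(logreg_posterior)
print('min / median effective sample size for beta:', ess['beta'].min(), np.median(ess['beta']))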
import pymc3 as pm
import theano as T
# set up a placeholder to input data to the pymc3 model
Xᵢ = T.shared(X)
yᵢ = T.shared(y)
# define the model using formulation 3 with ν = 3 (half-t local scales instead of half-Cauchy)
with pm.Model() as logreg:
    ν = 3  # 3 degrees of freedom give half-t local scales
    rₗ = pm.Normal('r_local', mu=0, sd=1., shape=D)
    ρₗ = pm.InverseGamma('rho_local', alpha=0.5*ν, beta=0.5*ν, shape=D, testval=0.1)
    rᵧ = pm.Normal('r_global', mu=0, sd=1)
    ρᵧ = pm.InverseGamma('rho_global', alpha=0.5, beta=0.5, testval=0.1)
    τ = rᵧ * pm.math.sqrt(ρᵧ)
    λ = rₗ * pm.math.sqrt(ρₗ)
    z = pm.Normal('z', mu=0, sd=1, shape=D)
    β = pm.Deterministic('beta', z*λ*τ)
    βₒ = pm.Normal('b', mu=0, sd=1)
    μ = pm.math.invlogit(Xᵢ.dot(β) + βₒ)
    yₒ = pm.Bernoulli('obs', p=μ, observed=yᵢ)
# perform the inference
SEED = 20090425
warmup = 2000
num_samples = 2000
with logreg:
    step = pm.NUTS(target_accept=.8)
    logreg_posterior = pm.sample(num_samples, step=step, njobs=2, tune=warmup, random_seed=SEED)
βₕ = logreg_posterior['beta']
βₕ = βₕ[βₕ.shape[0]//2:, :].mean(axis=0)
plt.scatter(np.arange(βₕ.size), βₕ.flatten());
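To compare any of the runs against the ground truth more quantitatively than by eye, a simple summary (our addition) is the root-mean-square error between the posterior-mean coefficients and the true weights W.

rmse = np.sqrt(np.mean((βₕ - W.flatten())**2))
print('RMSE between posterior mean of beta and true W:', rmse)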
We have shown three formulations of the horseshoe prior and implemented them in PyMC3. We find that the third, non-centred formulation produces a result more quickly.