Hierarchical Models
Small primer on Bayesian Hierarchical Models
Hierarchical Models
Books
Forecasting: Principles and Practice - Hierarchical or Grouped Time Series
Forecasting: Principles and Practice - Chapter 10 Forecasting hierarchical or grouped time series
Blog Posts
Bayesian Methods for Multilevel Modelling
Hierarchical Linear Regression in PyMC3
The best of both worlds: Hierarchical Linear Regression in PyMC3
Introduction
Hierarchical modeling is especially advantageous when in the presence of multi-level data, making the most of all available information by it’s “shrinkage-effect”. Software like HDDM allows hierarchical Bayesian estimation of a widely used decision making model, but here we will use a classical example of hierarchical linear regression to predict radon levels in houses.
Gelman et al.’s (2007) radon dataset is the canonical example for Bayesian Hierarchical Modelling. In the dataset the amount of radiation has been measured amongst different households in all counties of several US States. Radon gas is known as the highest cause of lung cancer in non-smokers. It is believed to enter the house through the basement. Additionally its concentration differs regionally due to differences in soil type.
The exercise is to attempt to predict radon levels in different counties and where in the house it was measured.
Data Source: radon.csv
Columns: county, log_radon, floor
where floor = 0 is the basement, floor = 1 is the 1st floor, and so forth.
Single regression (pooled model)
A common way of solving this would be to pool all the data and estimate one regression to assess the influence of measurement across counties. The model would follow the following specification:
where $i$ corresponds to the measurement, $c$ corresponds to the county and floor corresponds to which floor the measurement was made. We then estimate a single intercept and a single slope for all measurements over all counties. We are assuming that all the counties are exactly the same.
Separate regressions (unpooled model)
Imagine however we were interested, for instance, in seeing the differences in relationship (slope) or base-rates of radon (intercept) across counties. We could generate $n$ regressions, where $n$ is the number of counties. Such model would follow the following specification:
Now, for every county $c$, we are estimating $\alpha_c$ and $\beta_c$, up to $n$ total counties. In this version, the upooled version, we assume the extreme opposite of the former model. Instead of assuming all counties are exactly the same, we are assuming they share no similarities between them.
Hierarchical Regression
There is a middle ground between a fully pooled model and an unpooled model. It’s a Hierarchical Model.
Specifically, while we may assume that while $\alpha$s and $\beta$s are different for each county, they all come from a common group distribution:
Here the assumption is that the intercepts $\alpha$ and slopes $\beta$ all come from a normal distribution centered around their respective group mean $\mu$ with a certain standard deviation $\sigma^2$, values (posteriors) which we also estimate. It is for this reason that such a model is named Multilevel or Hierarchical.
Using Probabilistic Programming, with a framework like pyMC3,
Advanced Hierarchical Linear Regression in PyMC3
HDDM Demo
Tools
Extra Notes
The Inference Button: Bayesian GLMs made easy with PyMC3
This world is far from Normal(ly distributed): Bayesian Robust Regression in PyMC3