Part 1
Let’s get to know each other first!
Part 1: Introduction to Bayesian inferences and some basics in brms
Part 2: Checking the model with the WAMBS and how to do it in R
Part 3: Reporting and interpreting Bayesian models
All material is on a shared Dropbox folder called ABAR_Turku23
You can find:
Folder Presentations
Subfolders for each part of the workshop
With files called Slides_Part1.html (presentation you can open) and Slides_Part1.qmd (source quarto file)
HTML slides can be exported to PDF in your browser (use the menu button at the bottom right)
We want to learn more about a population but we have sample data!
Therefore, we have uncertainty
What is the effect of the average number of kilometers run per week on the marathon time?
We can estimate the effect by using a regression model based on a sample:
\[\text{MarathonTime}_i = \beta_0 + \beta_1 * \text{n_km}_i + \epsilon_i\]
Of course, we are not interested in the value of \(\beta_1\) in the sample, but we want to learn more on \(\beta_1\) in the population!
Maximum likelihood estimation:
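The output below can be reproduced with lm() (a minimal sketch, assuming the MarathonData dataset is loaded in your session):

# OLS/ML estimation of the effect of km4week on marathon time
Mod_lm <- lm(MarathonTimeM ~ km4week, data = MarathonData)
summary(Mod_lm)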
Call:
lm(formula = MarathonTimeM ~ km4week, data = MarathonData)
Residuals:
Min 1Q Median 3Q Max
-42.601 -12.469 -0.536 12.816 38.670
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 199.14483 1.93856 102.728 < 2e-16 ***
km4week -0.50907 0.07233 -7.038 4.68e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 18.08 on 85 degrees of freedom
Multiple R-squared: 0.3682, Adjusted R-squared: 0.3608
F-statistic: 49.53 on 1 and 85 DF, p-value: 4.675e-10
Sample: each additional km per week corresponds to an approximately half a minute faster marathon time
Population:
==> So: -0.6 is just as probable as -0.4 …
Let’s explore a tutorial app by J. Kruschke (2019)
https://iupbsapps.shinyapps.io/KruschkeFreqAndBayesApp/
He has also written a nice tutorial, linked to this App:
https://jkkweb.sitehost.iu.edu/KruschkeFreqAndBayesAppTutorial.html
Advantages:
Disadvantage:
\[ P(\theta|data) = \frac{P(data|\theta)P(\theta)}{P(data)} \] with
\(P(data|\theta)\) : the likelihood of the data given our model \(\theta\)
This part is about our MODEL and the parameters (aka the unknowns) in that model
Example. A model on “time needed to run a marathon” could be a normal distribution: \(y_i \sim N(\mu, \sigma)\)
For a certain average (\(\mu\)) and standard deviation (\(\sigma\)) we can calculate the probability of observing a marathon time of 240 minutes for instance
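For example, in R (the values \(\mu = 200\) and \(\sigma = 20\) are hypothetical, chosen purely for illustration):

# Density of a marathon time of 240 minutes under an assumed N(200, 20) model
dnorm(240, mean = 200, sd = 20)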
Expression of our prior knowledge (belief) on the parameter values
Example. A model on “time needed to run a marathon” could be a normal distribution: \(y_i \sim N(\mu, \sigma)\)
e.g., How probable is an average marathon time of 120 minutes (\(P(\mu=120)\)) and how probable is a standard deviation of 40 minutes (\(P(\sigma=40)\))
How probable are different values as average marathon time in a population?
Uninformative/Weakly informative
When objectivity is crucial and you want to let the data speak for itself …
Informative
When including significant information is crucial
Two potential priors for the mean marathon time in the population
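Such candidate priors can be visualized in R; the two distributions below are hypothetical, chosen only to contrast a weakly informative with an informative prior:

library(ggplot2)

# Weakly informative prior: N(250, 100); informative prior: N(240, 15)
ggplot(data.frame(mu = c(100, 400)), aes(x = mu)) +
  stat_function(aes(colour = "weakly informative"),
                fun = dnorm, args = list(mean = 250, sd = 100)) +
  stat_function(aes(colour = "informative"),
                fun = dnorm, args = list(mean = 240, sd = 15)) +
  labs(x = "average marathon time (minutes)", y = "prior density",
       colour = "prior")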
“For Bayesians, the data are treated as fixed and the parameters vary. […] Bayes’ rule tells us that to calculate the posterior probability distribution we must combine a likelihood with a prior probability distribution over parameter values.”
(Lambert, 2018, p.88)
Our Model for marathon times
\(y_i \sim N(\mu, \sigma)\)
Our Priors related to mu and sigma:
Our data:
[1] 185 193 240 245 155 234 189 196 206 263
What is the posterior probability of mu = 180 combined with a sigma = 10?
We calculated the posterior probability of a combination of 2 parameters
We could repeat this systematically for the following values:
aka: we create a grid of possible parameter value combinations and approximate the distribution of posterior probabilities
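A minimal sketch of such a grid approximation in R, using the 10 observed marathon times; the priors (\(\mu \sim N(200, 50)\), \(\sigma \sim \text{Uniform}(0, 100)\)) are hypothetical, chosen purely for illustration:

MT <- c(185, 193, 240, 245, 155, 234, 189, 196, 206, 263)

# Grid of candidate parameter value combinations
grid <- expand.grid(mu = seq(150, 260, by = 1),
                    sigma = seq(5, 60, by = 0.5))

# Log-likelihood of the 10 observations for each (mu, sigma) pair
grid$loglik <- apply(grid, 1, function(p) {
  sum(dnorm(MT, mean = p["mu"], sd = p["sigma"], log = TRUE))
})

# Hypothetical priors: mu ~ N(200, 50), sigma ~ Uniform(0, 100)
grid$logprior <- dnorm(grid$mu, mean = 200, sd = 50, log = TRUE) +
  dunif(grid$sigma, min = 0, max = 100, log = TRUE)

# Unnormalized posterior, then normalized over the grid
grid$post <- exp(grid$loglik + grid$logprior)
grid$post <- grid$post / sum(grid$post)

# e.g., the posterior probability mass at mu = 180 combined with sigma = 10
grid$post[grid$mu == 180 & grid$sigma == 10]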
Instead of a fixed grid of combinations of parameter values,
sample pairs/triplets/… of parameter values
reconstruct the posterior probability distribution
brms
A ‘simple’ linear model
\[\begin{aligned} & MT_{i} \sim N(\mu_i,\sigma_{e})\\ & \mu_i = \beta_0 + \beta_1*\text{sp4week}_{i} + \beta_2*\text{km4week}_{i} + \\ & \beta_3*\text{Age}_{i} + \beta_4*\text{Gender}_{i} + \beta_5*\text{CrossTraining}_{i} \\ \end{aligned}\]
So you can get a REALLY LARGE number of parameters!
Complex models \(\rightarrow\) large number of parameters \(\rightarrow\) exponentially large number of combinations!
The posterior becomes intractable by grid approximation
We will approximate the ‘joint posterior’ by ‘smart’ sampling
Samples of combinations of parameter values are drawn
BUT: the samples are not drawn independently at random!
The following link brings you to an interactive tool that lets you grasp the basic idea behind MCMC sampling:
https://chi-feng.github.io/mcmc-demo/app.html#HamiltonianMC,standard
Different dedicated software packages are available: JAGS / BUGS / Stan
The most powerful is Stan! Specifically, its Hamiltonian Monte Carlo algorithm makes it the best choice at the moment
Stan is a probabilistic programming language that uses C++
// generated with brms 2.14.4
functions {
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;     // response variable
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
}
parameters {
  real Intercept;       // temporary intercept for centered predictors
  real<lower=0> sigma;  // residual SD
}
transformed parameters {
}
model {
  // likelihood including all constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] mu = Intercept + rep_vector(0.0, N);
    target += normal_lpdf(Y | mu, sigma);
  }
  // priors including all constants
  target += normal_lpdf(Intercept | 500, 50);
  target += student_t_lpdf(sigma | 3, 0, 68.8)
    - 1 * student_t_lccdf(0 | 3, 0, 68.8);
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept;
}
How brms works
The brms syntax is very similar to lm or lme4, and in line with the typical R-style way of writing up a model …
Notice: backend = "cmdstanr" indicates the way we want to interact with Stan and C++
The simplest model looked like:
\[ MT_i \sim N(\mu,\sigma_e) \]
In brms this model is:
MT <- c(185, 193, 240, 245, 155, 234, 189, 196, 206, 263)
# First make a dataset from our MT vector
DataMT <- tibble(MT)  # tibble() replaces the deprecated data_frame()
Mod_MT1 <- brm(
MT ~ 1, # We only model an intercept
data = DataMT,
backend = "cmdstanr",
seed = 1975
)
🏃 Try it yourself and run the model … (don’t forget to load the necessary packages: brms & tidyverse)
Inspect the fitted model, e.g., via summary(Mod_MT1):
Family: gaussian
Links: mu = identity; sigma = identity
Formula: MT ~ 1
Data: DataMT (Number of observations: 10)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 209.58 11.18 187.37 231.78 1.00 2386 1935
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 36.21 9.51 23.26 58.13 1.00 2487 2329
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Markov Chain Monte Carlo
brms defaults: 4 chains, each of 2000 iterations, of which 1000 iterations are used as burn-in
The chains can be inspected visually with the plot() function
🏃 Try it yourself: load MarathonData.RData and model the effect of km4week and sp4week on MarathonTimeM, then inspect the chains with the plot() function (a sketch follows below)
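A minimal sketch of that exercise model (the name MarathonTimes_Mod2 is chosen here to match the comparison output shown later, which is an assumption):

load("MarathonData.RData")

# Model marathon time as a function of weekly kilometers and training speed
MarathonTimes_Mod2 <- brm(
  MarathonTimeM ~ km4week + sp4week,
  data = MarathonData,
  backend = "cmdstanr",
  seed = 1975
)

# Visual check of the 4 chains (trace plots and posterior densities)
plot(MarathonTimes_Mod2)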
function\(\widehat R\) < 1.015 for each parameter estimate
at least 4 chains are recommended
Effective Sample Size (ESS) > 400 to rely on \(\widehat R\)
Two complementary procedures:
posterior-predictive check
compare models with leave-one-out cross-validation
A visual check that can be performed with pp_check() from brms
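For example (a sketch; ndraws is the argument name in recent brms versions, older versions use nsamples):

# Overlay the observed outcome distribution with 100 posterior predictive draws
pp_check(MarathonTimes_Mod2, ndraws = 100)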
\(\sim\) AIC or BIC in Frequentist statistics
\(\widehat{elpd}\): “expected log predictive density” (a higher \(\widehat{elpd}\) implies better model fit without being sensitive to overfitting!)
elpd_diff se_diff elpd_loo se_elpd_loo p_loo se_p_loo
MarathonTimes_Mod2 0.0 0.0 -356.3 12.0 7.2 4.4
MarathonTimes_Mod1 -39.9 12.6 -396.2 5.3 1.7 0.3
looic se_looic
MarathonTimes_Mod2 712.6 24.0
MarathonTimes_Mod1 792.3 10.6
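A comparison table like the one above can be produced along these lines (a sketch, assuming both fitted model objects exist in the session):

# LOO cross-validation for each model
loo_Mod1 <- loo(MarathonTimes_Mod1)
loo_Mod2 <- loo(MarathonTimes_Mod2)

# elpd_diff is expressed relative to the best-fitting model;
# simplify = FALSE prints the extended table shown above
print(loo_compare(loo_Mod1, loo_Mod2), simplify = FALSE)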
🏃 Try it yourself: with your dataset, estimate two alternative models for a certain variable (assume normality, and keep it simple!) and check them with pp_check()
brms defaults to family = gaussian(link = "identity")
But many alternatives are available!
The default family types known in R (e.g., binomial(link = "logit"), Gamma(link = "inverse"), poisson(link = "log"), …); see help(family)
And even more! brmsfamily(...) has a lot of possible models; see help(brmsfamily)
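For instance, a count outcome could be modeled with a Poisson family (a sketch; n_races is a hypothetical count variable, not part of MarathonData):

# Hypothetical Poisson regression of number of races run on age
Mod_pois <- brm(
  n_races ~ Age,
  data = MarathonData,
  family = poisson(link = "log"),
  backend = "cmdstanr"
)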
brms can estimate models for more complex designs, like multilevel models or, more generally, mixed effects models:
Random intercepts…
Random intercepts and slopes…
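In brms, these random effects use the lme4-style bar syntax; a sketch with a hypothetical grouping variable Club:

# Random intercepts per (hypothetical) running club
Mod_ri <- brm(MarathonTimeM ~ km4week + (1 | Club),
              data = MarathonData, backend = "cmdstanr")

# Random intercepts and random slopes for km4week
Mod_rs <- brm(MarathonTimeM ~ km4week + (1 + km4week | Club),
              data = MarathonData, backend = "cmdstanr")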
Time to think about your data and research question
What are your prior beliefs about each of the parameters?