Applied Bayesian Analyses in R

Part 3

Sven De Maeyer

Your Turn

  • Open MarathonData.RData

  • Estimate Model3 with

    • an intercept

    • the effects of km4week_z

    • and sp4week_z

    • and Age40

    • on MarathonTimeM

  • What do we learn?

Interpretation of results…

Different ways to summarize our results:

  • visually

  • credible intervals (eti & hdi)

  • rope + hdi rule

  • hypothesis tests

Visually summarizing the posterior distribution

Functions in bayesplot package

  • mcmc_areas() function

  • mcmc_areas_ridges() function

  • mcmc_intervals() function

The mcmc_areas() function

Gives a posterior distribution including a certain credible interval that you can set manually with the prob argument:

mcmc_areas(
  MarathonTimes_Mod3,
  pars = c(
    "b_km4week_z",
    "b_sp4week_z",
    "b_Age40"
  ),
  prob = .89
)

The mcmc_areas() function

The mcmc_areas_ridges() function


Almost similar to the previous, only the horizontal spacing changes a bit…


Meanwhile, see how you can easily change the color scheme for bayesplot graphs


color_scheme_set(scheme = "red")

mcmc_areas_ridges(
  MarathonTimes_Mod3,
  pars = c(
    "b_km4week_z",
    "b_sp4week_z",
    "b_Age40"
  ),
  prob = .89
)

The mcmc_areas_ridges() function

The mcmc_intervals() function

Summarizes the posterior as a horizontal bar with identifiers for two CI.

Here we set one for a 50% and one for a 89% CI

color_scheme_set(scheme = "gray")

mcmc_intervals(
  MarathonTimes_Mod3,
  pars = c(
    "b_km4week_z",
    "b_sp4week_z",
    "b_Age40"
  ),
  prob = .5,
  prob_outer = .89
)

The mcmc_intervals() function

Manually create visualizations


Powercombo: as_draws_df() + ggplot2 + ggdist


What does as_draws_df() do?


posterior_PD <- as_draws_df(MarathonTimes_Mod3)
head(posterior_PD)

Manually create visualizations

# A draws_df: 6 iterations, 1 chains, and 7 variables
  b_Intercept b_km4week_z b_sp4week_z b_Age40 sigma lprior lp__
1         203       -13.0       -12.6   -7.97    15   -8.1 -333
2         196       -11.7        -9.7    3.88    13   -8.0 -333
3         201       -13.4       -13.5   -7.10    14   -8.0 -333
4         200       -12.0       -13.4   -3.44    13   -8.0 -331
5         199       -11.7        -9.9    0.42    14   -8.1 -331
6         200        -9.7       -10.8   -1.18    14   -8.1 -332
# ... hidden reserved variables {'.chain', '.iteration', '.draw'}

Use draws to create a plot using ggdist geoms

ggdist package has a set of functions to visualize a distribution

An example


Before we start, set our own plot theme (not so necessary)


# Setting a plotting theme
theme_set(theme_linedraw() +
            theme(
              text = element_text(family = "Times", size = 14),
              panel.grid = element_blank()
              )
)

An example


We use posterior_PD as a starting point (our draws)

library(ggdist)

Plot <- ggplot(
  posterior_PD,
  aes(x = b_km4week_z)
  ) +
  stat_halfeye()

Plot + scale_y_continuous(name = "", breaks = NULL)

An example

Change the CI’s


Change the CI’s to 50% and 89%


Plot <- ggplot(
  posterior_PD,
  aes(x = b_km4week_z)
  ) +
  stat_halfeye(
    .width = c(.50,.89)
  )

Plot + scale_y_continuous(name = "", breaks = NULL)

Change the CI’s

Use another visualization


Let’s make a dotplot… (research shows this is best interpreted) with 100 dots


Plot <- ggplot(
  posterior_PD,
  aes(x = b_km4week_z)
  ) +
  stat_dotsinterval(
    quantiles = 100,
    .width = c(.50,.89)
  )

Plot + scale_y_continuous(name = "", breaks = NULL)

Use another visualization

Plot two parameters each in a facet


We use pivot_longer(everything()) to stack information on multiple parameters


posterior_PD %>% 
  select(
    b_km4week_z, b_sp4week_z
  ) %>% 
  pivot_longer(everything()) %>%
  ggplot(
    aes(x = value)
  ) +
  stat_halfeye(
    .width = c(.50,.89)
  ) +
facet_wrap(name ~ .) +
scale_y_continuous(name = "", breaks = NULL)

Plot two parameters each in a facet

Visualize calculated predictions based on posterior

Our example: 2 groups according to Age40

How to visualize the posterior probability of averages for both groups?

posterior_PD %>% 
  select(
    b_Intercept, b_Age40
  ) %>% 
  mutate(
    Mean_age_40above = b_Intercept,
    Mean_age_min40 = b_Intercept + b_Age40
  ) %>% 
  select(
    Mean_age_min40, Mean_age_40above
  ) %>% 
  pivot_longer(everything()) %>%
  ggplot(
    aes(x = value, color = name, fill = name)
  ) +
  stat_halfeye(
    .width = c(.50,.89),
    alpha = .40
  ) + 
  scale_y_continuous(name = "", breaks = NULL)

Visualize calculated predictions based on posterior

Hypothetical Outcome Plots (HOPs)

Code: see separate script called HOP_script.R on Dropbox

Reporting stats about the posterior

Credible Intervals


  • ETI: Equal Tailed Interval


  • HDI: Highest Density Interval

Concept of ROPE

ROPE: Region Of Practical Equivalence

Set a region of parameter values that can be considered equivalent to having no effect

  • in standard effect sizes the advised default is a range of -0.1 to 0.1

  • this equals 1/2 of a small effect size (Cohen, 1988)

  • all parameter values in that range are set equal to no effect

ROPE + HDI

ROPE + HDI rule


  • 95% of HDI out of ROPE \(\rightarrow\) \(H_0\) rejected

  • 95% of HDI all in ROPE \(\rightarrow\) \(H_0\) not rejected

  • 95% of HDI partially out of ROPE \(\rightarrow\) undecided

Applying the HDI + ROPE rule with bayestestR package


We can use the equivalence_test() function of the bayestestR package


library(bayestestR)
equivalence_test(MarathonTimes_Mod3)
# Test for Practical Equivalence

  ROPE: [-2.28 2.28]

Parameter |        H0 | inside ROPE |         95% HDI
-----------------------------------------------------
Intercept |  Rejected |      0.00 % | [194.83 203.73]
km4week_z |  Rejected |      0.00 % | [-14.59  -8.68]
sp4week_z |  Rejected |      0.00 % | [-14.84  -8.76]
Age40     | Undecided |     53.50 % | [ -7.44   4.87]

Visualizing the HDI + ROPE rule


We visualize the equivalence_test() by adding plot( )


equivalence_test(MarathonTimes_Mod3) %>%
  plot()

Probability of Direction (PD) with parameters package

library(parameters)
model_parameters(
  MarathonTimes_Mod3,
  ci_method = "hdi",
  rope_range = c(-2.262,2.262), #sd MarathonTimeM = 22.62 so 22.62*0.1 
  test = c("rope", "pd")
  )
# Fixed Effects

Parameter   | Median |           95% CI |     pd | % in ROPE |  Rhat |     ESS
------------------------------------------------------------------------------
(Intercept) | 199.29 | [195.12, 203.93] |   100% |        0% | 1.000 | 4441.00
km4week_z   | -11.73 | [-14.60,  -8.68] |   100% |        0% | 1.000 | 4421.00
sp4week_z   | -11.80 | [-14.93,  -8.88] |   100% |        0% | 1.000 | 4612.00
Age40       |  -1.29 | [ -7.01,   5.24] | 67.12% |    53.16% | 1.000 | 4741.00

# Sigma

Parameter | Median |         95% CI |   pd | % in ROPE |  Rhat |     ESS
------------------------------------------------------------------------
sigma     |  13.73 | [11.66, 15.92] | 100% |        0% | 1.000 | 4064.00

Outro

Some books

Some books

Some free online books

  • Bayes Rules!:

https://www.bayesrulesbook.com/

  • Or this book:

https://vasishth.github.io/bayescogsci/book/

Rens van de Schoot

In Nature Reviews

THE Podcast

If you like running - like I do - this could be a great companion on your run!

https://www.learnbayesstats.com/

Site on creating the graphs

There are many blogs and websites that you can consult if you want to find out more about making graphs.

One that I often fall back to is:


http://mjskay.github.io/tidybayes/

Questions?


Do not hesitate to contact me!


or

KIITOS