---
title: 'Bayesian linear modelling via MCMC'
---
```{r, include=F}
knitr::opts_chunk$set(echo = TRUE, collapse=TRUE, cache=TRUE, message=F, warning=F)
library(tidyverse)
library(pander)
library(lmerTest)
```
# Bayesian model fitting {#bayes-mcmc}
### Bayesian fitting of linear models via MCMC methods {-}
This is a minimal guide to fitting and interpreting regression and multilevel
models via MCMC. For _much_ more detail, and a much more comprehensive
introduction to modern Bayesian analysis, see
[Jon Kruschke's _Doing Bayesian Data Analysis_](http://www.indiana.edu/~kruschke/DoingBayesianDataAnalysis/).
Let's revisit our
[previous example which investigated the effect of familiar and liked music on pain perception](#pain-music-data):
```{r}
painmusic <- readRDS('data/painmusic.RDS')
painmusic %>%
ggplot(aes(liked, with.music - no.music,
group=familiar, color=familiar)) +
stat_summary(geom="pointrange", fun.data=mean_se) +
stat_summary(geom="line", fun.data=mean_se) +
ylab("Pain (VAS) with.music - no.music") +
scale_color_discrete(name="") +
xlab("")
```
```{r}
# set sum contrasts
options(contrasts = c("contr.sum", "contr.poly"))
pain.model <- lm(with.music ~
no.music + familiar * liked,
data=painmusic)
summary(pain.model)
```
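Sum ("deviation") contrasts code a two-level factor as +1/−1 rather than the default 0/1 treatment coding, so each main effect is estimated at the average of the other factor's levels and stays interpretable in the presence of the interaction. A quick way to see the coding (base R only, the factor here is just for illustration):

```r
# With sum contrasts, a two-level factor is coded +1/-1
options(contrasts = c("contr.sum", "contr.poly"))
f <- factor(c("no", "yes"))
contrasts(f)  # single column with entries 1 and -1, not 0/1
```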
Now do the same thing again, but with MCMC, using Stan:
```{r, echo=T, results="hide"}
library(rstanarm)
options(contrasts = c("contr.sum", "contr.poly"))
pain.model.mcmc <- stan_lm(with.music ~ no.music + familiar * liked,
data=painmusic, prior=NULL)
```
```{r}
summary(pain.model.mcmc)
```
### Posterior probabilities for parameters {-}
```{r}
library(bayesplot)
mcmc_areas(as.matrix(pain.model.mcmc), regex_pars = 'familiar|liked', prob = .9)
```
```{r}
mcmc_intervals(as.matrix(pain.model.mcmc), regex_pars = 'familiar|liked', prob_outer = .9)
```
### Credible intervals {- #credible-intervals}
Credible intervals are distinct from [confidence intervals](#intervals): a 95% credible interval is a range which, given the model and the data, contains the parameter with 95% posterior probability, whereas a confidence interval describes the long-run coverage of the estimation procedure over repeated samples.
<!--
Use this to explain HPI
https://www.researchgate.net/post/Why_do_we_use_Highest_Posterior_Density_HPD_Interval_as_the_interval_estimator_in_Bayesian_Method
http://doingbayesiandataanalysis.blogspot.co.uk/2012/04/why-to-use-highest-density-intervals.html
-->
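One common choice of credible interval is the highest density interval (HDI) reported by `tidybayes::mean_hdi()` below: the *narrowest* interval containing the requested posterior mass. A minimal sketch of the idea, using simulated draws in place of real MCMC output (the `hdi` function here is illustrative, not from a package):

```r
# Sketch of a highest-density interval from posterior draws:
# the narrowest window of sorted draws containing `mass` of them.
hdi <- function(draws, mass = 0.95) {
  sorted <- sort(draws)
  n <- length(sorted)
  k <- ceiling(mass * n)                  # draws the interval must contain
  starts <- 1:(n - k + 1)
  widths <- sorted[starts + k - 1] - sorted[starts]
  i <- which.min(widths)                  # narrowest qualifying window
  c(lower = sorted[i], upper = sorted[i + k - 1])
}

set.seed(1)
draws <- rgamma(10000, shape = 2)         # a skewed 'posterior'
hdi(draws)
```

For a skewed posterior like this one, the HDI sits closer to the mode than an equal-tailed quantile interval would.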
```{r}
params.of.interest <-
pain.model.mcmc %>%
as_tibble %>%
reshape2::melt() %>%
filter(stringr::str_detect(variable, "famil|liked")) %>%
group_by(variable)
params.of.interest %>%
tidybayes::mean_hdi() %>%
pander::pandoc.table(caption="Estimates and 95% credible intervals for the parameters of interest")
```
### Bayesian 'p values' for parameters {-}
We can do simple arithmetic with the posterior draws to calculate the
probability a parameter is greater than (or less than) zero:
```{r}
params.of.interest %>%
summarise(estimate=mean(value),
`p (x<0)` = mean(value < 0),
`p (x>0)` = mean(value > 0))
```
Or if you'd like the Bayes Factor (evidence ratio) for one hypothesis vs.
another, for example comparing the hypotheses that a parameter is > vs. <= 0,
then you can use the `hypothesis` function in the `brms` package:
```{r}
pain.model.mcmc.df <-
pain.model.mcmc %>%
as_tibble
brms::hypothesis(pain.model.mcmc.df,
c("familiar1 > 0",
"liked1 > 0",
"familiar1:liked1 < 0"))
```
Here, although only one of these parameters had a 'significant' p value in the
frequentist model, we can see there is "very strong" evidence that familiarity
influences pain, and "strong" evidence for the interaction of familiarity and
liking, according to
[conventional rules of thumb when interpreting Bayes Factors](https://en.wikipedia.org/wiki/Bayes_factor#Interpretation).
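For a one-sided hypothesis like `"familiar1 > 0"`, the evidence ratio that `hypothesis` reports is simply the posterior odds, p(b > 0) / p(b <= 0), so it can be reproduced by hand from the draws. A sketch, with simulated draws standing in for a sampled parameter:

```r
# Evidence ratio for "b > 0" as posterior odds, computed from draws.
# Simulated draws stand in for real MCMC output (illustration only).
set.seed(1)
b <- rnorm(10000, mean = 0.5, sd = 0.3)
p_gt <- mean(b > 0)                       # posterior probability b > 0
evidence_ratio <- p_gt / (1 - p_gt)       # posterior odds
c(p = p_gt, ER = evidence_ratio)
```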
TODO - add a fuller explanation of why
[multiple comparisons](#mutiple-comparisons) are not an issue for Bayesian
analysis [@gelman2012we], because _p_ values do not have the same interpretation
in terms of long run frequencies of replication; they are a representation of
the weight of the evidence in favour of a hypothesis.
TODO: Also reference Zoltan Dienes Bayes paper.
<!--
## Bayesian analysis of RCT data {- #region-of-practical-importance}
TODO
- Example from FIT RCT for weight and BMI
- Using and presenting the ROPE
-->