```{r, echo = F, cache = F}
knitr::opts_chunk$set(fig.retina = 2.5)
knitr::opts_chunk$set(fig.align = "center")
options(width = 110)
```
# Monsters and Mixtures
> Many monsters are hybrids. Many statistical models are too. This chapter is about constructing likelihood and link functions by piecing together the simpler components of previous chapters. Like legendary monsters, these hybrid likelihoods contain pieces of other model types. Endowed with some properties of each piece, they help us model outcome variables with inconvenient, but common, properties....
>
> We'll consider three common and useful examples. The first are models for handling **over-dispersion**. These models extend the binomial and Poisson models of the previous chapter to cope a bit with unmeasured sources of variation. The second type is a family of **zero-inflated** and **zero-augmented** models, each of which mixes a binary event with an ordinary GLM likelihood like a Poisson or binomial. The third type is the **ordered categorical** model, useful for categorical outcomes with a fixed ordering. This model is built by merging a categorical likelihood function with a special kind of link function, usually a **cumulative link**. We'll also learn how to construct ordered categorical predictors.
>
> These model types help us transform our modeling to cope with the inconvenient realities of measurement, rather than transforming measurements to cope with the constraints of our models. [@mcelreathStatisticalRethinkingBayesian2020, p. 369, **emphasis** in the original]
## Over-dispersed counts
> When counts arise from a mixture of different processes, then there may be more variation--thicker tails--than a pure count model expects. This can again lead to overly excited models. When counts are more variable than a pure process, they exhibit **over-dispersion**. The variance of a variable is sometimes called its **dispersion**. For a counting process like a binomial, the variance is a function of the same parameters as the expected value. For example, the expected value of a binomial is $Np$ and its variance is $Np(1 - p)$. When the observed variance exceeds this amount--after conditioning on all the predictor variables--this implies that some omitted variable is producing additional dispersion in the observed counts.
>
> That isn't necessarily bad. Such a model could still produce perfectly good inferences. But ignoring over-dispersion can also lead to all of the same problems as ignoring any predictor variable. Heterogeneity in counts can be a confound, hiding effects of interest or producing spurious inferences. (p. 370, **emphasis** in the original)
In this chapter we'll cope with the problem using continuous mixture models, "models in which a linear model is attached not to the observations themselves but rather to a distribution of observations" (p. 370).
### Beta-binomial.
> A **beta-binomial** model is a mixture of binomial distributions. It assumes that each binomial count observation has its own probability of success. We estimate the *distribution* of probabilities of success instead of a single probability of success. Any predictor variables describe the shape of this distribution. (p. 370, **emphasis** in the original)
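To get a feel for what that buys us, here's a quick simulation sketch. This isn't in the text, and the particular `n_trials` and beta values below are arbitrary choices for illustration. The point is that once each observation gets its own probability of success, the counts end up more variable than a pure binomial with the same mean would predict.
```{r}
set.seed(12)
n_trials <- 20
# pure binomial: every observation shares the same probability of success
pure <- rbinom(1e5, size = n_trials, prob = 0.6)
# beta-binomial: each observation gets its own probability of success,
# drawn from a beta distribution with a mean of 0.6
p_i   <- rbeta(1e5, shape1 = 6, shape2 = 4)
mixed <- rbinom(1e5, size = n_trials, prob = p_i)
# similar means, but the mixture is over-dispersed relative to N * p * (1 - p)
c(mean_pure = mean(pure), mean_mixed = mean(mixed))
c(var_pure = var(pure), var_mixed = var(mixed),
  var_binomial_theory = n_trials * 0.6 * (1 - 0.6))
```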
Unfortunately, we need to digress. As it turns out, there are multiple ways to parameterize the beta distribution and we've run square into two. In the text, McElreath described the beta distribution with two parameters, an average probability $\bar p$ and a shape parameter $\theta$. In his **R** code 12.1, which we'll reproduce in a bit, he demonstrated that parameterization with the `rethinking::dbeta2()` function. The nice thing about this parameterization is how intuitive the `pbar` parameter is. If you want a beta with an average of .2, you set `pbar = .2`. If you want the distribution to be more or less certain, make the `theta` argument more or less large, respectively.
However, the beta density is often defined in terms of $\alpha$ and $\beta$. If you denote the data as $y$, this follows the form
$$\operatorname{Beta}(y | \alpha, \beta) = \frac{y^{\alpha - 1} (1 - y)^{\beta - 1}}{\text B (\alpha, \beta)},$$
which you can verify in the [*Continuous distributions on [0, 1]* section of the *Stan functions reference*](https://mc-stan.org/docs/functions-reference/continuous-distributions-on-0-1.html) [@standevelopmentteamStanFunctionsReference2022]. In the formula, $\text B$ stands for the Beta function, a normalizing constant you can learn about in the [*Mathematical functions* chapter](https://mc-stan.org/docs/functions-reference/math-functions.html) of the Stan functions reference. If you look at the base **R** `dbeta()` function, you'll learn it takes two parameters, `shape1` and `shape2`. Those uncreatively-named parameters are the same $\alpha$ and $\beta$ from the density, above. They do not correspond to the `pbar` and `theta` parameters of McElreath's `rethinking::dbeta2()` function.
McElreath had good reason for using `dbeta2()`. Beta's typical $\alpha$ and $\beta$ parameters aren't the most intuitive to use; the parameters in McElreath's `dbeta2()` are much nicer. If you dive a little deeper, it turns out you can find the mean of a beta distribution in terms of $\alpha$ and $\beta$ like this:
$$\mu = \frac{\alpha}{\alpha + \beta}.$$
We can talk about the spread of the distribution, sometimes called $\kappa$, in terms of $\alpha$ and $\beta$ like this:
$$\kappa = \alpha + \beta.$$
With $\mu$ and $\kappa$ in hand, we can even find the $\textit{SD}$ of a beta distribution with the formula
$$\sigma = \sqrt{\mu (1 - \mu) / (\kappa + 1)}.$$
I explicate all this because McElreath's `pbar` is $\mu = \frac{\alpha}{\alpha + \beta}$ and his `theta` is $\kappa = \alpha + \beta$, which is great news because it means that we can understand what McElreath did with his `dbeta2()` function in terms of the base **R** `dbeta()` function. This also sets us up to understand the distribution of the beta parameters used in `brms::brm()`. To demonstrate, let's walk through McElreath's **R** code 12.1.
Before we get to **R** code 12.1 and our version of the resulting plot, we should discuss themes. In this chapter we'll use theme settings and a color palette from the [**ggthemes** package](https://cran.r-project.org/package=ggthemes).
```{r, warning = F, message = F}
library(ggthemes)
```
We'll take our basic theme settings from the `theme_hc()` function. We'll use the `Green fields` color palette, which we can inspect with the `canva_pal()` function and a little help from `scales::show_col()`.
```{r, fig.height = 2}
scales::show_col(canva_pal("Green fields")(4))
canva_pal("Green fields")(4)
canva_pal("Green fields")(4)[3]
```
Now we finally get to **R** code 12.1.
```{r, fig.width = 4, fig.height = 3, warning = F, message = F}
library(tidyverse)
theme_set(
theme_hc() +
theme(axis.ticks.y = element_blank(),
plot.background = element_rect(fill = "grey92"))
)
pbar <- .5
theta <- 5
ggplot(data = tibble(x = seq(from = 0, to = 1, by = .01)),
aes(x = x, y = rethinking::dbeta2(x, prob = pbar, theta = theta))) +
geom_area(fill = canva_pal("Green fields")(4)[1]) +
scale_x_continuous("probability space", breaks = c(0, .5, 1)) +
scale_y_continuous(NULL, breaks = NULL) +
ggtitle(expression(The~beta~distribution),
subtitle = expression("Defined in terms of "*mu*" (i.e., pbar) and "*kappa*" (i.e., theta)"))
```
In his [-@kruschkeDoingBayesianData2015] text, [*Doing Bayesian data analysis*](https://sites.google.com/site/doingbayesiandataanalysis/), Kruschke provided code for a convenience function that takes `pbar` and `theta` as inputs and returns the corresponding $\alpha$ and $\beta$ values. Here's the function:
```{r}
betaABfromMeanKappa <- function(mean, kappa) {
if (mean <= 0 | mean >= 1) stop("must have 0 < mean < 1")
if (kappa <= 0) stop("kappa must be > 0")
a <- mean * kappa
b <- (1.0 - mean) * kappa
return(list(a = a, b = b))
}
```
Now we can use Kruschke's `betaABfromMeanKappa()` to find the $\alpha$ and $\beta$ values corresponding to `pbar` and `theta`.
```{r}
betaABfromMeanKappa(mean = pbar, kappa = theta)
```
And finally, we can double check that all of this works. Here's the same distribution but defined in terms of $\alpha$ and $\beta$.
```{r, fig.width = 4, fig.height = 3}
ggplot(data = tibble(x = seq(from = 0, to = 1, by = .01)),
aes(x = x, y = dbeta(x, shape1 = 2.5, shape2 = 2.5))) +
geom_area(fill = canva_pal("Green fields")(4)[4]) +
scale_x_continuous("probability space", breaks = c(0, .5, 1)) +
scale_y_continuous(NULL, breaks = NULL) +
ggtitle(expression(The~beta~distribution),
subtitle = expression("This time defined in terms of "*alpha*" and "*beta))
```
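As a quick numeric sanity check (this bit isn't in the text), the mean and $\textit{SD}$ formulas from above line up nicely with a large number of random draws from that same $\operatorname{Beta}(2.5, 2.5)$ distribution.
```{r}
set.seed(12)
draws <- rbeta(1e6, shape1 = 2.5, shape2 = 2.5)
# mu = alpha / (alpha + beta) and kappa = alpha + beta
mu    <- 2.5 / (2.5 + 2.5)
kappa <- 2.5 + 2.5
c(mean_formula = mu,
  mean_draws   = mean(draws),
  sd_formula   = sqrt(mu * (1 - mu) / (kappa + 1)),
  sd_draws     = sd(draws))
```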
McElreath encouraged us to "explore different values for `pbar` and `theta`" (p. 371). Here's a grid of plots with `pbar = c(.25, .5, .75)` and `theta = c(5, 15, 30)`.
```{r, fig.width = 5, fig.height = 5}
# data
crossing(pbar = c(.25, .5, .75),
theta = c(5, 15, 30)) %>%
expand_grid(x = seq(from = 0, to = 1, length.out = 100)) %>%
mutate(density = rethinking::dbeta2(x, prob = pbar, theta = theta),
mu = str_c("mu == ", pbar %>% str_remove(., "0")),
kappa = factor(str_c("kappa == ", theta),
levels = c("kappa == 30", "kappa == 15", "kappa == 5"))) %>%
# plot
ggplot(aes(x = x, y = density)) +
geom_area(fill = canva_pal("Green fields")(4)[4]) +
scale_x_continuous("probability space",
breaks = c(0, .5, 1), labels = c("0", ".5", "1")) +
scale_y_continuous(NULL, labels = NULL) +
theme(axis.ticks.y = element_blank()) +
facet_grid(kappa ~ mu, labeller = label_parsed)
```
If you'd like to see how to make a similar plot in terms of $\alpha$ and $\beta$, see [Chapter 6](https://bookdown.org/content/3686/inferring-a-binomial-probability-via-exact-mathematical-analysis.html) of my [-@kurzDoingBayesianDataAnalysis2023] [ebook](https://bookdown.org/content/3686/) wherein I translated Kruschke's text into **tidyverse** and **brms** code.
But remember, we're not fitting a beta model. We're using the beta-binomial. "We're going to bind our linear model to $\bar p$, so that changes in predictor variables change the central tendency of the distribution" (p. 371). The statistical model we'll be fitting follows the form
\begin{align*}
\text{admit}_i & \sim \operatorname{BetaBinomial}(n_i, \bar p_i, \phi)\\
\operatorname{logit}(\bar p_i) & = \alpha_{\text{gid}[i]} \\
\alpha_j & \sim \operatorname{Normal}(0, 1.5) \\
\phi & \sim \operatorname{Exponential}(1).
\end{align*}
Here the size, $n$, is defined in the `applications` column in the data we'll load in just a moment. In case you're confused, yes, our statistical model is not quite the same as the one McElreath presented on page 371 in the text. If you look closely, we dropped all mention of $\theta$ and jumped directly to $\phi$. Instead of implementing McElreath's $\theta = \phi + 2$ trick, we're going to set the lower bound for $\phi$ directly. Which brings us to the next issue:
The beta-binomial has historically not been officially supported by **brms** (see [GitHub issue #144](https://github.com/paul-buerkner/brms/issues/144))[^5]. However, **brms** versions 2.2.0 and above allow users to define custom distributions. To get all the details, you might check out Bürkner's [-@Bürkner2022Define] vignette, [*Define custom response distributions with brms*](https://CRAN.R-project.org/package=brms/vignettes/brms_customfamilies.html). Happily, Bürkner even used the [beta-binomial distribution as the exemplar](https://cran.r-project.org/package=brms/vignettes/brms_customfamilies.html#the-beta-binomial-distribution) in the vignette.
Before we get carried away, let's load the data and **brms**.
```{r, warning = F, message = F}
data(UCBadmit, package = "rethinking")
d <-
UCBadmit %>%
mutate(gid = ifelse(applicant.gender == "male", "1", "2"))
rm(UCBadmit)
library(brms)
```
I'm not going to go into great detail explaining the ins and outs of making custom distributions for `brm()`. You've got Bürkner's vignette for that. For our purposes, we need a few preparatory steps. First, we need to use the `custom_family()` function to define the name and parameters of the beta-binomial distribution for use in `brm()`. Second, we have to define some functions for Stan which are not defined in Stan itself. We'll save them as `stan_funs`. Third, we'll make a `stanvar()` statement which will allow us to pass our `stan_funs` to `brm()`.
```{r}
beta_binomial2 <- custom_family(
"beta_binomial2", dpars = c("mu", "phi"),
links = c("logit", "log"),
lb = c(0, 2), ub = c(1, NA),
type = "int", vars = "vint1[n]"
)
stan_funs <- "
real beta_binomial2_lpmf(int y, real mu, real phi, int T) {
return beta_binomial_lpmf(y | T, mu * phi, (1 - mu) * phi);
}
int beta_binomial2_rng(real mu, real phi, int T) {
return beta_binomial_rng(T, mu * phi, (1 - mu) * phi);
}
"
stanvars <- stanvar(scode = stan_funs, block = "functions")
```
Did you notice the `lb = c(0, 2)` portion of the code defining `beta_binomial2()`? In Bürkner's vignette, he set the lower bound of `phi` to zero. Since McElreath wanted the lower bound for $\phi$ to be 2, we just set that as the default in the likelihood. We should clarify two more points:
First, what McElreath referred to as the shape parameter, $\theta$, Bürkner called the precision parameter, $\phi$. In our exposition, above, we followed Kruschke's convention and called it $\kappa$. Don't let the notation throw you: $\theta$, $\phi$, and $\kappa$ are **all the same thing**. Perhaps less confusingly, what McElreath called the `pbar` parameter, $\bar p$, Bürkner simply refers to as $\mu$.
Second, we've become accustomed to using the `y | trials() ~ ...` syntax when defining our `formula` arguments for binomial models. Here we are replacing `trials()` with `vint()`. From Bürkner's *Define custom response distributions with brms* vignette, we read:
> To provide information about the number of trials (an integer variable), we are going to use the addition argument `vint()`, which can only be used in custom families. Similarly, if we needed to include additional vectors of real data, we would use `vreal()`. Actually, for this particular example, we could more elegantly apply the addition argument `trials()` instead of `vint()` as in the basic binomial model. However, since the present vignette is meant to give a general overview of the topic, we will go with the more general method.
>
> We now have all components together to fit our custom beta-binomial model:
```{r b12.1}
b12.1 <-
brm(data = d,
family = beta_binomial2, # here's our custom likelihood
admit | vint(applications) ~ 0 + gid,
prior = c(prior(normal(0, 1.5), class = b),
prior(exponential(1), class = phi)),
iter = 2000, warmup = 1000, cores = 4, chains = 4,
stanvars = stanvars, # note our `stanvars`
seed = 12,
file = "fits/b12.01")
```
Success, our results look a lot like those in the text!
```{r}
print(b12.1)
```
Just remember that, perhaps confusingly, what McElreath's output called `theta`, our **brms** output is calling `phi`. I know; this section is a lot. Keep your chin up! Here's what the corresponding `as_draws_df()` data object looks like.
```{r}
post <- as_draws_df(b12.1)
head(post)
```
Now we can compute and summarize a contrast between the two genders, what McElreath called `da`.
```{r, warning = F, message = F}
library(tidybayes)
post %>%
transmute(da = b_gid1 - b_gid2) %>%
mean_qi(.width = .89) %>%
mutate_if(is.double, round, digits = 3)
```
Much like in the text, the difference between genders on admission rates is near zero with wide uncertainty intervals spanning in either direction.
To stay within the **tidyverse** while making the many thin lines in Figure 12.1.a, we're going to need to do a bit of data processing. First, we'll want a variable to index the rows in `post` (i.e., to index the posterior draws). And we'll want to convert `b_gid2` to the $\bar p$ metric with the `inv_logit_scaled()` function. Then we'll use `slice_sample()` to randomly draw a subset of the posterior draws. With the `expand_grid()` function, we'll insert a dense sequence of `x` values ranging between 0 and 1--the parameter space of the beta distribution. Finally, we'll use `pmap_dbl()` to compute the density values for the `rethinking::dbeta2` distribution corresponding to the unique combination of `x`, `p_bar`, and `phi` values in each row.
```{r, warning = F}
set.seed(12)
lines <-
post %>%
mutate(p_bar = inv_logit_scaled(b_gid2)) %>%
slice_sample(n = 100) %>%
select(.draw, p_bar, phi) %>%
expand_grid(x = seq(from = 0, to = 1, by = .005)) %>%
mutate(density = pmap_dbl(list(x, p_bar, phi), rethinking::dbeta2))
str(lines)
```
All that was just for the thin lines. To make the thicker line for the posterior mean, we'll get tricky with `stat_function()`.
```{r, fig.width = 4, fig.height = 3.5}
lines %>%
ggplot(aes(x = x, y = density)) +
stat_function(fun = rethinking::dbeta2,
args = list(prob = mean(inv_logit_scaled(post %>% pull(b_gid2))),
theta = mean(post %>% pull(phi))),
linewidth = 1.5, color = canva_pal("Green fields")(4)[4]) +
geom_line(aes(group = .draw),
alpha = .2, color = canva_pal("Green fields")(4)[4]) +
scale_y_continuous(NULL, breaks = NULL, limits = c(0, 3)) +
labs(subtitle = "distribution of female admission rates",
x = "probability admit")
```
There are other ways to do this. For ideas, check out my blog post, [*Make rotated Gaussians, Kruschke style*](https://solomonkurz.netlify.app/blog/2018-12-20-make-rotated-gaussians-kruschke-style/).
Before we can do our variant of Figure 12.1.b, we'll need to define a few more custom functions. The `log_lik_beta_binomial2()` and `posterior_predict_beta_binomial2()` functions are required for `brms::predict()` to work with our `family = beta_binomial2` brmsfit object. Similarly, `posterior_epred_beta_binomial2()` is required for `brms::fitted()` to work properly. And before all that, we need to throw in a line with the `expose_functions()` function. Just go with it.
```{r, results = 'hide', warning = F, message = F}
expose_functions(b12.1, vectorize = TRUE)
# required to use `predict()`
log_lik_beta_binomial2 <- function(i, prep) {
mu <- brms::get_dpar(prep, "mu", i = i)
phi <- brms::get_dpar(prep, "phi", i = i)
trials <- prep$data$vint1[i]
y <- prep$data$Y[i]
beta_binomial2_lpmf(y, mu, phi, trials)
}
posterior_predict_beta_binomial2 <- function(i, prep, ...) {
mu <- brms::get_dpar(prep, "mu", i = i)
phi <- brms::get_dpar(prep, "phi", i = i)
trials <- prep$data$vint1[i]
beta_binomial2_rng(mu, phi, trials)
}
# required to use `fitted()`
posterior_epred_beta_binomial2 <- function(prep) {
mu <- brms::get_dpar(prep, "mu")
trials <- prep$data$vint1
trials <- matrix(trials, nrow = nrow(mu), ncol = ncol(mu), byrow = TRUE)
mu * trials
}
```
With those intermediary steps out of the way, we're ready to make Figure 12.1.b.
```{r, fig.width = 4, fig.height = 3.5}
# the prediction intervals
predict(b12.1) %>%
data.frame() %>%
transmute(ll = Q2.5,
ul = Q97.5) %>%
bind_cols(
# the fitted intervals
fitted(b12.1) %>% data.frame(),
# the original data used to fit the model
b12.1$data
) %>%
mutate(case = 1:12) %>%
# plot!
ggplot(aes(x = case)) +
geom_linerange(aes(ymin = ll / applications,
ymax = ul / applications),
color = canva_pal("Green fields")(4)[1],
linewidth = 2.5, alpha = 1/4) +
geom_pointrange(aes(ymin = Q2.5 / applications,
ymax = Q97.5 / applications,
y = Estimate/applications),
color = canva_pal("Green fields")(4)[4],
linewidth = 1/2, shape = 1) +
geom_point(aes(y = admit/applications),
color = canva_pal("Green fields")(4)[2],
size = 2) +
scale_x_continuous(breaks = 1:12) +
scale_y_continuous(breaks = 0:5 / 5, limits = c(0, 1)) +
labs(subtitle = "Posterior validation check",
caption = expression(italic(Note.)*" A = admittance probability"),
y = "A") +
theme(axis.ticks.x = element_blank(),
legend.position = "none")
```
As in the text, the raw data are consistent with the prediction intervals. But those intervals are so incredibly wide, they're hardly an endorsement of the model. Once we learn about hierarchical models, we'll be able to do much better.
### Negative-binomial or gamma-Poisson.
Recall from the last chapter how the Poisson distribution presumes $\sigma^2$ scales with $\mu$. The negative binomial distribution relaxes this assumption and presumes "each Poisson count observation has its own rate. It estimates the shape of a gamma distribution to describe the Poisson rates across cases" (p. 373).
If you execute `?dgamma`, you'll see that base **R** will allow you to define the gamma distribution with either the `shape` and `rate` or the `shape` and `scale`. If we define gamma in terms of shape and rate, it follows the formula
$$\operatorname{Gamma}(y | \alpha, \beta) = \frac{\beta^\alpha y^{\alpha - 1} e^{-\beta y}}{\Gamma (\alpha)},$$
where $\alpha$ is the shape, $\beta$ is the rate, $e$ is base of the natural logarithm, and $\Gamma$ is the [gamma function](https://en.wikipedia.org/wiki/Gamma_function). It turns out the rate and scale parameters are the reciprocals of each other. Thus if you'd like to define a gamma distribution in terms of shape and scale, it would follow the formula
$$\operatorname{Gamma}(y | \alpha, \theta) = \frac{y^{\alpha - 1} e^{-y /\theta}}{\theta^\alpha \Gamma (\alpha)},$$
where $\alpha$, $e$, and $\Gamma$ are all as they were before and $\theta$ is the scale parameter. If that all wasn't complicated enough, it turns out there's one more way to define a gamma distribution. You can use the mean and shape. This would follow the formula
$$\operatorname{Gamma}(y | \mu, \alpha) = \frac{ \big (\frac{\alpha}{\mu} \big)^\alpha}{\Gamma (\alpha)} y^{\alpha - 1} \exp (- \frac{\alpha y}{\mu}),$$
where $\alpha$ and $\Gamma$ are still the shape and gamma function, respectively, and $\mu$ is the mean. I know, this is a lot and it probably all seems really abstract, right now. Think of this section as a reference. As you'll see after we fit our model, you may well need it. Returning to the content in the text, we might express the gamma-Poisson (negative binomial) as
$$y_i \sim \operatorname{Gamma-Poisson}(\mu, \alpha),$$
where $\mu$ is the mean or rate, taking the place of $\lambda$ from the Poisson distribution, and $\alpha$ is the shape. I should warn you that this notation is a little different from the notation McElreath used in the text (p. 374), where he used $\lambda$ in place of our $\mu$ and $\phi$ in place of our $\alpha$. The meaning is really the same. The reason I have diverged from McElreath's notation is to emphasize the connection with the walk-out, above. If you want to add injury to insult, compare both of these notations with the notation Bürkner used in his [-@Bürkner2022Parameterization] vignette, [*Parameterization of response distributions in brms*](https://cran.r-project.org/package=brms/vignettes/brms_families.html#ordinal-and-categorical-models).
Since this is all already a pedagogical nightmare, let's throw in one more technical tidbit before we start fitting models. I don't believe McElreath plainly explained this in the text, but when we talk about the gamma-Poisson model as having two parameters, the $\lambda$ parameter (a.k.a. $\mu$) is doing double duty. As with conventional Poisson models, $\lambda$ is the mean for the criterion. What might not be as clear is that $\lambda$ is also the mean of the gamma mixture distribution. So when you want to get a sense of the overall shape of the gamma-Poisson model-implied distribution of $\lambda$ parameters--one for each case in the data set--the distribution is gamma with a mean of $\lambda$ and shape of $\phi$ (a.k.a. $\alpha$). For a more detailed walk out of this, see Section 8.2 in @hilbeNegativeBinomialRegression2011.
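If you'd like to see that double duty in action, here's a small simulation sketch. This isn't in the text, and the `mu` and `alpha` values are arbitrary. Each observation gets its own $\lambda_i$ drawn from a gamma distribution with mean `mu` and shape `alpha`--which, given the shape-scale parameterization above, means a scale of `mu / alpha`--and the resulting counts are noticeably more variable than a pure Poisson with the same mean.
```{r}
set.seed(12)
mu    <- 30  # mean of the gamma mixture distribution
alpha <- 2   # shape; smaller values imply more rate variation
# each observation gets its own Poisson rate, then its own count
lambda_i <- rgamma(1e5, shape = alpha, scale = mu / alpha)
y_mixed  <- rpois(1e5, lambda = lambda_i)
# a pure Poisson with the same mean, for reference
y_pure <- rpois(1e5, lambda = mu)
# the means match, but the gamma-Poisson counts are over-dispersed
c(mean_pure = mean(y_pure), mean_mixed = mean(y_mixed),
  var_pure  = var(y_pure),  var_mixed  = var(y_mixed))
```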
Okay, that's enough technical background. Let's load the `Kline` data and bring this down to Earth with some models.
```{r, warning = F, message = F}
data(Kline, package = "rethinking")
d <-
Kline %>%
mutate(p = rethinking::standardize(log(population)),
contact_id = ifelse(contact == "high", 2L, 1L),
cid = contact)
rm(Kline)
print(d)
```
If you take a look at McElreath's `m12.2`, you'll see it's a gamma-Poisson version of the non-linear model he fit in the last chapter, `m11.11`. You might also recall that we had to employ somewhat complicated non-linear syntax to translate that model into **brms**. Instead of jumping straight into a similarly complicated gamma-Poisson version of that model, I'm going to warm us up with a simple intercept-only model of the data. The formula will be
\begin{align*}
\text{total_tools}_i & \sim \operatorname{Gamma-Poisson} (\mu, \alpha) \\
\text{log}(\mu) & = \beta_0 \\
\beta_0 & \sim \operatorname{Normal}(3, 0.5) \\
\alpha & \sim \operatorname{Gamma}(0.01, 0.01),
\end{align*}
where we have deviated from McElreath's convention of using $\alpha$ for the model intercept in favor of $\beta_0$. This is because **brms** parameterizes the gamma likelihood in terms of $\mu$ and shape and, as we discussed above, shape is typically denoted as $\alpha$. I mean, technically we could refer to the shape parameter as $\psi$ or $\xi$ or whatever, but then we'd just be abandoning one convention for another. There's no way to win, here. *Sigh*. The prior for $\beta_0$ is the same one we used for the intercept way back for model `b11.9`. We have assigned a gamma prior for our troublesome new $\alpha$ (shape) parameter. Here's where that prior came from.
```{r}
get_prior(data = d,
family = negbinomial,
total_tools ~ 1)
```
`gamma(0.01, 0.01)` is the **brms** default for the `shape` parameter in this model. Within **brms**, priors using the gamma distribution are based on the shape-rate ($\alpha$-$\beta$) parameterization. This is what $\operatorname{Gamma}(0.01, 0.01)$ looks like.
```{r, fig.width = 3.5, fig.height = 3, warning = F}
ggplot(data = tibble(x = seq(from = 0, to = 60, by = .1)),
aes(x = x, y = dgamma(x, shape = 0.01, rate = 0.01))) +
geom_area(color = "transparent",
fill = canva_pal("Green fields")(4)[2]) +
scale_x_continuous(NULL) +
scale_y_continuous(NULL, breaks = NULL) +
coord_cartesian(xlim = c(0, 50)) +
ggtitle(expression(brms~default~gamma(0.01*", "*0.01)~shape~prior))
```
Let's fit the model.
```{r b12.2a}
b12.2a <-
brm(data = d,
family = negbinomial,
total_tools ~ 1,
prior = c(prior(normal(3, 0.5), class = Intercept), # beta_0
prior(gamma(0.01, 0.01), class = shape)), # alpha
iter = 2000, warmup = 1000, cores = 4, chains = 4,
seed = 12,
file = "fits/b12.02a")
```
Notice how you use the language of `family = negbinomial` to fit these models with **brms**. Here's the summary.
```{r}
print(b12.2a)
```
The intercept is our estimate of $\log \mu$, similar to $\log \lambda$ from a simple Poisson model. The `shape` is our estimate of, well, the shape ($\alpha$). To help us get a sense of what this model is, let's use the `brms::predict()` function to return random samples from the posterior predictive distribution. Because we want random samples instead of summary values, we will specify `summary = F`. Let's take a look at what this returns.
```{r}
p <-
predict(b12.2a,
summary = F)
p %>%
str()
```
Because we have 4,000 posterior iterations, we also get back 4,000 rows. We have 10 columns, which correspond to the 10 rows (i.e., cultures) in the original data. In the next block, we'll convert that output to a data frame and wrangle a little before plotting the results.
```{r, fig.width = 7, fig.height = 3}
p %>%
data.frame() %>%
set_names(d$culture) %>%
pivot_longer(everything(),
names_to = "culture",
values_to = "lambda") %>%
ggplot(aes(x = lambda)) +
geom_density(color = "transparent", fill = canva_pal("Green fields")(4)[2]) +
scale_x_continuous(expression(lambda["[culture]"]), breaks = 0:2 * 100) +
scale_y_continuous(NULL, breaks = NULL) +
coord_cartesian(xlim = c(0, 210)) +
facet_wrap(~ culture, nrow = 2)
```
Because this model had no predictors, we have similar posterior-predictive distributions for each case in the data. It's important, however, to be very clear about what these posterior-predictive distributions are of. They are not of the data, per se. Let's look back at the text:
> A **negative-binomial** model, more usefully called a **gamma-Poisson** model, assumes that each Poisson count observation has its own rate. It estimates the shape of the gamma distribution to describe the Poisson rates across cases. (p. 373, **emphasis** in the original)
As a reminder, the "rate" for the Poisson distribution is just another word for the mean, also called $\lambda$. So unlike a simple Poisson model where we use the individual cases to estimate one overall $\lambda$, here we're presuming each case has its own $\lambda_i$. There are 10 $\lambda_i$ values that generated our data and if we look at those $\lambda_i$ values on the whole, their distribution can be described with a gamma distribution. And again, this is not a gamma distribution for our data. This is a gamma distribution of the $\lambda_i$ values from the 10 separate Poisson distributions that presumably made our data.
After exponentiating the intercept parameter $(\log \mu)$, here are the posterior distributions for those two gamma parameters.
```{r, fig.width = 5.5, fig.height = 2.5, warning = F}
post <- as_draws_df(b12.2a)
post %>%
mutate(mu = exp(b_Intercept),
alpha = shape) %>%
pivot_longer(mu:alpha, names_to = "parameter") %>%
ggplot(aes(x = value)) +
geom_density(color = "transparent", fill = canva_pal("Green fields")(4)[2]) +
scale_y_continuous(NULL, breaks = NULL) +
labs(title = "Behold our gamma parameters!",
x = "posterior") +
facet_wrap(~ parameter, scales = "free", labeller = label_parsed)
```
We might want to use these parameter estimates to visualize the model-implied gamma distribution of $\lambda$ parameters. But recall that the base **R** `dgamma()` function doesn't take the mean. It is based on either the `shape` and `rate` or the `shape` and `scale`. Since we already have the shape ($\alpha$), we need a way to compute the scale or rate. Happily, we can define the scale in terms of the mean and the shape with the equation
$$\theta = \frac{\mu}{\alpha}.$$
Behold $\theta$ in a plot.
```{r, fig.width = 3, fig.height = 2.5}
post %>%
mutate(mu = exp(b_Intercept),
alpha = shape) %>%
mutate(theta = mu / alpha) %>%
ggplot(aes(x = theta)) +
geom_density(color = "transparent", fill = canva_pal("Green fields")(4)[2]) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = expression(We~define~the~scale~as~theta==mu/alpha),
x = "posterior") +
coord_cartesian(xlim = c(0, 40))
```
Now that we know how to get both $\alpha$ and $\theta$ from the model, we can pump them into `dgamma()` to get a sense of the model-implied gamma distribution, the presumed underlying distribution of $\lambda$ values that generated the `total_tools` data.
```{r, fig.width = 3.5, fig.height = 3, warning = F}
set.seed(12)
# wrangle to get 200 draws
post %>%
mutate(alpha = shape,
theta = exp(b_Intercept) / shape) %>%
slice_sample(n = 200) %>%
expand_grid(x = 0:250) %>%
mutate(density = dgamma(x, shape = alpha, scale = theta)) %>%
# plot
ggplot(aes(x = x, y = density)) +
geom_line(aes(group = .draw),
alpha = .1, color = canva_pal("Green fields")(4)[4]) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = expression("200 credible gamma densities for "*lambda),
x = expression(lambda)) +
coord_cartesian(xlim = c(0, 170),
ylim = c(0, 0.045))
```
Now that we've warmed up with an intercept-only gamma-Poisson, it's time to fit a **brms** version of McElreath's `m12.2`. Our model formula will be
\begin{align*}
\text{total_tools}_i & \sim \operatorname{Gamma-Poisson} (\mu_i, \alpha) \\
\mu_i & = \exp (\beta_{0,\text{cid}[i]}) \text{population}_i^{\beta_{1,\text{cid}[i]}} / \gamma \\
\beta_{0,j} & \sim \operatorname{Normal}(1, 1) \\
\beta_{1,j} & \sim \operatorname{Exponential}(1) \\
\gamma & \sim \operatorname{Exponential}(1) \\
\alpha & \sim \operatorname{Exponential}(1),
\end{align*}
where $\mu$ and $\alpha$ are the mean and shape of the gamma distribution for the case-specific $\lambda$ parameters. Here's how we might fit that model with **brms**.
```{r b12.2b}
b12.2b <-
brm(data = d,
family = negbinomial(link = "identity"),
bf(total_tools ~ exp(b0) * population^b1 / g,
b0 + b1 ~ 0 + cid,
g ~ 1,
nl = TRUE),
prior = c(prior(normal(1, 1), nlpar = b0),
prior(exponential(1), nlpar = b1, lb = 0),
prior(exponential(1), nlpar = g, lb = 0),
prior(exponential(1), class = shape)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 12,
file = "fits/b12.02b")
```
Here is the model summary.
```{r}
print(b12.2b)
```
Compute and check the PSIS-LOO estimates along with their diagnostic Pareto $k$ values.
```{r, message = F}
b12.2b <- add_criterion(b12.2b, "loo")
loo(b12.2b)
```
One of those Pareto $k$ values is still on the high side. Can you guess which one that is?
```{r}
d %>%
mutate(k = b12.2b$criteria$loo$diagnostics$pareto_k) %>%
filter(k > .5) %>%
select(culture, k)
```
Before we can make our version of Figure 12.2, we'll need to reload `b11.11` from the last chapter. One way is with the `readRDS()` function.
```{r}
b11.11 <- readRDS("fits/b11.11.rds")
```
Here we make the left panel of the figure.
```{r}
# the new data
nd <-
distinct(d, cid) %>%
expand_grid(population = seq(from = 0, to = 300000, length.out = 100))
p1 <-
# compute the expected trajectories
fitted(b11.11,
newdata = nd,
probs = c(.055, .945)) %>%
data.frame() %>%
bind_cols(nd) %>%
# plot
ggplot(aes(x = population, group = cid, color = cid)) +
geom_smooth(aes(y = Estimate, ymin = Q5.5, ymax = Q94.5, fill = cid),
stat = "identity",
alpha = 1/4, linewidth = 1/2) +
geom_point(data = bind_cols(d, b11.11$criteria$loo$diagnostics),
aes(y = total_tools, size = pareto_k),
alpha = 4/5) +
labs(subtitle = "pure Poisson model",
y = "total tools")
```
Now make the right panel.
```{r, fig.width = 3.5, fig.height = 3.25}
# for the annotation
text <-
distinct(d, cid) %>%
mutate(population = c(150000, 110000),
total_tools = c(57, 69),
label = str_c(cid, " contact"))
p2 <-
fitted(b12.2b,
newdata = nd,
probs = c(.055, .945)) %>%
data.frame() %>%
bind_cols(nd) %>%
ggplot(aes(x = population, group = cid, color = cid)) +
geom_smooth(aes(y = Estimate, ymin = Q5.5, ymax = Q94.5, fill = cid),
stat = "identity",
alpha = 1/4, linewidth = 1/2) +
geom_point(data = bind_cols(d, b12.2b$criteria$loo$diagnostics),
aes(y = total_tools, size = pareto_k),
alpha = 4/5) +
geom_text(data = text,
aes(y = total_tools, label = label)) +
scale_y_continuous(NULL, labels = NULL) +
labs(subtitle = "gamma-Poisson model")
```
Combine the two ggplots with **patchwork** syntax to make the full version of Figure 12.2.
```{r, fig.width = 7.25, fig.height = 3.25}
library(patchwork)
(p1 | p2) &
scale_fill_manual(values = canva_pal("Green fields")(4)[c(4, 1)]) &
scale_color_manual(values = canva_pal("Green fields")(4)[c(4, 1)]) &
scale_size(range = c(2, 5)) &
scale_x_continuous("population", breaks = c(0, 50000, 150000, 250000)) &
coord_cartesian(xlim = range(d$population),
ylim = range(d$total_tools)) &
theme(axis.ticks = element_blank(),
legend.position = "none")
```
Oh man!
> Recall that Hawaii was a highly influential point in the pure Poisson model. It does all the work of pulling the low-contact trend down. In this new model, Hawaii is still influential, but it exerts a lot less influence on the trends. Now the high and low contact trends are much more similar, very hard to reliably distinguish. This is because the gamma-Poisson model expects rate variation, and the estimated amount of variation is quite large. Population is still strongly related to the total tools, but the influence of contact rate has greatly diminished. (p. 374)
Before we move on, let's use `predict()` to generate posterior predictive distributions for each of our 10 cultures.
```{r, fig.width = 7, fig.height = 3, warning = F, message = F}
predict(b12.2b,
summary = F) %>%
data.frame() %>%
set_names(d$culture) %>%
pivot_longer(everything(),
names_to = "culture",
values_to = "lambda") %>%
left_join(d, by = "culture") %>%
ggplot(aes(x = lambda, y = 0)) +
stat_halfeye(point_interval = mean_qi, .width = .5,
fill = canva_pal("Green fields")(4)[2],
color = canva_pal("Green fields")(4)[1]) +
geom_vline(aes(xintercept = total_tools),
color = canva_pal("Green fields")(4)[3]) +
scale_x_continuous(expression(lambda["[culture]"]), breaks = 0:2 * 100) +
scale_y_continuous(NULL, breaks = NULL) +
coord_cartesian(xlim = c(0, 210)) +
facet_wrap(~ culture, nrow = 2)
```
Because we used predictors in the model, this time the posterior predictive distributions differ across the cultures. The mean and interquartile range of each distribution are marked off by the light-green dot and horizontal line below each. The vertical lines in the foreground mark off the corresponding `total_tools` values from the data. Recall that these distributions are not based on the `total_tools` values themselves, but rather are estimates of the $\lambda$ values from the underlying Poisson distributions that might have generated such `total_tools` values.
### Over-dispersion, entropy, and information criteria.
> In terms of model comparison using information criteria, a beta-binomial model is a binomial model, and a gamma-Poisson (negative-binomial) is a Poisson model.
>
> You should not use WAIC and PSIS with these models, however, unless you are very sure of what you are doing. The reason is that while ordinary binomial and Poisson models can be aggregated and disaggregated across rows in the data, without changing any causal assumptions, the same is not true of beta-binomial and gamma-Poisson models. The reason is that a beta-binomial or gamma-Poisson likelihood applies an unobserved parameter to each row in the data. When we then go to calculate log-likelihoods, how the data are structured will determine how the beta-distributed or gamma-distributed variation enters the model. (pp. 374--375)
## Zero-inflated outcomes
> Very often, the things we can measure are not emissions from any pure process. Instead, they are *mixtures* of multiple processes. Whenever there are different causes for the same observation, then a **mixture model** may be useful. A mixture model uses more than one simple probability distribution to model a mixture of causes. In effect, these models use more than one likelihood for the same outcome variable.
>
> Count variables are especially prone to needing a mixture treatment. The reason is that a count of zero can often arise more than one way. A "zero" means that nothing happened, and nothing can happen either because the rate of events is low or rather because the process that generates events failed to get started. (p. 376, **emphasis** in the original)
#### Rethinking: Breaking the law.
McElreath discussed how advances in computing have made it possible for working scientists to define their own data generating models. If you'd like to dive deeper into the topic, check out Bürkner's [-@Bürkner2022Define] vignette, [*Define custom response distributions with brms*](https://CRAN.R-project.org/package=brms/vignettes/brms_customfamilies.html).
### Example: Zero-inflated Poisson.
Do you remember the monk data from back in [Chapter 11][Example: Exposure and the offset.]? Here we simulate some more. This time we'll work in a little alcohol.
```{r}
# define parameters
prob_drink <- 0.2 # 20% of days
rate_work <- 1 # average 1 manuscript per day
# sample one year of production
n <- 365
# simulate days monks drink
set.seed(365)
drink <- rbinom(n, size = 1, prob = prob_drink)
# simulate manuscripts completed
y <- (1 - drink) * rpois(n, lambda = rate_work)
```
We'll put those data in a tidy tibble before plotting.
```{r, fig.width = 4.5, fig.height = 3}
d <-
tibble(drink = factor(drink, levels = 1:0),
y = y)
d %>%
ggplot(aes(x = y)) +
geom_histogram(aes(fill = drink),
binwidth = 1, linewidth = 1/10, color = "grey92") +
scale_fill_manual(values = canva_pal("Green fields")(4)[1:2]) +
xlab("Manuscripts completed") +
theme(legend.position = "none")
```
With these data, the likelihood of observing a zero on `y` (i.e., the likelihood that zero manuscripts were completed on a given occasion) is
\begin{align*}
\operatorname{Pr}(0 | p, \lambda) & = \operatorname{Pr}(\text{drink} | p) + \operatorname{Pr}(\text{work} | p) \times \operatorname{Pr}(0 | \lambda) \\
& = p + (1 - p) \exp (- \lambda).
\end{align*}
And
> since the Poisson likelihood of $y$ is $\operatorname{Pr}(y | \lambda) = \lambda^y \exp (- \lambda) / y!$, the likelihood of $y = 0$ is just $\exp (- \lambda)$. The above is just the mathematics for:
>
>> *The probability of observing a zero is the probability that the monks didn't drink OR ($+$) the probability that the monks worked AND ($\times$) failed to finish anything*.
>
> And the likelihood of a non-zero value $y$ is:
>
> $$\operatorname{Pr}(y | y > 0, p, \lambda) = \operatorname{Pr}(\text{drink} | p) (0) + \operatorname{Pr}(\text{work} | p) \operatorname{Pr}(y | \lambda) = (1 - p) \frac {\lambda^y \exp (- \lambda)}{y!}$$
>
> Since drinking monks never produce $y > 0$, the expression above is just the chance the monks both work $1 - p$, and finish $y$ manuscripts. (pp. 377--378, *emphasis* in the original)
So letting $p$ be the probability $y$ is zero and $\lambda$ be the mean of the underlying Poisson distribution, the zero-inflated Poisson ($\operatorname{ZIPoisson}$) regression model might take the basic form
\begin{align*}
y_i & \sim \operatorname{ZIPoisson}(\color{#5a5f37}{p_i}, \color{#524a3a}{\lambda_i})\\
\color{#5a5f37}{\operatorname{logit}(p_i)} & \color{#5a5f37}= \color{#5a5f37}{\alpha_p + \beta_p x_i} \\
\color{#524a3a}{\log (\lambda_i)} & \color{#524a3a}= \color{#524a3a}{\alpha_\lambda + \beta_\lambda x_i},
\end{align*}
where both parameters in the likelihood, $p_i$ and $\lambda_i$, might get their own statistical model, making this a special case of what Bürkner [-@Bürkner2022Distributional] calls distributional models. One last thing to note is that in **brms**, $p_i$ is denoted `zi`. To fit a zero-inflated Poisson model with **brms**, make sure to specify the correct likelihood with `family = zero_inflated_poisson`. To use a non-default prior for `zi`, make sure to indicate `class = zi` within the `prior()` function.
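Before fitting, we might also check that math from above against our simulated monks. Here's a hand-rolled ZIP probability mass function--just a sketch for building intuition, not how **brms** computes things internally--compared with the empirical proportions in our simulated data `d`.
```{r}
# the probability of a zero is p + (1 - p) * exp(-lambda);
# the probability of a nonzero y is (1 - p) times the ordinary Poisson term
dzip <- function(y, p, lambda) {
  ifelse(y == 0,
         p + (1 - p) * exp(-lambda),
         (1 - p) * dpois(y, lambda = lambda))
}
tibble(y = 0:4) %>% 
  mutate(zip_probability     = dzip(y, p = prob_drink, lambda = rate_work),
         observed_proportion = map_dbl(y, ~ mean(d$y == .x)))
```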
```{r b12.3}
b12.3 <-
brm(data = d,
family = zero_inflated_poisson,
y ~ 1,
prior = c(prior(normal(1, 0.5), class = Intercept),
prior(beta(2, 6), class = zi)), # the brms default is beta(1, 1)
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 12,
file = "fits/b12.03")
```
```{r}
print(b12.3)
```
If you look at the [*Zero-inflated and hurdle models* section](https://cran.r-project.org/package=brms/vignettes/brms_families.html#zero-inflated-and-hurdle-models) of Bürkner's [-@Bürkner2022Parameterization] [*Parameterization of response distributions in brms*](https://cran.r-project.org/package=brms/vignettes/brms_families.html) document, you'll see the zero-inflated Poisson is set up a little differently in **brms** than it is in **rethinking**. The difference did not influence the estimate for the intercept, $\log \lambda$. Both here and in the text, that intercept was about zero. However, it did influence the summary of `zi`. Note how McElreath's `mean( inv_logit( post$ap ) )` returned 0.2241255, which seems rather close to our `zi` estimate of `r round(posterior_summary(b12.3)[2, 1], 3)`. Hopefully it's clear that `zi` in **brms** is already in the probability metric. There's no need to convert it. You can further confirm this by looking at the second line from the `print()` output, `Links: mu = log; zi = identity`. When there are no predictors for `zi`, the **brms** default is to use the identity link. In the text, however, McElreath used the logit link for `p`.
In the `prior` argument, we used `beta(2, 6)` for `zi` and also mentioned in the margin that the **brms** default is `beta(1, 1)`. The beta distribution ranges from 0 to 1, making it a natural distribution to use for priors on probabilities when using the identity link. To give you a sense of what those two versions of the beta look like, let's plot.
A typical way to plot a beta distribution would be to use the base **R** `dbeta()` function. Let's try a different approach, instead. The **tidybayes** package includes a `parse_dist()` function which takes the kind of string specifications you would usually include in the `brms::prior()` function and converts them into a format one can plot with. For example, here's what the preparatory work would be in our case.
```{r}
priors <-
c(prior(beta(1, 1)),
prior(beta(2, 6)))
priors %>%
parse_dist(prior)
```
The first several columns look a lot like the kind of output we'd get from the `brms::get_prior()` function. The `parse_dist()` function added those last two columns. Here we put them to work by feeding them into a ggplot.
```{r, fig.width = 3, fig.height = 2.5}
priors %>%
parse_dist(prior) %>%
ggplot(aes(y = prior, dist = .dist, args = .args, fill = prior)) +
stat_dist_halfeye(.width = .95) +
scale_fill_manual(values = canva_pal("Green fields")(4)[c(4, 1)]) +
scale_x_continuous("zi", breaks = c(0, .5, 1)) +
scale_y_discrete(expand = expansion(add = 0.1)) +
ylab(NULL) +
theme(legend.position = "none")
```
```{r, echo = F, eval = F}
# here's McElreath's prior for `ap`
tibble(n = rnorm(1e5, -1.5, 1)) %>%
mutate(p = inv_logit_scaled(n)) %>%
ggplot(aes(x = p)) +
geom_density(fill = canva_pal("Green fields")(4)[1])
```
Whereas the **brms** default is flat, our prior guided the posterior a bit toward 0. In case you were curious, we might write our statistical model for `b12.3` as
\begin{align*}
y_i & \sim \operatorname{ZIPoisson} (p, \lambda) \\
p & = \alpha_p \\
\log \lambda & = \alpha_\lambda \\
\alpha_p & \sim \operatorname{Beta}(2, 6) \\
\alpha_\lambda & \sim \operatorname{Normal}(1, 0.5).
\end{align*}
Anyway, here's that exponentiated $\alpha_\lambda$.
```{r}
fixef(b12.3)[1, ] %>%
exp()
```
#### Overthinking: Zero-inflated Poisson calculations in Stan.
If you're curious, here's the Stan code underlying our **brms** fit, `b12.3`.
```{r}
b12.3$model
```
## Ordered categorical outcomes
> It is very common in the social sciences, and occasional in the natural sciences, to have an outcome variable that is discrete, like a count, but in which the values merely indicate different ordered levels along some dimension. For example, if I were to ask you how much you like to eat fish, on a scale from 1 to 7, you might say 5. If I were to ask 100 people the same question, I'd end up with 100 values between 1 and 7. In modeling each outcome value, I'd have to keep in mind that these values are *ordered*, because 7 is greater than 6, which is greater than 5, and so on. The result is a set of **ordered categories**. Unlike a count, the differences in value are not necessarily equal....
>
> In principle, an ordered categorical variable is just a multinomial prediction problem (page 359). But the constraint that the categories be ordered demands special treatment....
>
> The conventional solution is to use a **cumulative link** function. The cumulative probability of a value is the probability of that value *or any smaller value*. In the context of ordered categories, the cumulative probability of 3 is the sum of the probabilities of 3, 2, and 1. Ordered categories by convention begin at 1, so a result less than 1 has no probability at all. By linking a linear model to cumulative probability, it is possible to guarantee the ordering of the outcomes. (p. 380, *emphasis* in the original)
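Here's a tiny toy example of the idea (it's not from the text): take some made-up probabilities for a 1-to-7 scale, convert them to cumulative probabilities with `cumsum()`, and then convert those to the log-cumulative-odds scale. The ordering of the categories is baked right into the monotonic increase.
```{r}
# made-up probabilities for outcome values 1 through 7
p_k <- c(.10, .15, .20, .25, .15, .10, .05)
# the cumulative probability of k is the probability of k or any smaller value
cum_p_k <- cumsum(p_k)
cum_p_k
# log-cumulative-odds; the largest value is always +Inf, so it needs no parameter
log(cum_p_k / (1 - cum_p_k))
```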
### Example: Moral intuition.
Let's get the `Trolley` data from **rethinking** [see @cushmanRoleConsciousReasoning2006].
```{r, message = F}
data(Trolley, package = "rethinking")
d <- Trolley
rm(Trolley)
```
Use `dplyr::glimpse()` to get a sense of the dimensions of the data.
```{r}
glimpse(d)
```
Though we have 9,930 rows, we only have 331 unique individuals.
```{r}
d %>%
distinct(id) %>%
count()
```
### Describing an ordered distribution with intercepts.
Make our version of the simple Figure 12.4 histogram of our primary variable, `response`.
```{r, fig.width = 2.5, fig.height = 3}
p1 <-
d %>%
ggplot(aes(x = response, fill = after_stat(x))) +
geom_histogram(binwidth = 1/4, linewidth = 0) +
scale_fill_gradient(low = canva_pal("Green fields")(4)[4],
high = canva_pal("Green fields")(4)[1]) +
scale_x_continuous(breaks = 1:7) +
theme(axis.ticks = element_blank(),
axis.title.y = element_text(angle = 90),
legend.position = "none")
p1
```
Our cumulative proportion plot, Figure 12.4.b, will require some pre-plot wrangling.
```{r, fig.width = 2.5, fig.height = 3}
p2 <-
d %>%
count(response) %>%
mutate(pr_k = n / nrow(d),
cum_pr_k = cumsum(pr_k)) %>%
ggplot(aes(x = response, y = cum_pr_k,
fill = response)) +
geom_line(color = canva_pal("Green fields")(4)[2]) +
geom_point(shape = 21, color = "grey92",
size = 2.5, stroke = 1) +
scale_fill_gradient(low = canva_pal("Green fields")(4)[4],
high = canva_pal("Green fields")(4)[1]) +
scale_x_continuous(breaks = 1:7) +
scale_y_continuous("cumulative proportion",
breaks = c(0, .5, 1), limits = c(0, 1)) +
theme(axis.ticks = element_blank(),
axis.title.y = element_text(angle = 90),
legend.position = "none")
p2
```
> Then to re-describe the histogram as log-cumulative odds, we'll need a series of intercept parameters. Each intercept will be on the log-cumulative-odds scale and stand in for the cumulative probability of each outcome. So this is just the application of the link function. The log-cumulative-odds that a response value $y_i$ is equal-to-or-less-than some possible outcome value $k$ is:
>
> $$\log \frac{\operatorname{Pr}(y_i \leq k)}{1 - \operatorname{Pr}(y_i \leq k)} = \alpha_k$$
>
> where $\alpha_k$ is an "intercept" unique to each possible outcome value $k$. (p. 383)
We can compute the $\alpha_k$ estimates directly with a little help from McElreath's custom `logit()` function.
```{r}
logit <- function(x) log(x / (1 - x)) # convenience function
d %>%
count(response) %>%
mutate(pr_k = n / nrow(d),
cum_pr_k = cumsum(n / nrow(d))) %>%
mutate(alpha = logit(cum_pr_k) %>% round(digits = 2))
```
Now we plot those joints to make our version of Figure 12.4.c.
```{r, fig.width = 2.5, fig.height = 3}
p3 <-
d %>%
count(response) %>%
mutate(cum_pr_k = cumsum(n / nrow(d))) %>%
filter(response < 7) %>%
# we can do the `logit()` conversion right in `ggplot()`
ggplot(aes(x = response, y = logit(cum_pr_k), fill = response)) +
geom_line(color = canva_pal("Green fields")(4)[2]) +
geom_point(shape = 21, colour = "grey92",
size = 2.5, stroke = 1) +
scale_fill_gradient(low = canva_pal("Green fields")(4)[4],
high = canva_pal("Green fields")(4)[1]) +
scale_x_continuous(breaks = 1:7, limits = c(1, 7)) +
ylab("log-cumulative-odds") +
theme(axis.ticks = element_blank(),
axis.title.y = element_text(angle = 90),
legend.position = "none")
p3
```
Why not combine the three subplots with patchwork?
```{r, fig.width = 7, fig.height = 3}
(p1 | p2 | p3) +
plot_annotation(title = "Re-describing a discrete distribution using log-cumulative-odds.")
```
The code for Figure 12.5 is itself something of a monster.
```{r, fig.width = 3.5, fig.height = 3}
# primary data
d_plot <-
d %>%
count(response) %>%
mutate(pr_k = n / nrow(d),
cum_pr_k = cumsum(n / nrow(d))) %>%
mutate(discrete_probability = ifelse(response == 1, cum_pr_k, cum_pr_k - pr_k))
# annotation
text <-
tibble(text = 1:7,
response = seq(from = 1.25, to = 7.25, by = 1),
cum_pr_k = d_plot$cum_pr_k - .065)
d_plot %>%
ggplot(aes(x = response, y = cum_pr_k,
color = cum_pr_k, fill = cum_pr_k)) +
geom_line(color = canva_pal("Green fields")(4)[1]) +
geom_point(shape = 21, colour = "grey92",
size = 2.5, stroke = 1) +
geom_linerange(aes(ymin = 0, ymax = cum_pr_k),
alpha = 1/2, color = canva_pal("Green fields")(4)[1]) +
geom_linerange(aes(x = response + .025,
ymin = ifelse(response == 1, 0, discrete_probability),
ymax = cum_pr_k),
color = "black") +