-
Notifications
You must be signed in to change notification settings - Fork 39
/
Copy path03.Rmd
1034 lines (786 loc) · 39.3 KB
/
03.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
```{r, echo = F, cache = F}
knitr::opts_chunk$set(fig.retina = 2.5)
knitr::opts_chunk$set(fig.align = "center")
options(width = 100)
```
# Sampling the Imaginary
If you would like to know the probability someone is a vampire given they test positive to the blood-based vampire test, you compute
$$
\Pr(\text{vampire}|\text{positive}) = \frac{\Pr(\text{positive}|\text{vampire})\Pr(\text{vampire})}{\Pr(\text{positive})}.
$$
We'll do so within a tibble.
```{r, message = F, warning = F}
library(tidyverse)
tibble(pr_positive_vampire = .95,
pr_positive_mortal = .01,
pr_vampire = .001) %>%
mutate(pr_positive = pr_positive_vampire * pr_vampire + pr_positive_mortal * (1 - pr_vampire)) %>%
mutate(pr_vampire_positive = pr_positive_vampire * pr_vampire / pr_positive) %>%
glimpse()
```
Here's the other way of tackling the vampire problem, this time using the frequency format.
```{r}
tibble(pr_vampire = 100 / 100000,
pr_positive_vampire = 95 / 100,
pr_positive_mortal = 999 / 99900) %>%
mutate(pr_positive = 95 + 999) %>%
mutate(pr_vampire_positive = pr_positive_vampire * 100 / pr_positive) %>%
glimpse()
```
> The posterior distribution is a probability distribution. And like all probability distributions, we can imagine drawing *samples* from it. The sampled events in this case are parameter values. Most parameters have no exact empirical realization. The Bayesian formalism treats parameter distributions as relative plausibility, not as any physical random process. In any event, randomness is always a property of information, never of the real world. But inside the computer, parameters are just as empirical as the outcome of a coin flip or a die toss or an agricultural experiment. The posterior defines the expected frequency that different parameter values will appear, once we start plucking parameters out of it. [@mcelreathStatisticalRethinkingBayesian2020, p. 50, *emphasis* in the original]
## Sampling from a grid-approximate posterior
Once again, here we use grid approximation to generate samples.
```{r}
# how many grid points would you like?
n <- 1000
n_success <- 6
n_trials <- 9
(
d <-
tibble(p_grid = seq(from = 0, to = 1, length.out = n),
# note we're still using a flat uniform prior
prior = 1) %>%
mutate(likelihood = dbinom(n_success, size = n_trials, prob = p_grid)) %>%
mutate(posterior = (likelihood * prior) / sum(likelihood * prior))
)
```
Our `d` data contains all the components in McElreath's **R** code 3.2 block. Do note we've renamed his `prob_p` and `prob_data` as `prior` and `likelihood`, respectively. Now we'll use the `dplyr::slice_sample()` function to sample rows from `d`, saving them as `sample`.
```{r}
# how many samples would you like?
n_samples <- 1e4
# make it reproducible
set.seed(3)
samples <-
d %>%
slice_sample(n = n_samples, weight_by = posterior, replace = T)
glimpse(samples)
```
Now we can plot the left panel of Figure 3.1 with `geom_point()`. But before we do, we'll need to add a variable numbering the samples.
```{r, fig.width = 3.5, fig.height = 3}
samples %>%
mutate(sample_number = 1:n()) %>%
ggplot(aes(x = sample_number, y = p_grid)) +
geom_point(alpha = 1/10) +
scale_y_continuous("proportion of water (p)", limits = c(0, 1)) +
xlab("sample number")
```
We'll make the density in the right panel with `geom_density()`.
```{r, fig.width = 3.5, fig.height = 3}
samples %>%
ggplot(aes(x = p_grid)) +
geom_density(fill = "black") +
scale_x_continuous("proportion of water (p)", limits = c(0, 1))
```
That was based on `1e4` samples. On page 53, McElreath said the density would converge on the idealized shape if we keep increasing the number of samples. Here's what it looks like with `1e6`.
```{r, fig.width = 3.5, fig.height = 3}
set.seed(3)
d %>%
slice_sample(n = 1e6, weight_by = posterior, replace = T) %>%
ggplot(aes(x = p_grid)) +
geom_density(fill = "black") +
scale_x_continuous("proportion of water (p)", limits = c(0, 1))
```
Yep, that's more ideal.
## Sampling to summarize
"Once your model produces a posterior distribution, the model's work is done. But your work has just begun. It is necessary to summarize and interpret the posterior distribution. Exactly how it is summarized depends upon your purpose" (p. 53).
### Intervals of defined boundaries.
To get the proportion of water less than some value of `p_grid` within the **tidyverse**, you might first `filter()` by that value and then take the `sum()` within `summarise()`.
```{r}
d %>%
filter(p_grid < .5) %>%
summarise(sum = sum(posterior))
```
To learn more about `dplyr::summarise()` and related functions, check out Baert's [*Data wrangling part 4: Summarizing and slicing your data*](https://suzan.rbind.io/2018/04/dplyr-tutorial-4/) and [Section 5.6](https://r4ds.had.co.nz/transform.html#grouped-summaries-with-summarise) of *R4DS* [@grolemundDataScience2017].
If what you want is a frequency based on filtering by `samples`, then you might use `n()` within `summarise()`.
```{r}
samples %>%
filter(p_grid < .5) %>%
summarise(sum = n() / n_samples)
```
A more explicit approach for the same computation is to follow up `count()` with `mutate()`.
```{r}
samples %>%
count(p_grid < .5) %>%
mutate(probability = n / sum(n))
```
An even trickier approach for the same is to insert the logical statement `p_grid < .5` within the `mean()` function.
```{r}
samples %>%
summarise(sum = mean(p_grid < .5))
```
Much like McElreath discussed in the **Overthinking: Counting with `sum`** box, this works "because R internally converts a logical expression, like `samples < 0.5`, to a vector of `TRUE` and `FALSE` results, one for each element of `samples`, saying whether or not each element matches the criterion" (p. 54). When we inserted that vector of `TRUE` and `FALSE` values within the `mean()` function, they were then internally converted to a vector of 1's and 0's, the mean of which was the probability. Tricky!
To determine the posterior probability between 0.5 and 0.75, you can use `&` within `filter()`.
```{r}
samples %>%
filter(p_grid > .5 & p_grid < .75) %>%
summarise(sum = n() / n_samples)
```
Just multiply that value by 100 to get a percent.
```{r}
samples %>%
filter(p_grid > .5 & p_grid < .75) %>%
summarise(sum = n() / n_samples,
percent = n() / n_samples * 100)
```
And, of course, you can do that with our `mean()` trick, too.
```{r}
samples %>%
summarise(percent = 100 * mean(p_grid > .5 & p_grid < .75))
```
### Intervals of defined mass.
> It is more common to see scientific journals reporting an interval of defined mass, usually known as a **confidence interval**. An interval of posterior probability, such as the ones we are working with, may instead be called a **credible interval**. We're going to call it a **compatibility interval** instead, in order to avoid the unwarranted implications of "confidence" and "credibility." What the interval indicates is a range of parameter values compatible with the model and data. The model and data themselves may not inspire confidence, in which case the interval will not either. (p. 54, **emphasis** in the original)
As a part of this block quote, McElreath linked to endnote 54, which reads: "I learned this term from Sander Greenland and his collaborators. See @amrheinScientistsRiseStatistical2019 and Gelman and Greenland [-@gelmanAreConfidenceIntervals2019]" (p. 560).
We'll create the upper two panels for Figure 3.2 with `geom_line()`, `geom_area()`, some careful filtering, and a little patchwork syntax.
```{r, fig.width = 6, fig.height = 2.5}
# upper left panel
p1 <-
d %>%
ggplot(aes(x = p_grid, y = posterior)) +
geom_line() +
geom_area(data = d %>% filter(p_grid < .5)) +
labs(x = "proportion of water (p)",
y = "density")
# upper right panel
p2 <-
d %>%
ggplot(aes(x = p_grid, y = posterior)) +
geom_line() +
# note this next line is the only difference in code from the last plot
geom_area(data = d %>% filter(p_grid < .75 & p_grid > .5)) +
labs(x = "proportion of water (p)",
y = "density")
# combine
library(patchwork)
p1 + p2
```
We'll come back for the lower two panels in a bit.
Since we saved our `p_grid` samples within the well-named `samples` tibble, we'll have to index with `$` within `quantile`.
```{r}
(q_80 <- quantile(samples$p_grid, prob = .8))
```
That value will come in handy for the lower left panel of Figure 3.2. For an alternative approach, we could `select()` the `samples` vector, extract it from the tibble with `pull()`, and then pump it into `quantile()`.
```{r}
samples %>%
pull(p_grid) %>%
quantile(prob = .8)
```
We might also use `quantile()` within `summarise()`.
```{r}
samples %>%
summarise(`80th percentile` = quantile(p_grid, p = .8))
```
Here's the `summarise()` approach with two probabilities.
```{r}
samples %>%
summarise(`10th percentile` = quantile(p_grid, p = .1),
`90th percentile` = quantile(p_grid, p = .9))
```
The **tidyverse** approach is nice in that that family of functions typically returns a data frame. But sometimes you just want your values in a numeric vector for the sake of quick indexing. In that case, base **R** `quantile()` shines.
```{r}
(q_10_and_90 <- quantile(samples$p_grid, prob = c(.1, .9)))
```
Now we have our cutoff values saved as `q_80` and `q_10_and_90`, we're ready to make the bottom panels of Figure 3.2.
```{r, fig.width = 6, fig.height = 2.5}
# lower left panel
p1 <-
d %>%
ggplot(aes(x = p_grid, y = posterior)) +
geom_line() +
geom_area(data = d %>% filter(p_grid < q_80)) +
annotate(geom = "text",
x = .25, y = .0025,
label = "lower 80%") +
labs(x = "proportion of water (p)",
y = "density")
# lower right panel
p2 <-
d %>%
ggplot(aes(x = p_grid, y = posterior)) +
geom_line() +
geom_area(data = d %>% filter(p_grid > q_10_and_90[1] & p_grid < q_10_and_90[2])) +
annotate(geom = "text",
x = .25, y = .0025,
label = "middle 80%") +
labs(x = "proportion of water (p)",
y = "density")
# combine
p1 + p2
```
Now we follow along with McElreath's **R** code 3.11 to compute a highly skewed posterior. We've already defined `p_grid` and `prior` within `d`, above. Here we'll reuse them and update the rest of the columns.
```{r}
# here we update the `dbinom()` parameters
n_success <- 3
n_trials <- 3
# update `d`
d <-
d %>%
mutate(likelihood = dbinom(n_success, size = n_trials, prob = p_grid)) %>%
mutate(posterior = (likelihood * prior) / sum(likelihood * prior))
# make the next part reproducible
set.seed(3)
# here's our new samples tibble
(
samples <-
d %>%
slice_sample(n = n_samples, weight_by = posterior, replace = T)
)
```
The `rethinking::PI()` function works like a nice shorthand for `quantile()`.
```{r}
quantile(samples$p_grid, prob = c(.25, .75))
rethinking::PI(samples$p_grid, prob = .5)
```
Now's a good time to introduce Matthew Kay's [-@R-tidybayes] [**tidybayes** package](https://mjskay.github.io/tidybayes/), which offers an array of convenience functions for summarizing Bayesian models of the type we'll be working with in this project. For all the **brms**-related deets, see Kay's [-@kayExtractingVisualizingTidy2021] vignette, [*Extracting and visualizing tidy draws from brms models*](https://mjskay.github.io/tidybayes/articles/tidy-brms.html). Here we start simple.
```{r, message = F, warning = F}
library(tidybayes)
median_qi(samples$p_grid, .width = .5)
```
The **tidybayes** package contains a [family of functions](https://mjskay.github.io/tidybayes/articles/tidy-brms.html#point-summaries-and-intervals) that make it easy to summarize a distribution with a measure of central tendency accompanied by intervals. With `median_qi()`, we asked for the median and quantile-based intervals--just like we've been doing with `quantile()`. Note how the `.width` argument within `median_qi()` worked the same way the `prob` argument did within `rethinking::PI()`. With `.width = .5`, we indicated we wanted a quantile-based 50% interval, which was returned in the `ymin` and `ymax` columns. The tidybayes framework makes it easy to request multiple types of intervals. E.g., here we'll request 50%, 80%, and 99% intervals.
```{r}
median_qi(samples$p_grid, .width = c(.5, .8, .99))
```
The `.width` column in the output indexed which line presented which interval. The value in the `y` column remained constant across rows. That's because that column listed the measure of central tendency, the median in this case.
Now let's use the `rethinking::HPDI()` function to return 50% highest posterior density intervals (HPDIs).
```{r}
rethinking::HPDI(samples$p_grid, prob = .5)
```
The reason I introduce **tidybayes** now is that the functions of the **brms** package only support percentile-based intervals of the type we computed with `quantile()` and `median_qi()`. But **tidybayes** also supports HPDIs.
```{r}
mode_hdi(samples$p_grid, .width = .5)
```
This time we used the mode as the measure of central tendency. With this family of **tidybayes** functions, you specify the measure of central tendency in the prefix (i.e., `mean`, `median`, or `mode`) and then the type of interval you'd like (i.e., `qi` or `hdi`).
If all you want are the intervals without the measure of central tendency or all that other technical information, **tidybayes** also offers the handy `qi()` and `hdi()` functions.
```{r}
qi(samples$p_grid, .width = .5)
hdi(samples$p_grid, .width = .5)
```
These are nice in that they return simple numeric vectors, making them particularly useful to use as references within **ggplot2** plots. Now we have that skill, we can use it to make Figure 3.3.
```{r, fig.width = 5.5, fig.height = 2.5}
# lower left panel
p1 <-
d %>%
ggplot(aes(x = p_grid, y = posterior)) +
# check out our sweet `qi()` indexing
geom_area(data = d %>%
filter(p_grid > qi(samples$p_grid, .width = .5)[1] &
p_grid < qi(samples$p_grid, .width = .5)[2]),
fill = "grey75") +
geom_line() +
labs(subtitle = "50% Percentile Interval",
x = "proportion of water (p)",
y = "density")
# lower right panel
p2 <-
d %>%
ggplot(aes(x = p_grid, y = posterior)) +
geom_area(data = . %>%
filter(p_grid > hdi(samples$p_grid, .width = .5)[1] &
p_grid < hdi(samples$p_grid, .width = .5)[2]),
fill = "grey75") +
geom_line() +
labs(subtitle = "50% HPDI",
x = "proportion of water (p)",
y = "density")
# combine!
p1 | p2
```
In the `geom_area()` line for the HPDI plot, did you notice how we replaced `d` with `.`? When using the pipe (i.e., `%>%`), you can use the `.` as a placeholder for the original data object. It's an odd and handy trick to know about. Go [here](https://magrittr.tidyverse.org/reference/pipe.html) to learn more.
> So the HPDI has some advantages over the PI. But in most cases, these two types of interval are very similar. They only look so different in this case because the posterior distribution is highly skewed. If we instead used samples from the posterior distribution for six waters in nine tosses, these intervals would be nearly identical. Try it for yourself, using different probability masses, such as `prob=0.8` and `prob=0.95`. When the posterior is bell shaped, it hardly matters which type of interval you use. (p. 57)
Let's try it out. First we'll update the simulation for six waters in nine tosses.
```{r}
# "six waters in nine tosses"
n_success <- 6
n_trials <- 9
new_d <-
d %>%
mutate(likelihood = dbinom(n_success, size = n_trials, prob = p_grid)) %>%
mutate(posterior = (likelihood * prior) / sum(posterior))
set.seed(3)
new_samples <-
new_d %>%
slice_sample(n = n_samples, weight_by = posterior, replace = T)
```
Here are the intervals by `.width` and type of `.interval`.
```{r}
bind_rows(mean_hdi(new_samples$p_grid, .width = c(.8, .95)),
mean_qi(new_samples$p_grid, .width = c(.8, .95))) %>%
select(.width, .interval, ymin:ymax) %>%
arrange(.width) %>%
mutate_if(is.double, round, digits = 2)
```
We didn't need that last `mutate_if()` line. It just made it easier to compare the `ymin` and `ymax` values. Anyway, McElreath was right. This time the differences between the HPDIs and QIs were trivial. Here's a look at the posterior.
```{r, fig.width = 3, fig.height = 2.55}
new_d %>%
ggplot(aes(x = p_grid)) +
geom_line(aes(y = posterior)) +
labs(subtitle = "Six waters in nine tosses made\nfor a more symmetrical posterior",
x = "proportion of water (p)",
y = "density")
```
Be warned:
> The HPDI also has some disadvantages. HPDI is more computationally intensive than PI and suffers from greater *simulation variance*, which is a fancy way of saying that it is sensitive to how many samples you draw from the posterior. It is also harder to understand and many scientific audiences will not appreciate its features, while they will immediately understand a percentile interval, as ordinary non-Bayesian intervals are typically interpreted (incorrectly) as percentile intervals (pp. 57--58, *emphasis* in the original)
For convenience, we'll primarily stick to the PI-based intervals in this ebook.
#### Rethinking: What do compatibility intervals mean?
At the start of this section, McElreath poked a little at frequentist confidence intervals. For an introduction to confidence intervals from the perspective of a frequentist, you might check out Cumming's [-@cummingNewStatisticsWhy2014] [*The new statistics: Why and how*](https://journals.sagepub.com/doi/pdf/10.1177/0956797613504966) and the works referenced therein. Though their definition isn't the most intuitive, I usually use confidence intervals when I wear my frequentist hat.
### Point estimates.
We've been calling point estimates measures of central tendency. If we `arrange()` our `d` tibble in descending order by `posterior`, we'll see the corresponding `p_grid` value for its MAP estimate.
```{r}
d %>%
arrange(desc(posterior))
```
To emphasize it, we can use `slice()` to select the top row.
```{r}
d %>%
arrange(desc(posterior)) %>%
slice(1)
```
We can get the mode with `mode_hdi()` or `mode_qi()`.
```{r, warning = F}
samples %>% mode_hdi(p_grid)
samples %>% mode_qi(p_grid)
```
Those returned a lot of output in addition to the mode. If all you want is the mode itself, you can just use `tidybayes::Mode()`.
```{r}
Mode(samples$p_grid)
```
Medians and means are typical measures of central tendency, too.
```{r}
samples %>%
summarise(mean = mean(p_grid),
median = median(p_grid))
```
We can inspect the three types of point estimate in the left panel of Figure 3.4. First we'll bundle the three point estimates together in a tibble.
```{r, warning = F}
(
point_estimates <-
bind_rows(samples %>% mean_qi(p_grid),
samples %>% median_qi(p_grid),
samples %>% mode_qi(p_grid)) %>%
select(p_grid, .point) %>%
# these last two columns will help us annotate
mutate(x = p_grid + c(-.03, .03, -.03),
y = c(.0005, .0012, .002))
)
```
Now plot.
```{r, fig.width = 3.5, fig.height = 3}
d %>%
ggplot(aes(x = p_grid)) +
geom_area(aes(y = posterior),
fill = "grey75") +
geom_vline(xintercept = point_estimates$p_grid) +
geom_text(data = point_estimates,
aes(x = x, y = y, label = .point),
angle = 90) +
labs(x = "proportion of water (p)",
y = "density") +
theme(panel.grid = element_blank())
```
As it turns out "*different loss functions imply different point estimates*" (p. 59, *emphasis* in the original).
Let $p$ be the proportion of the Earth covered by water and $d$ be our guess. If McElreath pays us \$100 if we guess exactly right but subtracts money from the prize proportional to how far off we are, then our loss is proportional to $d - p$. If we decide $d = .5$, we can compute our expected loss.
```{r}
d %>%
summarise(`expected loss` = sum(posterior * abs(0.5 - p_grid)))
```
What McElreath did with `sapply()`, we'll do with `purrr::map()`. If you haven't used it, `map()` is part of a family of similarly-named functions (e.g., `map2()`) from the [**purrr** package](https://purrr.tidyverse.org) [@R-purrr], which is itself part of the [**tidyverse**](https://www.tidyverse.org). The `map()` family is the **tidyverse** alternative to the family of `apply()` functions from the base **R** framework. You can learn more about how to use the `map()` family [here](https://purrr.tidyverse.org/reference/map.html) or [here](https://jennybc.github.io/purrr-tutorial/ls01_map-name-position-shortcuts.html) or [here](https://data.library.virginia.edu/getting-started-with-the-purrr-package-in-r/).
```{r}
make_loss <- function(our_d) {
d %>%
mutate(loss = posterior * abs(our_d - p_grid)) %>%
summarise(weighted_average_loss = sum(loss))
}
(
l <-
d %>%
select(p_grid) %>%
rename(decision = p_grid) %>%
mutate(weighted_average_loss = purrr::map(decision, make_loss)) %>%
unnest(weighted_average_loss)
)
```
Now we're ready for the right panel of Figure 3.4.
```{r, fig.width = 3.5, fig.height = 3}
# this will help us find the x and y coordinates for the minimum value
min_loss <-
l %>%
filter(weighted_average_loss == min(weighted_average_loss)) %>%
as.numeric()
# the plot
l %>%
ggplot(aes(x = decision, y = weighted_average_loss)) +
geom_area(fill = "grey75") +
geom_vline(xintercept = min_loss[1], color = "white", linetype = 3) +
geom_hline(yintercept = min_loss[2], color = "white", linetype = 3) +
ylab("expected proportional loss") +
theme(panel.grid = element_blank())
```
We saved the exact minimum value as `min_loss[1]`, which is `r min_loss[1]`. Within sampling error, this is the posterior median as depicted by our `samples`.
```{r}
samples %>%
summarise(posterior_median = median(p_grid))
```
The quadratic loss $(d - p)^2$ suggests we should use the mean instead. Let's investigate.
```{r, fig.width = 3.5, fig.height = 3}
# amend our loss function
make_loss <- function(our_d) {
d %>%
mutate(loss = posterior * (our_d - p_grid)^2) %>%
summarise(weighted_average_loss = sum(loss))
}
# remake our `l` data
l <-
d %>%
select(p_grid) %>%
rename(decision = p_grid) %>%
mutate(weighted_average_loss = purrr::map(decision, make_loss)) %>%
unnest(weighted_average_loss)
# update to the new minimum loss coordinates
min_loss <-
l %>%
filter(weighted_average_loss == min(weighted_average_loss)) %>%
as.numeric()
# update the plot
l %>%
ggplot(aes(x = decision, y = weighted_average_loss)) +
geom_area(fill = "grey75") +
geom_vline(xintercept = min_loss[1], color = "white", linetype = 3) +
geom_hline(yintercept = min_loss[2], color = "white", linetype = 3) +
ylab("expected proportional loss") +
theme(panel.grid = element_blank())
```
Based on quadratic loss $(d - p)^2$, the exact minimum value is `r min_loss[1]`. Within sampling error, this is the posterior mean of our `samples`.
```{r}
samples %>%
summarise(posterior_meaan = mean(p_grid))
```
> Usually, research scientists don't think about loss functions. And so any point estimate like the mean or MAP that they may report isn't intended to support any particular decision, but rather to describe the shape of the posterior. You might argue that the decision to make is whether or not to accept an hypothesis. But the challenge then is to say what the relevant costs and benefits would be, in terms of the knowledge gained or lost. Usually it's better to communicate as much as you can about the posterior distribution, as well as the data and the model itself, so that others can build upon your work. Premature decisions to accept or reject hypotheses can cost lives. (p. 61)
In the endnote (62) linked to the end of that quote in the text, McElreath wrote: "See @hauerHarmDoneTests2004 for three tales from transportation safety in which testing resulted in premature incorrect decisions and a demonstrable and continuing loss of human life" (p. 561).
## Sampling to simulate prediction
McElreath's five good reasons for simulation were
1. model design
2. model checking,
3. software validation,
4. research design, and
5. forecasting.
### Dummy data.
Dummy data for the globe tossing model arise from the binomial likelihood. If you let $w$ be a count of water and $n$ be the number of tosses, the binomial likelihood is
$$\operatorname{Pr} (w|n, p) = \frac{n!}{w!(n - w)!} p^w (1 - p)^{n - w}.$$
Letting $n = 2$, $p(w) = .7$, and $w_\text{observed} = 0 \text{ through }2$, the densities are:
```{r}
tibble(n = 2,
`p(w)` = .7,
w = 0:2) %>%
mutate(density = dbinom(w, size = n, prob = `p(w)`))
```
If we're going to simulate, we should probably [set our seed](https://stackoverflow.com/questions/13605271/reasons-for-using-the-set-seed-function). Doing so makes the results reproducible.
```{r}
set.seed(3)
rbinom(1, size = 2, prob = .7)
```
Here are ten reproducible draws.
```{r}
set.seed(3)
rbinom(10, size = 2, prob = .7)
```
Now generate 100,000 (i.e., `1e5`) reproducible dummy observations.
```{r}
# how many would you like?
n_draws <- 1e5
set.seed(3)
d <- tibble(draws = rbinom(n_draws, size = 2, prob = .7))
d %>%
count(draws) %>%
mutate(proportion = n / nrow(d))
```
As McElreath mused in the text (p. 63), those simulated `proportion` values are very close to the analytically calculated values in our `density` column a few code blocks up.
Here's the simulation updated so $n = 9$, which we plot in our version of Figure 3.5.
```{r, fig.width = 3.5, fig.height = 3}
set.seed(3)
d <- tibble(draws = rbinom(n_draws, size = 9, prob = .7))
# the histogram
d %>%
ggplot(aes(x = draws)) +
geom_histogram(binwidth = 1, center = 0,
color = "grey92", linewidth = 1/10) +
scale_x_continuous("dummy water count", breaks = 0:4 * 2) +
ylab("frequency") +
coord_cartesian(xlim = c(0, 9)) +
theme(panel.grid = element_blank())
```
McElreath suggested we play around with different values of `size` and `prob`. With the next block of code, we'll simulate nine conditions.
```{r}
n_draws <- 1e5
simulate_binom <- function(n, probability) {
set.seed(3)
rbinom(n_draws, size = n, prob = probability)
}
d <-
crossing(n = c(3, 6, 9),
probability = c(.3, .6, .9)) %>%
mutate(draws = map2(n, probability, simulate_binom)) %>%
ungroup() %>%
mutate(n = str_c("n = ", n),
probability = str_c("p = ", probability)) %>%
unnest(draws)
head(d)
```
Let's plot the simulation results.
```{r, fig.width = 6, fig.height = 5}
d %>%
ggplot(aes(x = draws)) +
geom_histogram(binwidth = 1, center = 0,
color = "grey92", linewidth = 1/10) +
scale_x_continuous("dummy water count", breaks = 0:4 * 2) +
ylab("frequency") +
coord_cartesian(xlim = c(0, 9)) +
theme(panel.grid = element_blank()) +
facet_grid(n ~ probability)
```
### Model checking.
If you're new to applied statistics, you might be surprised how often mistakes arise.
#### Did the software work?
Let this haunt your dreams: "There is no way to really be sure that software works correctly" (p. 64).
If you'd like to dive deeper into these dark waters, check out one my favorite talks from StanCon 2018, [*Esther Williams in the Harold Holt Memorial Swimming Pool*](https://youtu.be/pKZLJPrZLhU?t=26285), by the ineffable [Dan Simpson](https://twitter.com/dan_p_simpson). If Simpson doesn't end up drowning you, see Gabry and Simpson's talk at the Royal Statistical Society 2018, [*Visualization in Bayesian workflow*](https://youtu.be/E8vdXoJId8M), a follow-up blog called [*Maybe it's time to let the old ways die; or We broke R-hat so now we have to fix it*](https://statmodeling.stat.columbia.edu/2019/03/19/maybe-its-time-to-let-the-old-ways-die-or-we-broke-r-hat-so-now-we-have-to-fix-it/), and that blog's associated pre-print by @vehtariRanknormalizationFoldingLocalization2019, [*Rank-normalization, folding, and localization: An improved $\widehat R$ for assessing convergence of MCMC*](https://arxiv.org/abs/1903.08008).
#### Is the model adequate?
> The implied predictions of the model are uncertain in two ways, and it's important to be aware of both.
>
> First, there is observation uncertainty. For any unique value of the parameter $p$, there is a unique implied pattern of observations that the model expects. These patterns of observations are the same gardens of forking data that you explored in the previous chapter. These patterns are also what you sampled in the previous section. There is uncertainty in the predicted observations, because even if you know $p$ with certainty, you won’t know the next globe toss with certainty (unless $p = 0$ or $p = 1$).
>
> Second, there is uncertainty about $p$. The posterior distribution over $p$ embodies this uncertainty. And since there is uncertainty about $p$, there is uncertainty about everything that depends upon $p$. The uncertainty in $p$ will interact with the sampling variation, when we try to assess what the model tells us about outcomes.
>
> We'd like to *propagate* the parameter uncertainty--carry it forward--as we evaluate the implied predictions. All that is required is averaging over the posterior density for $p$, while computing the predictions. For each possible value of the parameter $p$, there is an implied distribution of outcomes. So if you were to compute the sampling distribution of outcomes at each value of $p$, then you could average all of these prediction distributions together, using the posterior probabilities of each value of $p$, to get a **posterior predictive distribution**. (pp. 64--65, *emphasis* in the original)
All this is depicted in Figure 3.6. To get ready to make our version, let's first refresh our original grid approximation `d`.
```{r}
# how many grid points would you like?
n <- 1001
n_success <- 6
n_trials <- 9
(
d <-
tibble(p_grid = seq(from = 0, to = 1, length.out = n),
# note we're still using a flat uniform prior
prior = 1) %>%
mutate(likelihood = dbinom(n_success, size = n_trials, prob = p_grid)) %>%
mutate(posterior = (likelihood * prior) / sum(likelihood * prior))
)
```
We can make our version of the top of Figure 3.6 with a little tricky `filter`ing.
```{r, fig.width = 7, fig.height = 2}
d %>%
ggplot(aes(x = p_grid, y = posterior)) +
geom_area(color = "grey67", fill = "grey67") +
geom_segment(data = . %>%
filter(p_grid %in% c(seq(from = .1, to = .9, by = .1), 3 / 10)),
aes(xend = p_grid, yend = 0, linewidth = posterior),
color = "grey33", show.legend = F) +
geom_point(data = . %>%
filter(p_grid %in% c(seq(from = .1, to = .9, by = .1), 3 / 10))) +
annotate(geom = "text",
x = .08, y = .0025,
label = "Posterior probability") +
scale_linewidth_continuous(range = c(0, 1)) +
scale_x_continuous("probability of water", breaks = 0:10 / 10) +
scale_y_continuous(NULL, breaks = NULL) +
theme(panel.grid = element_blank())
```
Note how we weighted the widths of the vertical lines by the `posterior` density.
We'll need to do a bit of wrangling before we're ready to make the plot in the middle panel of Figure 3.6.
```{r}
n_draws <- 1e5
simulate_binom <- function(probability) {
set.seed(3)
rbinom(n_draws, size = 9, prob = probability)
}
d_small <-
tibble(probability = seq(from = .1, to = .9, by = .1)) %>%
mutate(draws = purrr::map(probability, simulate_binom)) %>%
unnest(draws) %>%
mutate(label = str_c("p = ", probability))
head(d_small)
```
Now we're ready to plot.
```{r, fig.width = 8, fig.height = 1.75, message = F}
d_small %>%
ggplot(aes(x = draws)) +
geom_histogram(binwidth = 1, center = 0,
color = "grey92", linewidth = 1/10) +
scale_x_continuous(NULL, breaks = 0:3 * 3) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Sampling distributions") +
coord_cartesian(xlim = c(0, 9)) +
theme(panel.grid = element_blank()) +
facet_wrap(~ label, ncol = 9)
```
To make the plot at the bottom of Figure 3.6, we'll redefine our `samples`, this time including the `w` variable (see the **R** code 3.26 block in the text).
```{r}
# how many samples would you like?
n_samples <- 1e4
# make it reproducible
set.seed(3)
samples <-
d %>%
slice_sample(n = n_samples, weight_by = posterior, replace = T) %>%
mutate(w = purrr::map_dbl(p_grid, rbinom, n = 1, size = 9))
glimpse(samples)
```
Here's our histogram.
```{r, fig.width = 3, fig.height = 2}
samples %>%
ggplot(aes(x = w)) +
geom_histogram(binwidth = 1, center = 0,
color = "grey92", linewidth = 1/10) +
scale_x_continuous("number of water samples",
breaks = 0:3 * 3) +
scale_y_continuous(NULL, breaks = NULL) +
ggtitle("Posterior predictive distribution") +
coord_cartesian(xlim = c(0, 9),
ylim = c(0, 3000)) +
theme(panel.grid = element_blank())
```
In Figure 3.7, McElreath considered the longest sequence of the sample values. We've been using `rbinom()` with the size parameter set to 9 for our simulations. E.g.,
```{r}
rbinom(10, size = 9, prob = .6)
```
Notice this collapsed (i.e., aggregated) over the sequences within the individual sets of 9. What we need is to simulate nine individual trials many times over. For example, this
```{r}
rbinom(9, size = 1, prob = .6)
```
would be the disaggregated version of just one of the numerals returned by `rbinom()` when `size = 9`. So let's try simulating again with un-aggregated samples. We'll keep adding to our `samples` tibble. In addition to the disaggregated `draws` based on the $p$ values listed in `p_grid`, we'll also want to add a row index for each of those `p_grid` values--it'll come in handy when we plot.
```{r}
# make it reproducible
set.seed(3)
samples <-
samples %>%
mutate(iter = 1:n(),
draws = purrr::map(p_grid, rbinom, n = 9, size = 1)) %>%
unnest(draws)
glimpse(samples)
```
The main action is in the `draws` column.
Now we have to count the longest sequences. The base **R** [`rle()` function](https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/rle) will help with that. Consider McElreath's sequence of tosses.
```{r}
tosses <- c("w", "l", "w", "w", "w", "l", "w", "l", "w")
```
You can plug that into `rle()`.
```{r}
rle(tosses)
```
For our purposes, we're interested in `lengths`. That tells us the length of each sequences of the same value. The `3` corresponds to our run of three `w`s. The `max()` function will help us confirm it's the largest value.
```{r}
rle(tosses)$lengths %>% max()
```
Now let's apply our method to the data and plot.
```{r, fig.width = 3.5, fig.height = 3, message = F}
samples %>%
group_by(iter) %>%
summarise(longest_run_length = rle(draws)$lengths %>% max()) %>%
ggplot(aes(x = longest_run_length)) +
geom_histogram(aes(fill = longest_run_length == 3),
binwidth = 1, center = 0,
color = "grey92", linewidth = 1/10) +
scale_fill_viridis_d(option = "D", end = .9) +
scale_x_continuous("longest run length", breaks = 0:3 * 3) +
ylab("frequency") +
coord_cartesian(xlim = c(0, 9)) +
theme(legend.position = "none",
panel.grid = element_blank())
```
Let's look at `rle()` again.
```{r}
rle(tosses)
```
We can use the length of the output (i.e., 7 in this example) as the numbers of switches from, in this case, "w" and "l".
```{r}
rle(tosses)$lengths %>% length()
```
With that new trick, we're ready to make the right panel of Figure 3.7.
```{r, fig.width = 3.5, fig.height = 3, message = F, warning = F}
samples %>%
group_by(iter) %>%
summarise(longest_run_length = rle(draws)$lengths %>% length()) %>%
ggplot(aes(x = longest_run_length)) +
geom_histogram(aes(fill = longest_run_length == 7),
binwidth = 1, center = 0,
color = "grey92", linewidth = 1/10) +
scale_fill_viridis_d(option = "D", end = .9) +
scale_x_continuous("number of switches", breaks = 0:3 * 3) +
ylab("frequency") +
coord_cartesian(xlim = c(0, 9)) +
theme(legend.position = "none",
panel.grid = element_blank())
```
## ~~Summary~~ Let's practice with **brms**
Open **brms**.
```{r, warning = F, message = F}
library(brms)
```
With **brms**, we'll fit the primary model of $w = 6$ and $n = 9$ much like we did at the end of [Chapter 2][Markov chain Monte Carlo].
```{r b3.1}
b3.1 <-
brm(data = list(w = 6),
family = binomial(link = "identity"),
w | trials(9) ~ 0 + Intercept,
# this is a flat prior
prior(beta(1, 1), class = b, lb = 0, ub = 1),
iter = 5000, warmup = 1000,
seed = 3,
file = "fits/b03.01")
```
We'll learn more about the beta distribution in [Chapter 12][Monsters and Mixtures]. But for now, here's the posterior summary for `b_Intercept`, the probability of a "w".
```{r}
posterior_summary(b3.1)["b_Intercept", ] %>%
round(digits = 2)
```
As we'll fully cover in the next chapter, `Estimate` is the posterior mean, the two `Q` columns are the quantile-based 95% intervals, and `Est.Error` is the posterior standard deviation.
Much like the way we used the `samples()` function to simulate probability values, above, we can do so with the `brms::fitted()` function. But we will have to specify `scale = "linear"` in order to return results in the probability metric. By default, `brms::fitted()` will return summary information. Since we want actual simulation draws, we'll specify `summary = F`.
```{r, warning = F}
f <-
fitted(b3.1,
summary = F,
scale = "linear") %>%
data.frame() %>%
set_names("p")
glimpse(f)
```
By default, we have a generically-named vector of 4,000 samples. We'll explain the defaults in later chapters. For now, notice we can view these in a density.
```{r, fig.width = 7, fig.height = 2, message = F}
f %>%
ggplot(aes(x = p)) +
geom_density(fill = "grey50", color = "grey50") +
annotate(geom = "text", x = .08, y = 2.5,
label = "Posterior probability") +
scale_x_continuous("probability of water",
breaks = c(0, .5, 1),
limits = 0:1) +
scale_y_continuous(NULL, breaks = NULL) +
theme(panel.grid = element_blank())
```
Looks a lot like the posterior probability density at the top of Figure 3.6, doesn't it? Much like we did with `samples`, we can use this distribution of probabilities to predict histograms of `w` counts. With those in hand, we can make an analogue to the histogram in the bottom panel of Figure 3.6.
```{r, fig.width = 3, fig.height = 2, warning = F, message = F}
# the simulation