```{r, echo = F, cache = F}
knitr::opts_chunk$set(fig.retina = 2.5)
knitr::opts_chunk$set(fig.align = "center")
options(width = 110)
```
# Generalized Linear Madness
> Applied statistics has to apply to all the sciences, and so it is often much vaguer about models. Instead it focuses on average performance, regardless of the model. The generalized linear models in the preceding chapters are not credible scientific models of most natural processes. They are powerful, geocentric ([Chapter 4][Geocentric Models]) descriptions of associations. In combination with a logic of causal inference, for example DAGs and *do*-calculus, generalized linear models can nevertheless be unreasonably powerful.
>
> But there are problems with this GLMs-plus-DAGs approach. Not everything can be modeled as a GLM—a linear combination of variables mapped onto a non-linear outcome. But if it is the only approach you know, then you have to use it....
>
> In this chapter, I will go **beyond generalized linear madness**. I'll work through examples in which the scientific context provides a causal model that will breathe life into the statistical model. I've chosen examples which are individually distinct and highlight different challenges in developing and translating causal models into *bespoke* (see the Rethinking ~~box~~ [section] below) statistical models. You won't require any specialized scientific expertise to grasp these examples. And the basic strategy is the same as it has been from the start: Define a generative model of a phenomenon and then use that model to design strategies for causal inference and statistical estimation. [@mcelreathStatisticalRethinkingBayesian2020, p. 525, *emphasis* in the original]
McElreath then reported he was going to work with Stan code, via `rstan::stan()`, in this chapter because of the unique demands of some of the models. Our approach will be mixed. We can fit at least a few of the models with **brms**, particularly with help from the non-linear syntax. However, some of the models to come are either beyond the current scope of **brms** or are at least beyond my current skill set. In those cases, we'll follow McElreath's approach and fit the models with `stan()`.
#### Rethinking: Bespoken for.
> Mass production has some advantages, but it also makes our clothes fit badly. Garments bought off-the-shelf are not manufactured with you in mind. They are not *bespoke* products, designed for any particular person with a particular body. Unless you are lucky to have a perfectly average body shape, you will need a tailor to get better.
>
> Statistical analyses are similar. Generalized linear models are off-the-shelf products, mass produced for a consumer market of impatient researchers with diverse goals. Science asked statisticians for tools that could be used anywhere. And so they delivered. But the clothes don't always fit. (p. 526, *emphasis* in the original)
## Geometric people
> Back in [Chapter 4][Geocentric Models], you met linear regression in the context of building a predictive model of height using weight. You even saw how to measure non-linear associations between the two variables. But nothing in that example was scientifically satisfying. The height-weight model was just a statistical device. It contains no biological information and tells us nothing about how the association between height and weight arises. Consider for example that weight obviously does not *cause* height, at least not in humans. If anything, the causal relationship is the reverse.
>
> So now let's try to do better. Why? Because when the model is scientifically inspired, rather than just statistically required, disagreements between model and data are informative of real causal relationships.
>
> Suppose for example that a person is shaped like a cylinder. Of course a person isn't exactly shaped like a cylinder. There are arms and a head. But let's see how far this cylinder model gets us. The weight of the cylinder is a consequence of the volume of the cylinder. And the volume of the cylinder is a consequence of growth in the height and width of the cylinder. So if we can relate the height to the volume, then we’d have a model to predict weight from height. (p. 526, *emphasis* in the original)
### The scientific model.
If we let $V$ stand for volume, $r$ stand for a radius, and $h$ stand for height, we can solve for volume by
$$V = \pi r^2 h.$$
If we further presume a person's radius is unknown, but some proportion ($p$) of height ($ph$), we can rewrite the formula as
\begin{align*}
V & = \pi (ph)^2 h \\
& = \pi p^2 h^3.
\end{align*}
Though we're not interested in volume per se, we might presume weight is some proportion of volume. Thus we could include a final parameter $k$ to stand for the conversion from volume to weight, leaving us with the formula
$$W = kV = k \pi p^2 h^3,$$
where $W$ denotes weight.
### The statistical model.
For one last time, together, let's load the `Howell1` data.
```{r, warning = F, message = F}
library(tidyverse)
data(Howell1, package = "rethinking")
d <- Howell1
rm(Howell1)
# scale observed variables
d <-
d %>%
mutate(w = weight / mean(weight),
h = height / mean(height))
```
McElreath's proposed statistical model follows the form
\begin{align*}
\text{w}_i & \sim \operatorname{Log-Normal}(\mu_i, \sigma) \\
\exp(\mu_i) & = k \pi p^2 \text{h}_i^3 \\
k & \sim \operatorname{Exponential}(0.5) \\
p & \sim \operatorname{Beta}(2, 18) \\
\sigma & \sim \operatorname{Exponential}(1), && \text{where} \\
\text w_i & = \text{weight}_i \big / \overline{\text{weight}}, && \text{and} \\
\text h_i & = \text{height}_i \big / \overline{\text{height}}.
\end{align*}
The Log-Normal likelihood ensures the predictions for $\text{weight}_i$ will always be non-negative. Because our parameter $p$ is the ratio of radius to height, $p = r / h$, it must be positive. Since people are typically taller than they are wide, it should also be less than one, and probably substantially less. Our next step will be taking a look at our priors.
For the plots in this chapter, we'll give a nod to the minimalistic plots in the authoritative text by @gelman2013bayesian, [*Bayesian data analysis: Third edition*](https://stat.columbia.edu/~gelman/book/). Just to add a little kick, we'll set the font to `family = "Times"`. Most of the adjustments will come from `ggthemes::theme_base()`.
```{r, message = F, warning = F}
library(ggthemes)
theme_set(
theme_base(base_size = 12) +
theme(text = element_text(family = "Times"),
axis.text = element_text(family = "Times"),
axis.ticks = element_line(linewidth = 0.25),
axis.ticks.length = unit(0.1, "cm"),
panel.background = element_rect(linewidth = 0.1),
        plot.background = element_blank()
)
)
```
Now that we have our theme, let's get a sense of our priors.
```{r, message = F, warning = F, fig.width = 8, fig.height = 2.75}
library(tidybayes)
library(brms)
c(prior(beta(2, 18), nlpar = p, coef = italic(p)),
  prior(exponential(0.5), nlpar = k, coef = italic(k)),
prior(exponential(1), class = sigma, coef = sigma)) %>%
parse_dist(prior) %>%
ggplot(aes(y = 0, dist = .dist, args = .args)) +
stat_dist_halfeye(.width = .5, size = 1, p_limits = c(0, 0.9995),
n = 2e3, normalize = "xy") +
scale_y_continuous(NULL, breaks = NULL) +
xlab(expression(theta)) +
facet_wrap(~ coef, scales = "free_x", labeller = label_parsed)
```
Here the points are the medians of each prior distribution and the horizontal lines the quantile-based 50% intervals. Turns out the $\operatorname{Beta}(2, 18)$ prior for $p$ pushes the bulk of the prior mass down near zero. The beta distribution also forces the parameter space for $p$ to range between 0 and 1. If we denote the two parameters of the beta distribution as $\alpha$ and $\beta$, we can compute the mean for any beta distribution as $\alpha / (\alpha + \beta)$. Thus the mean for our $\operatorname{Beta}(2, 18)$ prior is $2 / (2 + 18) = 2 / 20 = 0.1$.
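As a quick numerical check (this isn't in the text), we can confirm that analytic mean against a large number of draws from the prior.
```{r}
# a quick check (not in the text): the analytic and simulated means of Beta(2, 18)
2 / (2 + 18)
set.seed(16)
mean(rbeta(1e6, shape1 = 2, shape2 = 18))
```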
Because we computed our weight and height variables, `w` and `h`, by dividing the original variables by their respective means, each now has a mean of 1.
```{r, message = F}
d %>%
pivot_longer(w:h) %>%
group_by(name) %>%
summarise(mean = mean(value))
```
Here's their bivariate distribution in a scatter plot.
```{r, fig.width = 5, fig.height = 3}
d %>%
ggplot(aes(x = h, y = w)) +
geom_vline(xintercept = 1, linetype = 2, linewidth = 1/4, color = "grey50") +
geom_hline(yintercept = 1, linetype = 2, linewidth = 1/4, color = "grey50") +
geom_point(size = 1/4)
```
With this scaling, here is the formula for an individual with average weight and height:
\begin{align*}
1 & = k \pi p^2 1^3 \\
& = k \pi p^2.
\end{align*}
If you assume $p < .5$, $k$ must be greater than 1. $k$ also has to be positive. To get a sense of this, we can further work the algebra:
\begin{align*}
1 & = k \pi p^2 \\
1/k & = \pi p^2 \\
k & = 1 / \pi p^2.
\end{align*}
To get a better sense of that relation, we might plot.
```{r, fig.width = 5, fig.height = 3}
tibble(p = seq(from = 0.001, to = 0.499, by = 0.001)) %>%
mutate(k = 1 / (pi * p^2)) %>%
ggplot(aes(x = p, y = k)) +
geom_line() +
labs(x = expression(italic(p)),
y = expression(italic(k))) +
coord_cartesian(ylim = c(0, 500))
```
McElreath's quick and dirty solution was to set $k \sim \operatorname{Exponential}(0.5)$, which has a prior predictive mean of 2.
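As another quick check (not in the text), we can confirm that prior mean and see which value of $p$ a $k$ of 2 would imply under the $k = 1 / \pi p^2$ relation from above.
```{r}
# a quick check (not in the text)
1 / 0.5                      # analytic mean of Exponential(0.5)
set.seed(16)
mean(rexp(1e6, rate = 0.5))  # simulated mean
sqrt(1 / (pi * 2))           # the p implied by k = 2 when k = 1 / (pi * p^2)
```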
By setting up his model formula as `exp(mu) = ...`, McElreath effectively used the log link. It turns out that **brms** only supports the `identity` and `inverse` links for `family = lognormal`. However, we can sneak in the log link by nesting the right-hand side of the formula within `log()`.
```{r b16.1}
b16.1 <-
brm(data = d,
family = lognormal,
bf(w ~ log(3.141593 * k * p^2 * h^3),
k + p ~ 1,
nl = TRUE),
prior = c(prior(beta(2, 18), nlpar = p, lb = 0, ub = 1),
prior(exponential(0.5), nlpar = k, lb = 0),
prior(exponential(1), class = sigma)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 16,
file = "fits/b16.01")
```
Check the parameter summary.
```{r}
print(b16.1)
```
McElreath didn't show the parameter summary for his `m16.1` in the text. If you fit the model with both **rethinking** and **brms**, you'll see our `b16.1` matches up quite well. To make our version of Figure 16.2, we'll use a `GGally::ggpairs()` workflow. First we'll save our customized settings for the three subplot types.
```{r}
my_lower <- function(data, mapping, ...) {
# get the x and y data to use the other code
x <- eval_data_col(data, mapping$x)
y <- eval_data_col(data, mapping$y)
# compute the correlations
corr <- cor(x, y, method = "p", use = "pairwise")
abs_corr <- abs(corr)
# plot the cor value
ggally_text(
label = formatC(corr, digits = 2, format = "f") %>% str_replace(., "0.", "."),
mapping = aes(),
size = 3.5,
color = "black",
family = "Times") +
scale_x_continuous(NULL, breaks = NULL) +
scale_y_continuous(NULL, breaks = NULL)
}
my_diag <- function(data, mapping, ...) {
ggplot(data = data, mapping = mapping) +
geom_histogram(linewidth = 1/4, color = "white", fill = "grey67", bins = 20) +
scale_x_continuous(NULL, breaks = NULL) +
scale_y_continuous(NULL, breaks = NULL)
}
my_upper <- function(data, mapping, ...) {
ggplot(data = data, mapping = mapping) +
geom_point(size = 1/4, alpha = 1/4) +
scale_x_continuous(NULL, breaks = NULL) +
scale_y_continuous(NULL, breaks = NULL)
}
```
Now we make our version of Figure 16.2.a.
```{r, warning = F, message = F, fig.width = 4.5, fig.height = 4}
library(GGally)
as_draws_df(b16.1) %>%
select(b_k_Intercept:sigma) %>%
set_names(c("italic(k)", "italic(p)", "sigma")) %>%
ggpairs(upper = list(continuous = my_upper),
diag = list(continuous = my_diag),
lower = list(continuous = my_lower),
labeller = label_parsed) +
theme(strip.text = element_text(size = 8),
strip.text.y = element_text(angle = 0))
```
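Before we interpret the pairs plot, here's a quick check (not in the text) of why $k$ and $p$ are so entangled. Because $\mu_i$ depends on $k$ and $p$ only through the product $k \pi p^2$, that product is what the data actually constrain, and it should be estimated much more precisely than either parameter on its own.
```{r}
# a quick check (not in the text): the product k * pi * p^2 is well identified
# even though k and p individually are not
as_draws_df(b16.1) %>%
  transmute(product = b_k_Intercept * pi * b_p_Intercept^2) %>%
  summarise(mean = mean(product), sd = sd(product))
```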
We see the lack of identifiability of $k$ and $p$ resulted in a strong inverse relation between them. Now here's how we might make Figure 16.2.b.
```{r, fig.width = 5, fig.height = 3}
nd <-
tibble(h = seq(from = 0, to = 1.5, length.out = 50))
p <-
predict(b16.1,
newdata = nd) %>%
data.frame() %>%
bind_cols(nd)
d %>%
ggplot(aes(x = h)) +
geom_smooth(data = p,
aes(y = Estimate, ymin = Q2.5, ymax = Q97.5),
stat = "identity",
fill = "grey67", color = "black", linewidth = 1/4) +
geom_point(aes(y = w),
size = 1/3) +
coord_cartesian(xlim = c(0, max(d$h)),
ylim = c(0, max(d$w))) +
labs(x = "height (scaled)",
y = "weight (scaled)")
```
Overall the model did okay, but the poor fit for the cases with lower values of height and weight suggests we might be missing important differences between children and adults.
### GLM in disguise.
Recall that because **brms** does not support the log link for the Log-Normal likelihood, we recast our `b16.1` likelihood as
\begin{align*}
\text{w}_i & \sim \operatorname{Log-Normal}(\mu_i, \sigma) \\
\mu_i & = \log(k \pi p^2 \text{h}_i^3).
\end{align*}
Because multiplication becomes addition on the log scale, we can also express this as
\begin{align*}
\text{w}_i & \sim \operatorname{Log-Normal}(\mu_i, \sigma) \\
\mu_i & = \log(k) + \log(\pi) + 2 \log(p) + 3 \log(\text{h}_i),
\end{align*}
which means our fancy non-linear model is just linear regression on the log scale. McElreath pointed this out
> to highlight one of the reasons that generalized linear models are so powerful. Lots of natural relationships are GLM relationships, on a specific scale of measurement. At the same time, the GLM approach wants to simply estimate parameters which may be informed by a proper theory, as in this case. (p. 531)
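To connect that point back to the data (this check isn't in the text), a simple OLS regression of $\log(\text w)$ on $\log(\text h)$ gives a sense of how close the empirical scaling exponent comes to the theoretical value of 3.
```{r}
# a quick sketch (not in the text): on the log scale the cylinder model is
# linear in log(h) with a slope of 3; compare that with an OLS fit
lm(log(w) ~ log(h), data = d) %>%
  coef()
```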
## Hidden minds and observed behavior
Load the `Boxes` data [@vanleeuwenDevelopmentHumanSocial2018].
```{r}
data(Boxes, package = "rethinking")
d <- Boxes
rm(Boxes)
rethinking::precis(d)
```
The data are from 629 children.
```{r}
d %>% count()
```
Their ages ranged from 4 to 14 years and, as indicated by the histogram above, the bulk were on the younger end.
```{r}
d %>%
summarise(min = min(age),
max = max(age))
```
Here's a depiction of our criterion variable `y`.
```{r, fig.width = 6.25, fig.height = 2}
# wrangle
d %>%
mutate(color = factor(y,
levels = 1:3,
labels = c("unchosen", "majority", "minority"))) %>%
mutate(y = str_c(y, " (", color, ")")) %>%
count(y) %>%
mutate(percent = (100 * n / sum(n))) %>%
mutate(label = str_c(round(percent, digits = 1), "%")) %>%
# plot
ggplot(aes(y = y)) +
geom_col(aes(x = n)) +
geom_text(aes(x = n + 4, label = label),
hjust = 0, family = "Times") +
scale_x_continuous(expression(italic(n)), expand = expansion(mult = c(0, 0.1))) +
labs(title = "The criterion variable y indexes three kinds of color choices",
subtitle = "Children tended to prefer the 'majority' color.",
y = NULL) +
theme(axis.text.y = element_text(hjust = 0),
axis.ticks.y = element_blank())
```
### The scientific model.
> The key, as always, is to think generatively. Consider for example a group of children in which half of them choose at random and the other half follow the majority. If we simulate choices for these children, we can figure out how often we might see the "2" choice, the one that indicates the majority color. (p. 532)
Here's an alternative way to set up McElreath's **R** code 16.6.
```{r}
set.seed(16)
n <- 30 # number of children
tibble(name = rep(c("y1", "y2"), each = n / 2),
value = c(sample(1:3, size = n / 2, replace = T),
rep(2, n / 2))) %>%
mutate(y = sample(value)) %>%
  summarise(`proportion of 2s` = sum(y == 2) / n)
```
> We'll consider 5 different strategies children might use.
>
> 1. Follow the Majority: Copy the majority demonstrated color.
> 2. Follow the Minority: Copy the minority demonstrated color.
> 3. Maverick: Choose the color that no demonstrator chose.
> 4. Random: Choose a color at random, ignoring the demonstrators.
> 5. Follow First: Copy the color that was demonstrated first. This was either the majority color (when `majority_first` equals 1) or the minority color (when 0). (p. 533)
### The statistical model.
Our statistical model will follow the form
\begin{align*}
y_i & \sim \operatorname{Categorical}(\theta) \\
\theta_j & = \sum_{s = 1}^5 p_s \operatorname{Pr}(j | s) \;\;\; \text{for} \; j = 1, \dots, 3 \\
p & \sim \operatorname{Dirichlet}(4, 4, 4, 4, 4),
\end{align*}
where $s$ indexes one of our five latent strategies and $\operatorname{Pr}(j | s)$ is the probability of the $j$th of the three color choices given a child is using the $s$th strategy. The $\sum_{s = 1}^5 p_s$ portion conveys that these five probabilities are treated as a simplex, which means they will sum to one. We give these probabilities a Dirichlet prior, which we learned about back in [Section 12.4][Ordered categorical predictors]. Here's what that prior looks like.
```{r, fig.width = 8, fig.height = 2.25}
set.seed(16)
rdirichlet(n = 1e5, alpha = rep(4, times = 5)) %>%
data.frame() %>%
set_names(1:5) %>%
pivot_longer(everything()) %>%
mutate(name = name %>% as.double(),
alpha = str_c("alpha[", name, "]")) %>%
ggplot(aes(x = value, group = name)) +
geom_histogram(fill = "grey67", binwidth = .02, boundary = 0) +
scale_x_continuous(expression(italic(p[s])), limits = c(0, 1),
breaks = c(0, .2, .5, 1), labels = c("0", ".2", ".5", "1"), ) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = expression("Dirichlet"*(4*", "*4*", "*4*", "*4*", "*4))) +
facet_wrap(~ alpha, labeller = label_parsed, nrow = 1)
```
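To get a feel for how the second line of the model mixes the strategies, here's a small sketch (not from the text). It computes $\theta$ for a hypothetical child who saw the majority color demonstrated first, assuming all five strategies are equally probable and using the choice probabilities implied by the five strategies listed above.
```{r}
# a sketch (not from the text): implied probabilities of the three choices for
# a child who saw the majority color first, under equal strategy weights
p_s <- rep(1/5, times = 5)
# rows = strategies (majority, minority, maverick, random, follow first);
# columns = choices (1 = unchosen, 2 = majority, 3 = minority)
pr_j_given_s <- rbind(
  c(0,   1,   0),    # follow the majority
  c(0,   0,   1),    # follow the minority
  c(1,   0,   0),    # maverick
  c(1/3, 1/3, 1/3),  # random
  c(0,   1,   0)     # follow first (the majority color was shown first)
)
# theta_j = sum_s p_s * Pr(j | s)
colSums(p_s * pr_j_given_s)
```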
### Coding the statistical model.
```{r, eval = F, echo = F}
# doesn't work
b16.2 <-
brm(data = d,
family = cumulative,
y ~ 1,
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 16)
# doesn't work
b16.2 <-
brm(data = d,
family = mixture(cumulative, cumulative, cumulative, cumulative, cumulative),
y ~ 1,
prior= c(prior(dirichlet(4, 4, 4, 4, 4), theta),
prior(constant(), class = Intercept, coef = , dpar = mu1),
prior(constant(), class = Intercept, coef = , dpar = mu1),
prior(constant(), class = Intercept, coef = , dpar = mu2),
prior(constant(), class = Intercept, coef = , dpar = mu2),
prior(constant(), class = Intercept, coef = , dpar = mu3),
prior(constant(), class = Intercept, coef = , dpar = mu3),
prior(constant(), class = Intercept, coef = , dpar = mu4),
prior(constant(), class = Intercept, coef = , dpar = mu4),
prior(constant(), class = Intercept, coef = , dpar = mu5),
prior(constant(), class = Intercept, coef = , dpar = mu5)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 16)
# I don't think the `constant()` prior trick will work, here. Rather, I suspect this will require a custom likelihood. If you look at McElreath's Stan code at the top of page 536, I think that's him defining a custom likelihood.
```
Let's take a look at McElreath's Stan code.
```{r, warning = F, message = F}
library(rethinking)
data(Boxes_model)
cat(Boxes_model)
```
I'm not aware that one can fit this model directly with **brms**. My guess is that if it's possible, it would require a custom likelihood [see @Bürkner2022Define]. If you can reproduce McElreath's Stan code with this or some other approach in **brms**, please [share your code on GitHub](https://github.com/ASKurz/Statistical_Rethinking_with_brms_ggplot2_and_the_tidyverse_2_ed/issues). In the meantime, we're going to follow along with the text and fit the model with `rstan::stan()`.
```{r m16.2, echo = F}
# save(list = c("dat_list", "m16.2"), file = "fits/m16.2.rda")
load(file = "fits/m16.2.rda")
```
```{r, eval = F}
# prep data
dat_list <- list(
N = nrow(d),
y = d$y,
majority_first = d$majority_first )
# run the sampler
m16.2 <-
stan(model_code = Boxes_model,
data = dat_list,
chains = 3, cores = 3,
seed = 16)
```
Here's what it looks like if you `print()` a fit object from the `stan()` function.
```{r}
print(m16.2)
```
Here's the summary with `rethinking::precis()`.
```{r, warning = F, message = F}
precis(m16.2, depth = 2)
```
Here we show the marginal posteriors for $p_s$.
```{r, fig.width = 5, fig.height = 1.75}
label <- c("Majority~(italic(s)[1])",
"Minority~(italic(s)[2])",
"Maverick~(italic(s)[3])",
"Random~(italic(s)[4])",
"Follow~First~(italic(s)[5])")
precis(m16.2, depth = 2) %>%
data.frame() %>%
mutate(name = factor(label, levels = label)) %>%
mutate(name = fct_rev(name)) %>%
ggplot(aes(x = mean, xmin = X5.5., xmax = X94.5., y = name)) +
geom_vline(xintercept = .2, linewidth = 1/4, linetype = 2) +
geom_pointrange(linewidth = 1/4, fatten = 6/4) +
scale_x_continuous(expression(italic(p[s])), limits = c(0, 1),
breaks = 0:5 / 5) +
scale_y_discrete(NULL, labels = ggplot2:::parse_safe) +
theme(axis.text.y = element_text(hjust = 0))
```
As an alternative, this might be a good time to revisit the `tidybayes::stat_ccdfinterval()` approach (see [Section 12.3.3][Adding predictor variables.]), which depicts those posteriors with a bar plot where the ends of the bars express our uncertainty in terms of cumulative density curves.
```{r, fig.width = 5, fig.height = 2.5}
extract.samples(m16.2) %>%
data.frame() %>%
select(-lp__) %>%
set_names(label) %>%
pivot_longer(everything()) %>%
mutate(name = factor(name, levels = label)) %>%
mutate(name = fct_rev(name)) %>%
ggplot(aes(x = value, y = name)) +
geom_vline(xintercept = .2, linewidth = 1/4, linetype = 2) +
stat_ccdfinterval(.width = .95, size = 1/4, alpha = .8) +
scale_x_continuous(expression(italic(p[s])), expand = c(0, 0), limits = c(0, 1),
breaks = 0:5 / 5) +
scale_y_discrete(NULL, labels = ggplot2:::parse_safe) +
theme(axis.text.y = element_text(hjust = 0),
axis.ticks.y = element_blank())
```
### State space models.
In this section of the text, McElreath mentioned state space models and hidden Markov models. Based on [this thread](https://discourse.mc-stan.org/t/state-space-models-with-brms/4087) in the Stan forums and [this issue](https://github.com/paul-buerkner/brms/issues/173) in the **brms** GitHub repo, it looks like state space models are not supported in **brms** at this time. As for hidden Markov models, I'm not sure whether they are supported by **brms**. The best I could find was [this thread](https://discourse.mc-stan.org/t/modelling-with-non-binary-proportions-as-outcome/12324) on the Stan forums. If you know more about either of these topics, please share your knowledge [on this book's GitHub repo](https://github.com/ASKurz/Statistical_Rethinking_with_brms_ggplot2_and_the_tidyverse_2_ed/issues).
## Ordinary differential nut cracking
Load the `Panda_nuts` data [@boeschLearningCurvesTeaching2019].
```{r}
data(Panda_nuts, package = "rethinking")
d <- Panda_nuts
rm(Panda_nuts)
```
Anticipating McElreath's **R** code 16.11, we'll wrangle a little.
```{r}
d <-
d %>%
mutate(n = nuts_opened,
age_s = age / max(age))
glimpse(d)
```
Our criterion is `n`, the number of *Panda* nuts opened by a chimpanzee on a given occasion. The two focal predictor variables are `age` and `seconds`. Here they are depicted in a pairs plot.
```{r, warning = F, message = F, fig.width = 4.5, fig.height = 4}
d %>%
select(n, age, seconds) %>%
ggpairs(upper = list(continuous = my_upper),
diag = list(continuous = my_diag),
lower = list(continuous = my_lower)) +
theme(strip.text.y = element_text(hjust = 0, angle = 0))
```
### Scientific model.
As a starting point, McElreath proposed the strength of a chimpanzee would relate to the number of nuts it might open. We don't have a measure of strength, but we do have `age`, which is a proxy for how close a chimp might be to its maximum body size, and we presume body size would be proportional to strength. If we let $t$ index time, $M_\text{max}$ be the maximum body size (mass), $M_t$ be the current body size, and $k$ stand for the rate of growth that comes with age, we can write
$$M_t = M_\text{max} [1 - \exp(-kt) ]$$
to solve for mass at a given age [@vonbertalanffyUntersuchungenUeberGesetzlichkeit1934]. But again, we actually care about strength, not mass. Letting $S_t$ be strength at time $t$, we can express a proportional relation between the two as $S_t = \beta M_t$. Now if we let $\lambda$ stand in for the rate of nut opening and $\alpha$ express the relation of strength to nut opening, we can write
$$\lambda = \alpha S_t^\theta = \alpha \big ( \beta M_\text{max} [1 - \exp(-kt) ] \big ) ^\theta,$$
"where $\theta$ is some exponent greater than 1" (p. 538). If we rescale $M_\text{max} = 1$, we can simplify the equation to
$$\lambda = \alpha \beta^\theta [1 - \exp(-kt) ]^\theta.$$
As "the product $\alpha \beta^\theta$ in the front just rescales strength to nuts-opened-per-second" (p. 538), we can collapse it to a single parameter, $\phi$, which leaves us with
$$\lambda = \phi [1 - \exp(-kt) ]^\theta.$$
This is our scientific model.
### Statistical model.
Now if we let $n_i$ be the number of nuts opened, we can write our statistical model as
\begin{align*}
n_i & \sim \operatorname{Poisson}(\lambda_i) \\
\lambda_i & = \text{seconds}_i \, \phi [1 - \exp(-k \,\text{age}_i) ]^\theta,
\end{align*}
where we have replaced our time index, $t$, with the variable `age`. By including the variable `seconds` in the equation, we have scaled the results to be nuts per second. McElreath proposed the priors:
\begin{align*}
\phi & \sim \operatorname{Log-Normal}(\log 1, 0.10) \\
k & \sim \operatorname{Log-Normal}(\log 2, 0.25) \\
\theta & \sim \operatorname{Log-Normal}(\log 5, 0.25),
\end{align*}
all of which were Log-Normal to ensure the parameters were positive and continuous. To get a sense of what these priors implied, he simulated. Here's our version of his simulations, which make up Figure 16.4.
```{r, fig.width = 6, fig.height = 3.5}
n <- 1e4
# define the x-axis breaks
at <- 0:6 / 4
# how many prior draws would you like?
n_draws <- 50
# simulate
set.seed(16)
prior <-
tibble(index = 1:n,
phi = rlnorm(n, meanlog = log(1), sdlog = 0.1),
k = rlnorm(n, meanlog = log(2), sdlog = 0.25),
theta = rlnorm(n, meanlog = log(5), sdlog = 0.25)) %>%
slice_sample(n = n_draws) %>%
expand_grid(age = seq(from = 0, to = 1.5, length.out = 1e2)) %>%
mutate(bm = 1 - exp(-k * age),
ns = phi * (1 - exp(-k * age))^theta)
# left panel
p1 <-
prior %>%
ggplot(aes(x = age, y = bm, group = index)) +
geom_line(linewidth = 1/4, alpha = 1/2) +
scale_x_continuous(breaks = at, labels = round(at * max(d$age))) +
ylab("body mass")
# right panel
p2 <-
prior %>%
ggplot(aes(x = age, y = ns, group = index)) +
geom_line(linewidth = 1/4, alpha = 1/2) +
scale_x_continuous(breaks = at, labels = round(at * max(d$age))) +
ylab("nuts per second")
# combine and plot
library(patchwork)
p1 + p2 +
plot_annotation(title = "Prior predictive simulation for the nut opening model",
subtitle = "Each panel shows the results from 50 prior draws.")
```
McElreath suggested we inspect the distributions of these priors. Here they are in a series of histograms.
```{r, fig.width = 8, fig.height = 2.75}
set.seed(16)
tibble(phi = rlnorm(n, meanlog = log(1), sdlog = 0.1),
`italic(k)` = rlnorm(n, meanlog = log(2), sdlog = 0.25),
theta = rlnorm(n, meanlog = log(5), sdlog = 0.25)) %>%
pivot_longer(everything()) %>%
ggplot(aes(x = value)) +
geom_histogram(fill = "grey67", bins = 40, boundary = 0) +
scale_x_continuous("marginal prior", limits = c(0, NA)) +
scale_y_continuous(NULL, breaks = NULL) +
facet_wrap(~ name, scales = "free", labeller = label_parsed)
```
Happily, we can fit this model using the non-linear **brms** syntax.
```{r b16.4}
b16.4 <-
brm(data = d,
family = poisson(link = identity),
bf(n ~ seconds * phi * (1 - exp(-k * age_s))^theta,
phi + k + theta ~ 1,
nl = TRUE),
prior = c(prior(lognormal(log(1), 0.1), nlpar = phi, lb = 0),
prior(lognormal(log(2), 0.25), nlpar = k, lb = 0),
prior(lognormal(log(5), 0.25), nlpar = theta, lb = 0)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 16,
file = "fits/b16.04")
```
Check the parameter summary.
```{r}
print(b16.4)
```
Now we might get a sense of what this posterior means by plotting nuts per second as a function of `age` in our version of Figure 16.5.
```{r, fig.width = 4, fig.height = 3.2}
set.seed(16)
as_draws_df(b16.4) %>%
slice_sample(n = n_draws) %>%
expand_grid(age = seq(from = 0, to = 1.5, length.out = 1e2)) %>%
mutate(ns = b_phi_Intercept * (1 - exp(-b_k_Intercept * age))^b_theta_Intercept) %>%
ggplot() +
geom_line(aes(x = age, y = ns, group = .draw),
linewidth = 1/4, alpha = 1/2) +
geom_jitter(data = d,
aes(x = age_s, y = n / seconds, size = seconds),
shape = 1, width = 0.01, color = "grey50") +
scale_size_continuous(breaks = c(1, 50, 100), limits = c(1, NA)) +
scale_x_continuous(breaks = at, labels = round(at * max(d$age))) +
labs(title = "Posterior predictive distribution for the\nnut opening model",
y = "nuts per second") +
theme(legend.background = element_blank(),
legend.position = c(.9, .25))
```
Looks like things flatten out around `age == 16`. Yet since the data drop off at that `age`, we probably shouldn't get overconfident.
## Population dynamics
Load the `Lynx_Hare` population dynamics data [@hewittTheConservation1921].
```{r}
data(Lynx_Hare, package = "rethinking")
glimpse(Lynx_Hare)
```
As McElreath indicated in his endnote #238 (p. 570), this example is based on a Stan case study by the great [Bob Carpenter](https://bob-carpenter.github.io/), [*Predator-prey population dynamics: The Lotka-Volterra model in Stan*](https://mc-stan.org/users/documentation/case-studies/lotka-volterra-predator-prey.html). You might bookmark that link. It'll come up later in this section.
Figure 16.6 will give us a sense of how the lynx and hare populations ebbed and flowed.
```{r, fig.width = 6.5, fig.height = 3}
# for annotation
text <-
tibble(name = c("Hare", "Lynx"),
label = c("Lepus", "Lynx"),
Year = c(1913.5, 1915.5),
value = c(78, 52))
# wrangle
Lynx_Hare %>%
pivot_longer(-Year) %>%
# plot!
ggplot(aes(x = Year, y = value)) +
geom_line(aes(color = name),
linewidth = 1/4) +
geom_point(aes(fill = name),
size = 3, shape = 21, color = "white") +
geom_text(data = text,
aes(label = label, color = name),
hjust = 0, family = "Times") +
scale_fill_grey(start = 0, end = .5) +
scale_color_grey(start = 0, end = .5) +
scale_y_continuous("thousands of pelts", breaks = 0:4 * 20, limits = c(0, 90)) +
theme(legend.position = "none")
```
Note, however, that these are numbers of pelts, not of actual animals. This will become important when we start modeling.
A typical way to model evenly-spaced time series data like this would be with an autoregressive model with the basic structure
$$\operatorname{E}(y_t) = \alpha + \beta_1 y_{t-1},$$
where $t$ indexes time and $t - 1$ is the time point immediately before $t$. Models following this form are called first-order autoregressive models, AR(1), meaning that the current time point is only influenced by the previous time point, but none of the earlier ones. You can build on this format by adding other predictors. A natural way would be to use a predictor from $t - 1$ to predict $y_t$, following the form
$$\operatorname{E}(y_t) = \alpha + \beta_1 y_{t-1} + \beta_2 x_{t-1}.$$
But that's still a first-order model. A second-order model, AR(2), would include a term for $y_{t - 2}$, such as
$$\operatorname{E}(y_t) = \alpha + \beta_1 y_{t-1} + \beta_2 x_{t-1} + \beta_3 y_{t-2}.$$
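Just to make the notation concrete, here's a quick simulation sketch (not in the text) of the first-order process above, using arbitrary values for $\alpha$ and $\beta_1$ and adding Gaussian noise around the expectation.
```{r, fig.width = 4, fig.height = 3}
# a quick sketch (not in the text): simulate 50 steps of a Gaussian AR(1) process
set.seed(16)
alpha  <- 1
beta_1 <- 0.8
n_step <- 50
y <- numeric(n_step)
y[1] <- alpha
for (t in 2:n_step) {
  # E(y_t) = alpha + beta_1 * y_{t - 1}, plus Gaussian noise
  y[t] <- alpha + beta_1 * y[t - 1] + rnorm(1, mean = 0, sd = 1)
}
tibble(t = 1:n_step, y = y) %>%
  ggplot(aes(x = t, y = y)) +
  geom_line(linewidth = 1/4) +
  scale_x_continuous(expression(time~(italic(t))))
```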
McElreath isn't a huge fan of these models, particularly from the scientific modeling perspective he developed in this chapter. But **brms** can fit them and we'll practice a little in a bonus section, later on. In the meantime, we'll follow along and learn about **ordinary differential equations** (ODEs).
### The scientific model.
We'll start off simple and focus first on hares. If we let $H_t$ be the number of hares at time $t$, we can express the **rate of change** in the hare population as
$$\frac{\mathrm{d} H}{\mathrm{d} t} = H_t \times (\text{birth rate}) - H_t \times (\text{death rate}).$$
If we presume both birth rates and death rates (*mortality* rates) are constants, we might denote them $b_H$ and $m_H$, respectively, and re-express the formula as
$$\frac{\mathrm{d} H}{\mathrm{d} t} = H_t b_H - H_t m_H = H_t (b_H - m_H).$$
Building on this, if we let $L_t$ stand for the number of lynx present at time $t$, we can allow the mortality rate to depend on that variable with the expanded formula
$$\frac{\mathrm{d} H}{\mathrm{d} t} = H_t (b_H - L_t m_H).$$
We can expand this even further to express how the number of hares at a given time influences the birth rate for lynx ($b_L$), which helps us model the rate of change in the lynx population as
$$\frac{\mathrm{d} L}{\mathrm{d} t} = L_t (H_t b_L - m_L),$$
where the lynx mortality rate ($m_L$) is now constant. This is called the **Lotka-Volterra model** [@lotkaPrinciplesOfPhysicalBiology1925; @volterraFluctuationsAbundanceSpecies1926]. You may have noticed how the above equations shifted our focus from what we were originally interested in, $\operatorname{E}(H_t)$, to a rate of change, $\mathrm{d} H / \mathrm{d} t$. Happily, our equation for $\mathrm{d} H / \mathrm{d} t$ "tells us how to update $H$ after each tiny unit of passing time $\mathrm d t$" (p. 544). You update by
$$H_{t +\mathrm d t} = H_t + \mathrm d t \frac{\mathrm d H}{\mathrm d t} = H_t + \mathrm d t H_t (b_H - L_t m_H).$$
Here we'll use a custom function called `sim_lynx_hare()` to simulate how this can work. Our version of the function is very similar to the one McElreath displayed in his **R** code 16.14, but we changed it so it returns a tibble which includes a time index, `t`.
```{r}
sim_lynx_hare <- function(n_steps, init, theta, dt = 0.002) {
L <- rep(NA, n_steps)
H <- rep(NA, n_steps)
# set initial values
L[1] <- init[1]
H[1] <- init[2]
for (i in 2:n_steps) {
H[i] <- H[i - 1] + dt * H[i - 1] * (theta[1] - theta[2] * L[i - 1])
L[i] <- L[i - 1] + dt * L[i - 1] * (theta[3] * H[i - 1] - theta[4])
}
# return a tibble
tibble(t = 1:n_steps,
H = H,
L = L)
}
```
Now we simulate.
```{r}
# set the four theta values
theta <- c(0.5, 0.05, 0.025, 0.5)
# simulate
z <- sim_lynx_hare(n_steps = 1e4,
init = c(filter(Lynx_Hare, Year == 1900) %>% pull("Lynx"),
filter(Lynx_Hare, Year == 1900) %>% pull("Hare")),
theta = theta)
# what did we do?
glimpse(z)
```
Each row is a stand-in index for time, which our custom function saved in the `t` column. Here we plot the results in our version of Figure 16.7.
```{r, fig.width = 4, fig.height = 3}
z %>%
pivot_longer(-t) %>%
ggplot(aes(x = t, y = value, color = name)) +
geom_line(linewidth = 1/4) +
scale_color_grey(start = 0, end = .5) +
scale_x_continuous(expression(time~(italic(t))), breaks = NULL) +
scale_y_continuous("number (thousands)", breaks = 0:4 * 10, limits = c(0, 45)) +
theme(legend.position = "none")
```
"This model produces cycles, similar to what we see in the data. The model behaves this way, because lynx eat hares. Once the hares are eaten, the lynx begin to die off. Then the cycle repeats (p. 545)."
### The statistical model.
If we continue to let $H_t$ and $L_t$ be the number of hares and lynx at time $t$, we might also want to rehearse the distinction between those numbers and our observations by letting $h_t$ and $l_t$ stand for the observed numbers of hares and lynx. These observed numbers, recall, are from counts of pelts. We want a statistical model that can connect $h_t$ to $H_t$ and connect $l_t$ to $L_t$. Part of that model would include the probability a hare was trapped in a given year, $p_h$, and a similar probability for a lynx getting trapped, $p_l$. To make things worse, further imagine the number of pelts for each, in a given year, was rounded to the nearest $100$ and divided by $1{,}000$. Those are our values.
We practice simulating all this in Figure 16.8. Here we propose a population of $H_t = 10^4$ hares and an average trapping rate of about $10\%$, as expressed by $p_t \sim \operatorname{Beta}(2, 18)$. As described above, we then divide the number of observed pelts by $1{,}000$ and round the results, yielding $h_t$.
```{r, fig.width = 4, fig.height = 3}
n <- 1e4
Ht <- 1e4
set.seed(16)
# simulate
tibble(pt = rbeta(n, shape1 = 2, shape2 = 18)) %>%
mutate(ht = rbinom(n, size = Ht, prob = pt)) %>%
mutate(ht = round(ht / 1000, digits = 2)) %>%
# plot
ggplot(aes(x = ht)) +
geom_histogram(linewidth = 1/4, binwidth = 0.1,
color = "white", fill = "grey67") +
scale_y_continuous(NULL, breaks = NULL) +
xlab(expression(thousand~of~pelts~(italic(h[t]))))
```
On page 546, McElreath encouraged us to try the simulation with different values of $H_t$ and $p_t$. Here we'll do so with a $3 \times 3$ grid of $H_t = \{5{,}000, 10{,}000, 15{,}000\}$ and $p_t \sim \{ \operatorname{Beta}(2, 18), \operatorname{Beta}(10, 10), \operatorname{Beta}(18, 2) \}$.
```{r, fig.width = 7, fig.height = 5}
set.seed(16)
# define the 3 X 3 grid
tibble(shape1 = c(2, 10, 18),
shape2 = c(18, 10, 2)) %>%
expand_grid(Ht = c(5e3, 1e4, 15e3)) %>%
# simulate
mutate(pt = purrr::map2(shape1, shape2, ~rbeta(n, shape1 = .x, shape2 = .y))) %>%
mutate(ht = purrr::map2(Ht, pt, ~rbinom(n, size = .x, prob = .y))) %>%
unnest(c(pt, ht)) %>%
# wrangle
mutate(ht = round(ht / 1000, digits = 2),
beta = str_c("italic(p[t])%~%'Beta '(", shape1, ", ", shape2, ")"),
Htlab = str_c("italic(H[t])==", Ht)) %>%
mutate(beta = factor(beta,
levels = c("italic(p[t])%~%'Beta '(2, 18)", "italic(p[t])%~%'Beta '(10, 10)", "italic(p[t])%~%'Beta '(18, 2)")),
Htlab = factor(Htlab,
levels = c("italic(H[t])==15000", "italic(H[t])==10000", "italic(H[t])==5000"))) %>%
# plot!
ggplot(aes(x = ht)) +
geom_histogram(aes(fill = beta == "italic(p[t])%~%'Beta '(2, 18)" & Htlab == "italic(H[t])==10000"),
linewidth = 1/10, binwidth = 0.25, boundary = 0) +
geom_vline(aes(xintercept = Ht / 1000),
linewidth = 1/4, linetype = 2) +
scale_fill_grey(start = .67, end = 0, breaks = NULL) +
scale_y_continuous(NULL, breaks = NULL) +
xlab(expression(thousand~of~pelts~(italic(h[t])))) +
facet_grid(Htlab ~ beta, labeller = label_parsed, scales = "free_y")
```
The vertical dashed lines mark the maximum possible number of pelts (in thousands) in each panel. The histogram in black corresponds to the parameter settings we used in our version of Figure 16.8, above.
McElreath's proposed model is
\begin{align*}
h_t & \sim \operatorname{Log-Normal} \big (\log(p_H H_t), \sigma_H \big) \\
l_t & \sim \operatorname{Log-Normal} \big (\log(p_L L_t), \sigma_L \big) \\
H_1 & \sim \operatorname{Log-Normal}(\log 10, 1) \\
L_1 & \sim \operatorname{Log-Normal}(\log 10, 1) \\
H_{T >1} & = H_1 + \int_1^T H_t (b_H - m_H L_t) \mathrm{d} t \\
L_{T >1} & = L_1 + \int_1^T L_t (b_L H_T - m_L) \mathrm{d} t \\
\sigma_H & \sim \operatorname{Exponential}(1) \\
\sigma_L & \sim \operatorname{Exponential}(1) \\
p_H & \sim \operatorname{Beta}(\alpha_H, \beta_H) \\
p_L & \sim \operatorname{Beta}(\alpha_L, \beta_L) \\
b_H & \sim \operatorname{Half-Normal}(1, 0.5) \\
b_L & \sim \operatorname{Half-Normal}(0.5, 0.5) \\
m_H & \sim \operatorname{Half-Normal}(0.5, 0.5) \\
m_L & \sim \operatorname{Half-Normal}(1, 0.5).
\end{align*}
It's not immediately clear from the text, but if you look closely at the output from `cat(Lynx_Hare_model)` (see below), you'll see $\alpha_H = \alpha_L = 40$ and $\beta_H = \beta_L = 200$. If you're curious, here's a plot of what the $\operatorname{Beta}(40, 200)$ prior looks like.
```{r, fig.width = 5, fig.height = 2.5}
set.seed(16)
tibble(p = rbeta(n = 1e6, shape1 = 40, shape2 = 200)) %>%
ggplot(aes(x = p)) +
geom_histogram(linewidth = 1/6, binwidth = 0.005, boundary = 0,
color = "white", fill = "grey67") +
scale_x_continuous(expression(prior~predictive~distribution~of~italic(p[H])~and~italic(p[L])),
breaks = 0:5 / 5, expand = c(0, 0), limits = c(0, 1)) +
scale_y_continuous(NULL, breaks = NULL, expand = expansion(mult = c(0, 0.05))) +
labs(subtitle = expression("1,000,000 draws from Beta"*(40*", "*200)))
```
The $\operatorname{Beta}(40, 200)$ prior suggests an average trapping rate of about 17% (i.e., $40 / 240 \approx 0.167$).
`r emo::ji("warning")` The content to follow is going to diverge from the text, a bit. As you can see from the equation, above, McElreath's statistical model is a beast. We can fit this model with **brms**, but the workflow is more complicated than usual. To make this material more approachable, I am going to divide the remainder of this section into two subsections. In the first subsection, we'll fit a simplified version of McElreath's `m16.5`, which does not contain the measurement-error portion. In the second subsection, we'll tack on the measurement-error portion and fit the full model. `r emo::ji("warning")`
#### The simple Lotka-Volterra model.
Before we get into it, I should acknowledge that this **brms** approach to fitting ODEs is a direct result of the generous contributions from [Markus Gesmann](https://www.magesblog.com/). It was one of his older blog posts, [*PK/PD reserving models*](https://magesblog.com/post/2018-01-30-pkpd-reserving-models/), that led me to believe one could fit an ODE model with **brms**. When I reached out to Gesmann on GitHub (see [Issue #18](https://github.com/ASKurz/Statistical_Rethinking_with_brms_ggplot2_and_the_tidyverse_2_ed/issues/18)), he went so far as to write a new blog post on exactly this model: [*Fitting multivariate ODE models with brms*](https://magesblog.com/post/2021-02-08-fitting-multivariate-ode-models-with-brms/). The workflow to follow is something of a blend of the methods in his blog post, McElreath's model in the text, and the [original post](https://mc-stan.org/users/documentation/case-studies/lotka-volterra-predator-prey.html) by Carpenter that started this all.
As far as the statistical model goes, we might express the revision of McElreath's model omitting the measurement-error portion as
\begin{align*}
h_t & \sim \operatorname{Log-Normal} \big (\log(H_t), \sigma_H \big) \\
l_t & \sim \operatorname{Log-Normal} \big (\log(L_t), \sigma_L \big) \\
H_1 & \sim \operatorname{Log-Normal}(\log 10, 1) \\
L_1 & \sim \operatorname{Log-Normal}(\log 10, 1) \\
H_{T >1} & = H_1 + \int_1^T H_t (b_H - m_H L_t) \mathrm{d} t \\
L_{T >1} & = L_1 + \int_1^T L_t (b_L H_T - m_L) \mathrm{d} t \\
b_H & \sim \operatorname{Half-Normal}(1, 0.5) \\
b_L & \sim \operatorname{Half-Normal}(0.5, 0.5) \\
m_H & \sim \operatorname{Half-Normal}(0.5, 0.5) \\
m_L & \sim \operatorname{Half-Normal}(1, 0.5) \\
\sigma_H & \sim \operatorname{Exponential}(1) \\
\sigma_L & \sim \operatorname{Exponential}(1).
\end{align*}
With the exception of the priors for the $\sigma$ parameters, this is basically the same model Carpenter fit with his original Stan code. Carpenter expressed his model using a different style of notation, but the parts are all there.
As for our **brms** workflow, the first issue we need to address is that, at the time of this writing, **brms** is only set up to fit a univariate ODE model. As Gesmann pointed out, the way around this is to convert the `Lynx_Hare` data into the long format where the pelt values from the `Lynx` and `Hare` columns are all listed in a `pelts` column and the two animal populations are differentiated in a `population` column. We'll call this long version of the data `Lynx_Hare_long`.
```{r}
Lynx_Hare_long <-
Lynx_Hare %>%
pivot_longer(-Year,
names_to = "population",
values_to = "pelts") %>%
mutate(delta = if_else(population == "Lynx", 1, 0),
t = Year - min(Year) + 1) %>%
arrange(delta, Year)
# what did we do?
head(Lynx_Hare_long)
```
You'll note how we converted the information in the `population` column into a dummy variable, `delta`, which is coded 0 = hares, 1 = lynxes. It's that dummy variable that will allow us to adjust our model formula so we express a bivariate model as if it were univariate. You'll see. Also notice how we added a `t` index for *time*. This is because the Stan code to follow will expect us to index time in that way.
The next step is to write a script that will tell **brms** how to tell Stan how to fit a Lotka-Volterra model. In his blog, Gesmann called this `LotkaVolterra`. Our script to follow is a very minor adjustment of his.
```{r}
LotkaVolterra <- "
// Specify the dynamical system (ODEs)
real[] ode_LV(real t, // time
real [] y, // the system rate
real [] theta, // the parameters (i.e., the birth and mortality rates)
real [] x_r, // data constant, not used here
int [] x_i) { // data constant, not used here
// the outcome
real dydt[2];
// differential equations
dydt[1] = (theta[1] - theta[2] * y[2]) * y[1]; // Hare process
dydt[2] = (theta[3] * y[1] - theta[4]) * y[2]; // Lynx process