```{r, echo = F, cache = F}
knitr::opts_chunk$set(fig.retina = 2.5)
knitr::opts_chunk$set(fig.align = "center")
options(width = 110)
```
# Examining the Multilevel Model's Error Covariance Structure
> In this chapter..., we focus on the model's *random* effects as embodied in its error covariance structure. Doing so allows us to both describe the particular error covariance structure that the "standard" multilevel model for change invokes and it also allows us to broaden its representation to other—sometimes more tenable assumptions—about its behavior. [@singerAppliedLongitudinalData2003, p. 243, *emphasis* in the original]
## The "standard" specification of the multilevel model for change
Load the `opposites_pp` data [@willett1988QuestionsAndAnswers].
```{r, warning = F, message = F}
library(tidyverse)
opposites_pp <- read_csv("~/Dropbox/Recoding Applied Longitudinal Data Analysis/data/opposites_pp.csv")
glimpse(opposites_pp)
```
@willett1988QuestionsAndAnswers clarified these data were simulated for pedagogical purposes, which will become important later on. For now, here's our version of Table 7.1.
```{r}
opposites_pp %>%
  mutate(name = str_c("opp", wave)) %>%
  select(id, name, opp, cog) %>%
  pivot_wider(names_from = name, values_from = opp) %>%
  select(id, starts_with("opp"), cog) %>%
  head(n = 10)
```
The data contain 35 people's scores on an opposites-naming task, measured across 4 time points.
```{r}
opposites_pp %>%
  count(id)
```
Our first model will serve as a comparison model for all that follow. We'll often refer to it as the *standard* multilevel model for change. You'll see why in a bit. It follows the form
$$
\begin{align*}
\text{opp}_{ij} & = \pi_{0i} + \pi_{1i} \text{time}_{ij} + \epsilon_{ij}\\
\pi_{0i} & = \gamma_{00} + \gamma_{01} (\text{cog}_i - \overline{\text{cog}}) + \zeta_{0i}\\
\pi_{1i} & = \gamma_{10} + \gamma_{11} (\text{cog}_i - \overline{\text{cog}}) + \zeta_{1i} \\
\epsilon_{ij} & \stackrel{iid}{\sim} \operatorname{Normal}(0, \sigma_\epsilon) \\
\begin{bmatrix} \zeta_{0i} \\ \zeta_{1i} \end{bmatrix} & \stackrel{iid}{\sim} \operatorname{Normal} \left (
\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \mathbf D \mathbf\Omega \mathbf D' \right ) \\
\mathbf D & = \begin{bmatrix} \sigma_0 & 0 \\ 0 & \sigma_1 \end{bmatrix} \\
\mathbf \Omega & = \begin{bmatrix} 1 & \rho_{01} \\ \rho_{01} & 1 \end{bmatrix},
\end{align*}
$$
where the $(\text{cog}_i - \overline{\text{cog}})$ notation is meant to indicate we are mean centering the time-invariant covariate $\text{cog}_i$. In the data, the mean-centered version is saved as `ccog`.
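If you'd like to verify the centering for yourself, here's a minimal check, under the assumption that `ccog` was computed by subtracting the grand mean of `cog` (we take one row per person so each `id` counts once; `ccog_by_hand` is our own name):

```{r}
opposites_pp %>%
  filter(wave == 1) %>%
  # hand-compute the mean-centered version
  mutate(ccog_by_hand = cog - mean(cog)) %>%
  summarise(max_discrepancy = max(abs(ccog - ccog_by_hand)))
```

If the assumption holds, `max_discrepancy` should be essentially zero.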
```{r, fig.width = 3, fig.height = 2}
opposites_pp %>%
  filter(wave == 1) %>%
  ggplot(aes(ccog)) +
  geom_histogram(binwidth = 5) +
  theme(panel.grid = element_blank())
```
To keep things simple, we'll be using **brms** default priors for all the models in this chapter. Fit the model.
```{r fit7.1, message = F, warning = F}
library(brms)
fit7.1 <-
brm(data = opposites_pp,
family = gaussian,
opp ~ 0 + Intercept + time + ccog + time:ccog + (1 + time | id),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 7,
file = "fits/fit07.01")
```
Check the summary.
```{r}
print(fit7.1, digits = 3)
```
If you're curious, here's the summary of our $\sigma$ parameters transformed to the variance/covariance metric.
```{r, warning = F, message = F}
library(tidybayes)

levels <- c("sigma[epsilon]^2", "sigma[0]^2", "sigma[1]^2", "sigma[0][1]")

sigma <-
  as_draws_df(fit7.1) %>%
  transmute(`sigma[0]^2` = sd_id__Intercept^2,
            `sigma[1]^2` = sd_id__time^2,
            `sigma[epsilon]^2` = sigma^2,
            `sigma[0][1]` = sd_id__Intercept * sd_id__time * cor_id__Intercept__time)

sigma %>%
  pivot_longer(everything()) %>%
  mutate(name = factor(name, levels = levels)) %>%
  group_by(name) %>%
  median_qi(value) %>%
  mutate_if(is.double, round, digits = 2)
```
## Using the composite model to understand assumptions about the error covariance matrix
Dropping the terms specifying the distributional assumptions, we can reexpress the model formula from above in the composite format as
$$
\begin{align*}
\text{opp}_{ij} & = [\gamma_{00} + \gamma_{10} \text{time}_{ij} + \gamma_{01} (\text{cog}_i - \overline{\text{cog}}) + \gamma_{11} (\text{cog}_i - \overline{\text{cog}}) \times \text{time}_{ij}] \\
& \;\;\; + [\zeta_{0i} + \zeta_{1i} \text{time}_{ij} + \epsilon_{ij}],
\end{align*}
$$
where we've divided up the structural and stochastic components with our use of brackets. We might think of the terms of the stochastic portion as a *composite residual*, $r_{ij} = [\epsilon_{ij} + \zeta_{0i} + \zeta_{1i} \text{time}_{ij}]$. Thus, we can rewrite the composite equation with the composite residual term $r_{ij}$ as
$$
\text{opp}_{ij} = [\gamma_{00} + \gamma_{10} \text{time}_{ij} + \gamma_{01} (\text{cog}_i - \overline{\text{cog}}) + \gamma_{11} (\text{cog}_i - \overline{\text{cog}}) \times \text{time}_{ij}] + r_{ij}.
$$
If we were willing to presume, as in OLS or single-level Bayesian regression, that all residuals are independent and normally distributed, we could express that in statistical notation as
\begin{align*}
\begin{bmatrix} r_{11} \\ r_{12} \\ r_{13} \\ r_{14} \\ r_{21} \\ r_{22} \\ r_{23} \\ r_{24} \\ \vdots \\ r_{n1} \\ r_{n2} \\ r_{n3} \\ r_{n4} \end{bmatrix} & \sim \mathcal{N}
\begin{pmatrix}
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 0 \\ 0 \\ 0 \\ \end{bmatrix},
\begin{bmatrix}
\sigma_r^2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \dots & 0 & 0 & 0 & 0 \\
0 & \sigma_r^2 & 0 & 0 & 0 & 0 & 0 & 0 & \dots & 0 & 0 & 0 & 0 \\
0 & 0 & \sigma_r^2 & 0 & 0 & 0 & 0 & 0 & \dots & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \sigma_r^2 & 0 & 0 & 0 & 0 & \dots & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \sigma_r^2 & 0 & 0 & 0 & \dots & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \sigma_r^2 & 0 & 0 & \dots & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \sigma_r^2 & 0 & \dots & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \sigma_r^2 & \dots & 0 & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \dots & \sigma_r^2 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \dots & 0 & \sigma_r^2 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \dots & 0 & 0 & \sigma_r^2 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \dots & 0 & 0 & 0 & \sigma_r^2
\end{bmatrix}
\end{pmatrix},
\end{align*}
where $r_{ij}$ is the $i$th person's residual on the $j$th time point. The variance/covariance matrix $\mathbf \Sigma$ is diagonal (i.e., all the off-diagonal elements are 0's) and *homoscedastic* (i.e., all the diagonal elements are the same value, $\sigma_r^2$).
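In code, that error structure is just a scaled identity matrix. Here's a toy sketch for three people measured at four waves each, using a made-up value for $\sigma_r^2$:

```{r}
# sigma_r^2 (a made-up value) on the diagonal, zeros everywhere else
sigma_r2 <- 1300
diag(sigma_r2, nrow = 3 * 4)
```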
These assumptions are absurd for longitudinal data, which is why we don't analyze such data with single-level models. If we were to make the less restrictive assumptions that, within people, the residuals were *correlated over time* and were *heteroscedastic*, we could express that as
\begin{align*}
\begin{bmatrix} r_{11} \\ r_{12} \\ r_{13} \\ r_{14} \\ r_{21} \\ r_{22} \\ r_{23} \\ r_{24} \\ \vdots \\ r_{n1} \\ r_{n2} \\ r_{n3} \\ r_{n4} \end{bmatrix} & \sim \mathcal{N}
\begin{pmatrix}
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 0 \\ 0 \\ 0 \\ \end{bmatrix},
\begin{bmatrix}
\sigma_{r_1}^2 & \sigma_{r_1 r_2} & \sigma_{r_1 r_3} & \sigma_{r_1 r_4} & 0 & 0 & 0 & 0 & \dots & 0 & 0 & 0 & 0 \\
\sigma_{r_2 r_1} & \sigma_{r_2}^2 & \sigma_{r_2 r_3} & \sigma_{r_2 r_4} & 0 & 0 & 0 & 0 & \dots & 0 & 0 & 0 & 0 \\
\sigma_{r_3 r_1} & \sigma_{r_3 r_2} & \sigma_{r_3}^2 & \sigma_{r_3 r_4} & 0 & 0 & 0 & 0 & \dots & 0 & 0 & 0 & 0 \\
\sigma_{r_4 r_1} & \sigma_{r_4 r_2} & \sigma_{r_4 r_3} & \sigma_{r_4}^2 & 0 & 0 & 0 & 0 & \dots & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \sigma_{r_1}^2 & \sigma_{r_1 r_2} & \sigma_{r_1 r_3} & \sigma_{r_1 r_4} & \dots & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \sigma_{r_2 r_1} & \sigma_{r_2}^2 & \sigma_{r_2 r_3} & \sigma_{r_2 r_4} & \dots & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \sigma_{r_3 r_1} & \sigma_{r_3 r_2} & \sigma_{r_3}^2 & \sigma_{r_3 r_4} & \dots & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \sigma_{r_4 r_1} & \sigma_{r_4 r_2} & \sigma_{r_4 r_3} & \sigma_{r_4}^2 & \dots & 0 & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \dots & \sigma_{r_1}^2 & \sigma_{r_1 r_2} & \sigma_{r_1 r_3} & \sigma_{r_1 r_4} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \dots & \sigma_{r_2 r_1} & \sigma_{r_2}^2 & \sigma_{r_2 r_3} & \sigma_{r_2 r_4} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \dots & \sigma_{r_3 r_1} & \sigma_{r_3 r_2} & \sigma_{r_3}^2 & \sigma_{r_3 r_4} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \dots & \sigma_{r_4 r_1} & \sigma_{r_4 r_2} & \sigma_{r_4 r_3} & \sigma_{r_4}^2
\end{bmatrix}
\end{pmatrix}.
\end{align*}
This kind of structure is called *block diagonal*, which means the off-diagonal elements are zero between persons, but allowed to be non-zero within persons (i.e., within blocks). The zero elements between person blocks indicate that the residuals are independent between persons. Notice that the variances on the diagonal vary across the four time points (i.e., $\sigma_{r_1}^2, \dots, \sigma_{r_4}^2$). Yet also notice that the block for one person is identical to the block for all others. Thus, this model allows for *heterogeneity* across time within persons, but *homogeneity* between persons.
We can express this in the more compact notation,
\begin{align*}
r & \sim \mathcal{N}
\begin{pmatrix} \mathbf 0,
\begin{bmatrix}
\mathbf{\Sigma}_r & \mathbf 0 & \mathbf 0 & \dots & \mathbf 0 \\
\mathbf 0 & \mathbf{\Sigma}_r & \mathbf 0 & \dots & \mathbf 0 \\
\mathbf 0 & \mathbf 0 & \mathbf{\Sigma}_r & \dots & \mathbf 0 \\
\vdots & \vdots & \vdots & \ddots & \mathbf 0 \\
\mathbf 0 & \mathbf 0 & \mathbf 0 & \mathbf 0 & \mathbf{\Sigma}_r
\end{bmatrix}
\end{pmatrix}, \;\;\; \text{where} \\
\mathbf{\Sigma}_r & = \begin{bmatrix}
\sigma_{r_1}^2 & \sigma_{r_1 r_2} & \sigma_{r_1 r_3} & \sigma_{r_1 r_4} \\
\sigma_{r_2 r_1} & \sigma_{r_2}^2 & \sigma_{r_2 r_3} & \sigma_{r_2 r_4} \\
\sigma_{r_3 r_1} & \sigma_{r_3 r_2} & \sigma_{r_3}^2 & \sigma_{r_3 r_4} \\
\sigma_{r_4 r_1} & \sigma_{r_4 r_2} & \sigma_{r_4 r_3} & \sigma_{r_4}^2 \end{bmatrix}.
\end{align*}
The bulk of the remaining material in this chapter focuses on how different models handle $\mathbf{\Sigma}_r$. The *standard* multilevel growth model has one way. There are many others.
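Before we turn to those models, it may help to see the block-diagonal idea as concrete numbers. Here's a toy sketch: a single within-person $\mathbf{\Sigma}_r$ block (with made-up values) repeated down the diagonal for three people via the Kronecker product:

```{r}
# a made-up 4 x 4 within-person block
Sigma_r <- matrix(c(1400, 1000,  950,  900,
                    1000, 1200,  950,  900,
                     950,  950, 1200, 1000,
                     900,  900, 1000, 1400),
                  nrow = 4)

# crossing with an identity matrix repeats the block once per person,
# with zeros everywhere between persons
kronecker(diag(3), Sigma_r)
```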
### Variance of the composite residual.
Under the conventional multilevel growth model, the variance of the composite residual at the $j$th time point is
$$
\begin{align*}
\sigma_{r_j}^2 & = \operatorname{Var} \left ( \epsilon_{ij} + \zeta_{0i} + \zeta_{1i} \text{time}_j \right ) \\
& = \sigma_\epsilon^2 + \sigma_0^2 + 2 \sigma_{01} \text{time}_j + \sigma_1^2 \text{time}_j^2.
\end{align*}
$$
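As a quick sanity check on that formula, we might simulate composite residuals with known (made-up) $\sigma$ parameters and confirm the variances behave as advertised. With $\sigma_0 = 30$, $\sigma_1 = 10$, $\sigma_{01} = 0$, and $\sigma_\epsilon = 25$, the formula predicts $\operatorname{Var}(r) = 625 + 900 + 100 \cdot \text{time}^2$:

```{r}
set.seed(7)

tibble(id = 1:1e4) %>%
  # person-level deviations are drawn once per person
  mutate(zeta0 = rnorm(n(), mean = 0, sd = 30),
         zeta1 = rnorm(n(), mean = 0, sd = 10)) %>%
  # each person gets all four waves
  expand_grid(time = 0:3) %>%
  # occasion-level residuals are drawn once per row
  mutate(r = zeta0 + zeta1 * time + rnorm(n(), mean = 0, sd = 25)) %>%
  group_by(time) %>%
  summarise(var_r = var(r))
```

The simulated variances should land near 1,525, 1,625, 1,925, and 2,425.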
Here's how to use our posterior samples to compute $\sigma_{r_1}^2, \dots, \sigma_{r_4}^2$.
```{r, fig.width = 5, fig.height = 3}
sigma %>%
  mutate(iter = 1:n()) %>%
  expand(nesting(iter, `sigma[epsilon]^2`, `sigma[0]^2`, `sigma[1]^2`, `sigma[0][1]`),
         time = 0:3) %>%
  mutate(r = `sigma[epsilon]^2` + `sigma[0]^2` + 2 * `sigma[0][1]` * time + `sigma[1]^2` * time^2) %>%
  mutate(name = str_c("sigma[italic(r)[", time + 1, "]]^2")) %>%
  ggplot(aes(x = r, y = name)) +
  stat_halfeye(.width = .95, size = 1) +
  scale_x_continuous("marginal posterior", expand = expansion(mult = c(0, 0.05)), limits = c(0, NA)) +
  scale_y_discrete(NULL, labels = ggplot2:::parse_safe) +
  coord_cartesian(ylim = c(1.5, 4.2)) +
  theme(panel.grid = element_blank())
```
As is often the case with variance parameters, the posteriors show marked right skew. Here are the numeric summaries.
```{r}
sigma %>%
  mutate(iter = 1:n()) %>%
  expand(nesting(iter, `sigma[epsilon]^2`, `sigma[0]^2`, `sigma[1]^2`, `sigma[0][1]`),
         time = 0:3) %>%
  mutate(r = `sigma[epsilon]^2` + `sigma[0]^2` + 2 * `sigma[0][1]` * time + `sigma[1]^2` * time^2) %>%
  mutate(name = str_c("sigma[italic(r)[", time + 1, "]]^2")) %>%
  group_by(name) %>%
  median_qi(r)
```
Though our precise numeric values are different from those in the text, we see the same overall pattern. Using our posterior medians, we can update $\mathbf{\Sigma}_r$ to
$$
\begin{align*}
\hat{\mathbf{\Sigma}}_r & = \begin{bmatrix}
1430 & \hat{\sigma}_{r_1 r_2} & \hat{\sigma}_{r_1 r_3} & \hat{\sigma}_{r_1 r_4} \\
\hat{\sigma}_{r_2 r_1} & 1201 & \hat{\sigma}_{r_2 r_3} & \hat{\sigma}_{r_2 r_4} \\
\hat{\sigma}_{r_3 r_1} & \hat{\sigma}_{r_3 r_2} & 1203 & \hat{\sigma}_{r_3 r_4} \\
\hat{\sigma}_{r_4 r_1} & \hat{\sigma}_{r_4 r_2} & \hat{\sigma}_{r_4 r_3} & 1416 \end{bmatrix}.
\end{align*}
$$
> For the opposites-naming data, composite residual variance is greatest at the beginning and end of data collection and smaller in between. And, while not outrageously heteroscedastic, this situation is clearly beyond the bland homoscedasticity that we routinely assume for residuals in cross-sectional data. (p. 252)
If you work through the equation at the beginning of this section (which I won't do here), you'll see that the standard multilevel growth model constrains the residual variance to follow a quadratic function of time. To give a sense, here we plot the expected $\sigma_r^2$ values over a wider and more continuous range of time values.
```{r, fig.width = 4.75, fig.height = 3.25}
set.seed(7)

sigma %>%
  mutate(iter = 1:n()) %>%
  slice_sample(n = 50) %>%
  expand(nesting(iter, `sigma[epsilon]^2`, `sigma[0]^2`, `sigma[1]^2`, `sigma[0][1]`),
         time = seq(from = -4.2, to = 6.6, length.out = 200)) %>%
  mutate(r = `sigma[epsilon]^2` + `sigma[0]^2` + 2 * `sigma[0][1]` * time + `sigma[1]^2` * time^2) %>%
  ggplot(aes(x = time, y = r, group = iter)) +
  geom_line(linewidth = 1/6, alpha = 1/2) +
  scale_x_continuous(expand = c(0, 0)) +
  scale_y_continuous(expression(sigma[italic(r)]^2),
                     expand = expansion(mult = c(0, 0.05)), limits = c(0, NA)) +
  labs(subtitle = expression("50 posterior draws showing the quadratic shape of "*sigma[italic(r)[time]]^2)) +
  theme(panel.grid = element_blank())
```
Since we have 4,000 posterior draws for all the parameters, we also have 4,000 posterior draws for the quadratic curve. Here we just show 50. The curve reaches its minimum at $\text{time} = -(\sigma_{01} / \sigma_1^2)$, which you can verify by setting the first derivative of the variance function, $2 \sigma_{01} + 2 \sigma_1^2 \text{time}$, to zero. Since we have posterior distributions for $\sigma_{01}$ and $\sigma_1^2$, we'll also have a posterior distribution for the minimum point. Here it is.
```{r, fig.width = 4, fig.height = 2.75}
sigma %>%
  mutate(minimum = -`sigma[0][1]` / `sigma[1]^2`) %>%
  ggplot(aes(x = minimum, y = 0)) +
  stat_halfeye(.width = .95) +
  scale_x_continuous("time", expand = c(0, 0), limits = c(-4.2, 6.6)) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = expression(Minimum~value~(-sigma[0][1]/sigma[1]^2))) +
  theme(panel.grid = element_blank())
```
If we plug those minimum time values into the equation for $\sigma_{r_\text{time}}^2$, we'll get the posterior distribution for the minimum variance value.
```{r, fig.width = 4, fig.height = 2.75}
sigma %>%
  mutate(minimum = -`sigma[0][1]` / `sigma[1]^2`) %>%
  mutate(r = `sigma[epsilon]^2` + `sigma[0]^2` + 2 * `sigma[0][1]` * minimum + `sigma[1]^2` * minimum^2) %>%
  ggplot(aes(x = r, y = 0)) +
  stat_halfeye(.width = .95) +
  scale_x_continuous(expression(sigma[italic(r)[time]]^2), limits = c(0, NA)) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Minimum variance") +
  theme(panel.grid = element_blank())
```
Here's the numeric summary.
```{r}
sigma %>%
  mutate(minimum = -`sigma[0][1]` / `sigma[1]^2`) %>%
  mutate(r = `sigma[epsilon]^2` + `sigma[0]^2` + 2 * `sigma[0][1]` * minimum + `sigma[1]^2` * minimum^2) %>%
  median_qi(r)
```
### Covariance of the composite residuals.
In addition to the variances in $\mathbf{\Sigma}_r$, we might focus on the off-diagonal covariances, too. We can define the covariance between two time points $\sigma_{r_j, r_{j'}}$ as
$$
\sigma_{r_j, r_{j'}} = \sigma_0^2 + \sigma_{01} (t_j + t_{j'}) + \sigma_1^2 t_j t_{j'},
$$
where $\sigma_0^2$, $\sigma_{01}$ and $\sigma_1^2$ all have their usual interpretation, and $t_j$ and $t_{j'}$ are the numeric values for whatever variable is used to index time in the model, which is `time` in the case of `fit7.1`. Here we compute and plot the marginal posteriors for all $4 \times 4 = 16$ parameters.
```{r, fig.width = 6, fig.height = 4.5}
# arrange the panels
levels <-
  c("sigma[italic(r)[1]]^2", "sigma[italic(r)[1]][italic(r)[2]]", "sigma[italic(r)[1]][italic(r)[3]]", "sigma[italic(r)[1]][italic(r)[4]]",
    "sigma[italic(r)[2]][italic(r)[1]]", "sigma[italic(r)[2]]^2", "sigma[italic(r)[2]][italic(r)[3]]", "sigma[italic(r)[2]][italic(r)[4]]",
    "sigma[italic(r)[3]][italic(r)[1]]", "sigma[italic(r)[3]][italic(r)[2]]", "sigma[italic(r)[3]]^2", "sigma[italic(r)[3]][italic(r)[4]]",
    "sigma[italic(r)[4]][italic(r)[1]]", "sigma[italic(r)[4]][italic(r)[2]]", "sigma[italic(r)[4]][italic(r)[3]]", "sigma[italic(r)[4]]^2")

# wrangle
sigma <-
  sigma %>%
  mutate(`sigma[italic(r)[1]]^2` = `sigma[epsilon]^2` + `sigma[0]^2` + 2 * `sigma[0][1]` * 0 + `sigma[1]^2` * 0^2,
         `sigma[italic(r)[2]]^2` = `sigma[epsilon]^2` + `sigma[0]^2` + 2 * `sigma[0][1]` * 1 + `sigma[1]^2` * 1^2,
         `sigma[italic(r)[3]]^2` = `sigma[epsilon]^2` + `sigma[0]^2` + 2 * `sigma[0][1]` * 2 + `sigma[1]^2` * 2^2,
         `sigma[italic(r)[4]]^2` = `sigma[epsilon]^2` + `sigma[0]^2` + 2 * `sigma[0][1]` * 3 + `sigma[1]^2` * 3^2,
         `sigma[italic(r)[2]][italic(r)[1]]` = `sigma[0]^2` + `sigma[0][1]` * (1 + 0) + `sigma[1]^2` * 1 * 0,
         `sigma[italic(r)[3]][italic(r)[1]]` = `sigma[0]^2` + `sigma[0][1]` * (2 + 0) + `sigma[1]^2` * 2 * 0,
         `sigma[italic(r)[4]][italic(r)[1]]` = `sigma[0]^2` + `sigma[0][1]` * (3 + 0) + `sigma[1]^2` * 3 * 0,
         `sigma[italic(r)[3]][italic(r)[2]]` = `sigma[0]^2` + `sigma[0][1]` * (2 + 1) + `sigma[1]^2` * 2 * 1,
         `sigma[italic(r)[4]][italic(r)[2]]` = `sigma[0]^2` + `sigma[0][1]` * (3 + 1) + `sigma[1]^2` * 3 * 1,
         `sigma[italic(r)[4]][italic(r)[3]]` = `sigma[0]^2` + `sigma[0][1]` * (3 + 2) + `sigma[1]^2` * 3 * 2,
         `sigma[italic(r)[1]][italic(r)[2]]` = `sigma[0]^2` + `sigma[0][1]` * (0 + 1) + `sigma[1]^2` * 0 * 1,
         `sigma[italic(r)[1]][italic(r)[3]]` = `sigma[0]^2` + `sigma[0][1]` * (0 + 2) + `sigma[1]^2` * 0 * 2,
         `sigma[italic(r)[2]][italic(r)[3]]` = `sigma[0]^2` + `sigma[0][1]` * (1 + 2) + `sigma[1]^2` * 1 * 2,
         `sigma[italic(r)[1]][italic(r)[4]]` = `sigma[0]^2` + `sigma[0][1]` * (0 + 3) + `sigma[1]^2` * 0 * 3,
         `sigma[italic(r)[2]][italic(r)[4]]` = `sigma[0]^2` + `sigma[0][1]` * (1 + 3) + `sigma[1]^2` * 1 * 3,
         `sigma[italic(r)[3]][italic(r)[4]]` = `sigma[0]^2` + `sigma[0][1]` * (2 + 3) + `sigma[1]^2` * 2 * 3)

sigma %>%
  select(contains("italic")) %>%
  pivot_longer(everything()) %>%
  mutate(name = factor(name, levels = levels)) %>%
  # plot!
  ggplot(aes(x = value, y = 0)) +
  stat_halfeye(.width = .95, size = 1) +
  scale_x_continuous("marginal posterior", expand = expansion(mult = c(0, 0.05))) +
  scale_y_discrete(NULL, breaks = NULL) +
  coord_cartesian(xlim = c(0, 4000),
                  ylim = c(0.5, NA)) +
  theme(panel.grid = element_blank()) +
  facet_wrap(~ name, labeller = label_parsed)
```
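As an aside, all that copy/paste can be avoided with a small helper function. Here's a sketch (the function name `composite_cov()` is our own) that returns the posterior draws for any cell of $\mathbf{\Sigma}_r$, adding $\sigma_\epsilon^2$ on the diagonal:

```{r}
# posterior draws for the covariance between the residuals at times tj and tk;
# when tj == tk, this is a variance, so sigma[epsilon]^2 gets added on
composite_cov <- function(tj, tk) {
  sigma$`sigma[0]^2` +
    sigma$`sigma[0][1]` * (tj + tk) +
    sigma$`sigma[1]^2` * tj * tk +
    if (tj == tk) sigma$`sigma[epsilon]^2` else 0
}

# e.g., the posterior median for sigma[r3 r4] (time values 2 and 3)
median(composite_cov(tj = 2, tk = 3))
```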
It might be helpful to reduce the complexity of this plot by focusing on the posterior medians. With a little help from `geom_tile()` and `geom_text()`, we'll make a plot version of the matrix at the top of page 255 in the text.
```{r, fig.width = 5, fig.height = 3.5}
sigma %>%
  select(contains("italic")) %>%
  pivot_longer(everything()) %>%
  mutate(name = factor(name, levels = levels)) %>%
  group_by(name) %>%
  median_qi(value) %>%
  mutate(label = round(value, digits = 0)) %>%
  ggplot(aes(x = 0, y = 0)) +
  geom_tile(aes(fill = value)) +
  geom_text(aes(label = label)) +
  scale_fill_viridis_c("posterior\nmedian", option = "A", limits = c(0, NA)) +
  scale_x_continuous(NULL, breaks = NULL, expand = c(0, 0)) +
  scale_y_continuous(NULL, breaks = NULL, expand = c(0, 0)) +
  labs(subtitle = expression(hat(Sigma)[italic(r)]*" for the standard multilevel model for change")) +
  theme(legend.text = element_text(hjust = 1)) +
  facet_wrap(~ name, labeller = label_parsed)
```
Although our posterior median values differ a bit from the REML values reported in the text, the overall pattern holds. Hopefully the coloring in the plot helps highlight what Singer and Willett described as a "'band diagonal' structure, in which the overall magnitude of the residual covariances tends to decline in diagonal 'bands' the further you get from the main diagonal" (p. 255).
One consequence of this structure is that when both $\sigma_1^2 \rightarrow 0$ and $\sigma_{01} \rightarrow 0$, the residual covariance matrix becomes *compound symmetric*:
$$
\begin{align*}
\mathbf{\Sigma}_r & = \begin{bmatrix}
\sigma_\epsilon^2 + \sigma_0^2 & \sigma_0^2 & \sigma_0^2 & \sigma_0^2 \\
\sigma_0^2 & \sigma_\epsilon^2 + \sigma_0^2 & \sigma_0^2 & \sigma_0^2 \\
\sigma_0^2 & \sigma_0^2 & \sigma_\epsilon^2 + \sigma_0^2 & \sigma_0^2 \\
\sigma_0^2 & \sigma_0^2 & \sigma_0^2 & \sigma_\epsilon^2 + \sigma_0^2 \end{bmatrix}.
\end{align*}
$$
> Compound symmetric error covariance structures are particularly common in longitudinal data, especially if the slopes of the change trajectories do not differ much across people. Regardless of these special cases, however, the most sensible question to ask of your data is whether the error covariance structure that the "standard" multilevel model for change demands is realistic when applied to data in practice? The answer to this question will determine whether the standard model can be applied ubiquitously, a question we soon address. (pp. 255--256)
### Autocorrelation of the composite residuals.
We can use the following equation to convert our $\mathbf{\Sigma}_r$ into a correlation matrix:
$$\rho_{r_j r_{j^\prime}} = \sigma_{r_j r_{j^\prime}} \Big / \sqrt{\sigma_{r_j}^2 \sigma_{r_{j^\prime}}^2}.$$
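Incidentally, base **R**'s `cov2cor()` does this conversion in one step. Here's a quick sketch applying it to the matrix of cell-wise posterior medians from the last section (note this summarizes first and converts second, whereas below we convert draw by draw and then summarize, so the two can differ slightly):

```{r}
# collect the 16 cell-wise posterior medians into a 4 x 4 matrix,
# using the `levels` vector from above to keep the cells in order
Sigma_med <-
  sigma %>%
  select(all_of(levels)) %>%
  summarise(across(everything(), median)) %>%
  unlist() %>%
  matrix(nrow = 4, byrow = TRUE)

cov2cor(Sigma_med) %>% round(digits = 2)
```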
Here we use the formula and plot the posteriors.
```{r, fig.width = 6, fig.height = 4.5}
# arrange the panels
levels <-
  c("sigma[italic(r)[1]]", "rho[italic(r)[1]][italic(r)[2]]", "rho[italic(r)[1]][italic(r)[3]]", "rho[italic(r)[1]][italic(r)[4]]",
    "rho[italic(r)[2]][italic(r)[1]]", "sigma[italic(r)[2]]", "rho[italic(r)[2]][italic(r)[3]]", "rho[italic(r)[2]][italic(r)[4]]",
    "rho[italic(r)[3]][italic(r)[1]]", "rho[italic(r)[3]][italic(r)[2]]", "sigma[italic(r)[3]]", "rho[italic(r)[3]][italic(r)[4]]",
    "rho[italic(r)[4]][italic(r)[1]]", "rho[italic(r)[4]][italic(r)[2]]", "rho[italic(r)[4]][italic(r)[3]]", "sigma[italic(r)[4]]")

sigma <-
  sigma %>%
  select(contains("italic")) %>%
  mutate(`sigma[italic(r)[1]]` = `sigma[italic(r)[1]]^2` / sqrt(`sigma[italic(r)[1]]^2`^2),
         `rho[italic(r)[2]][italic(r)[1]]` = `sigma[italic(r)[2]][italic(r)[1]]` / sqrt(`sigma[italic(r)[2]]^2` * `sigma[italic(r)[1]]^2`),
         `rho[italic(r)[3]][italic(r)[1]]` = `sigma[italic(r)[3]][italic(r)[1]]` / sqrt(`sigma[italic(r)[3]]^2` * `sigma[italic(r)[1]]^2`),
         `rho[italic(r)[4]][italic(r)[1]]` = `sigma[italic(r)[4]][italic(r)[1]]` / sqrt(`sigma[italic(r)[4]]^2` * `sigma[italic(r)[1]]^2`),
         `rho[italic(r)[1]][italic(r)[2]]` = `sigma[italic(r)[1]][italic(r)[2]]` / sqrt(`sigma[italic(r)[1]]^2` * `sigma[italic(r)[2]]^2`),
         `sigma[italic(r)[2]]` = `sigma[italic(r)[2]]^2` / sqrt(`sigma[italic(r)[2]]^2`^2),
         `rho[italic(r)[3]][italic(r)[2]]` = `sigma[italic(r)[3]][italic(r)[2]]` / sqrt(`sigma[italic(r)[3]]^2` * `sigma[italic(r)[2]]^2`),
         `rho[italic(r)[4]][italic(r)[2]]` = `sigma[italic(r)[4]][italic(r)[2]]` / sqrt(`sigma[italic(r)[4]]^2` * `sigma[italic(r)[2]]^2`),
         `rho[italic(r)[1]][italic(r)[3]]` = `sigma[italic(r)[1]][italic(r)[3]]` / sqrt(`sigma[italic(r)[1]]^2` * `sigma[italic(r)[3]]^2`),
         `rho[italic(r)[2]][italic(r)[3]]` = `sigma[italic(r)[2]][italic(r)[3]]` / sqrt(`sigma[italic(r)[2]]^2` * `sigma[italic(r)[3]]^2`),
         `sigma[italic(r)[3]]` = `sigma[italic(r)[3]]^2` / sqrt(`sigma[italic(r)[3]]^2`^2),
         `rho[italic(r)[4]][italic(r)[3]]` = `sigma[italic(r)[4]][italic(r)[3]]` / sqrt(`sigma[italic(r)[4]]^2` * `sigma[italic(r)[3]]^2`),
         `rho[italic(r)[1]][italic(r)[4]]` = `sigma[italic(r)[1]][italic(r)[4]]` / sqrt(`sigma[italic(r)[1]]^2` * `sigma[italic(r)[4]]^2`),
         `rho[italic(r)[2]][italic(r)[4]]` = `sigma[italic(r)[2]][italic(r)[4]]` / sqrt(`sigma[italic(r)[2]]^2` * `sigma[italic(r)[4]]^2`),
         `rho[italic(r)[3]][italic(r)[4]]` = `sigma[italic(r)[3]][italic(r)[4]]` / sqrt(`sigma[italic(r)[3]]^2` * `sigma[italic(r)[4]]^2`),
         `sigma[italic(r)[4]]` = `sigma[italic(r)[4]]^2` / sqrt(`sigma[italic(r)[4]]^2`^2))

sigma %>%
  select(`sigma[italic(r)[1]]`:`sigma[italic(r)[4]]`) %>%
  pivot_longer(everything()) %>%
  mutate(name = factor(name, levels = levels)) %>%
  # plot!
  ggplot(aes(x = value, y = 0)) +
  stat_halfeye(.width = .95, size = 1) +
  scale_x_continuous("marginal posterior", expand = c(0, 0), limits = c(0, 1),
                     breaks = 0:4 / 4, labels = c("0", ".25", ".5", ".75", "1")) +
  scale_y_discrete(NULL, breaks = NULL) +
  coord_cartesian(ylim = c(0.5, NA)) +
  theme(panel.grid = element_blank()) +
  facet_wrap(~ name, labeller = label_parsed)
```
As before, it might be helpful to reduce the complexity of this plot by focusing on the posterior medians. We'll make a plot version of the correlation matrix in the middle of page 256 in the text.
```{r, fig.width = 5, fig.height = 3.5}
sigma %>%
  select(`sigma[italic(r)[1]]`:`sigma[italic(r)[4]]`) %>%
  pivot_longer(everything()) %>%
  mutate(name = factor(name, levels = levels)) %>%
  group_by(name) %>%
  median_qi(value) %>%
  mutate(label = round(value, digits = 2)) %>%
  ggplot(aes(x = 0, y = 0)) +
  geom_tile(aes(fill = value)) +
  geom_text(aes(label = label)) +
  scale_fill_viridis_c("posterior\nmedian", option = "A", limits = c(0, 1)) +
  scale_x_continuous(NULL, breaks = NULL, expand = c(0, 0)) +
  scale_y_continuous(NULL, breaks = NULL, expand = c(0, 0)) +
  labs(subtitle = expression(hat(Omega)[italic(r)]*" for the standard multilevel model for change")) +
  theme(legend.text = element_text(hjust = 1)) +
  facet_wrap(~ name, labeller = label_parsed)
```
The correlation matrix has an even more pronounced band-diagonal structure. Thus, even though the standard multilevel model of change does not contain an explicit autocorrelation parameter $\rho$, the model does account for residual autocorrelation. We might take this even further. Notice the parameters $\rho_{r_2, r_1}$, $\rho_{r_3, r_2}$, and $\rho_{r_4, r_3}$ are all first-order autocorrelations, and note how similar their posterior medians are. Here's a summary of the average of those three parameters, which we'll just call $\rho$.
```{r}
sigma %>%
  # take the average of the three parameters
  transmute(rho = (`rho[italic(r)[2]][italic(r)[1]]` + `rho[italic(r)[3]][italic(r)[2]]` + `rho[italic(r)[4]][italic(r)[3]]`) / 3) %>%
  # summarize
  median_qi()
```
If you look at the "estimate" column in Table 7.3 (pp. 258--259), you'll see the $\hat \rho$ values for the autoregressive and heterogeneous-autoregressive models are very similar to the $\hat \rho$ posterior we just computed for the standard multilevel model of change. The standard multilevel model of change accounts for autocorrelation, but it does so without an explicit $\rho$ parameter.
## Postulating an alternative error covariance structure
Singer and Willett wrote:
> it is easy to specify alternative covariance structures for the composite residual and determine analytically which specification—the "standard" or an alternative—fits best. You already possess the analytic tools and skills needed for this work. (p. 257)
Frequentist **R** users would likely do this with the [**nlme** package](https://CRAN.R-project.org/package=nlme) [@R-nlme]. Recent additions to **brms** make this largely possible, though not completely. The six error structures listed in this section were:
* unstructured,
* compound symmetric,
* heterogeneous compound symmetric,
* autoregressive,
* heterogeneous autoregressive, and
* Toeplitz.
The Toeplitz structure is not currently available with **brms**, but we can experiment with the first five. For details, see [issue #403](https://github.com/paul-buerkner/brms/issues/403) in the **brms** GitHub repo.
### Unstructured error covariance matrix.
Models using an unstructured error covariance matrix include $k (k + 1) / 2$ variance/covariance parameters, where $k$ is the number of time waves in the data. In the case of our `opposites_pp` data, we can compute the number of parameters as follows.
```{r}
k <- 4
k * (k + 1) / 2
```
We end up with 4 variances and 6 covariances. Following the notation Singer and Willett used in the upper left corner of Table 7.3 (p. 258), we can express this as
$$
\begin{align*}
\mathbf{\Sigma}_r & = \begin{bmatrix}
\sigma_1^2 & \sigma_{12} & \sigma_{13} & \sigma_{14} \\
\sigma_{21} & \sigma_2^2 & \sigma_{23} & \sigma_{24} \\
\sigma_{31} & \sigma_{32} & \sigma_3^2 & \sigma_{34} \\
\sigma_{41} & \sigma_{42} & \sigma_{43} & \sigma_4^2 \end{bmatrix}.
\end{align*}
$$
> The great appeal of an unstructured error covariance structure is that it places no restrictions on the structure of $\mathbf{\Sigma}_r$. For a given set of fixed effects, its deviance statistic will always be the smallest of any error covariance structure. If you have just a few waves of data, this choice can be attractive. But if you have many waves, it can require an exorbitant number of parameters... [However,] the "standard" model requires only 3 variance components ($\sigma_0^2$, $\sigma_1^2$, and $\sigma_\epsilon^2$) and one covariance component, $\sigma_{01}$. (p. 260)
To fit the unstructured model with **brms**, we use the `unstr()` function. Notice how we've dropped the usual `(1 + time | id)` syntax. Instead, we indicate the data are temporally structured by the `time` variable by setting `time = time` within `unstr()`, and we indicate the data are grouped by `id` by setting `gr = id`, also within `unstr()`. Also notice we've wrapped the entire model formula within the `bf()` function. The second line within `bf()` has `sigma` on the left side of the `~` operator, which is something we haven't seen before. With that line, we allow the residual standard deviation $\sigma_\epsilon$ to vary across the time points. By default, **brms** uses the log link, to ensure the model for $\sigma_{\epsilon j}$ never predicts negative values.
```{r fit7.2}
fit7.2 <-
  brm(data = opposites_pp,
      family = gaussian,
      bf(opp ~ 0 + Intercept + time + ccog + time:ccog + unstr(time = time, gr = id),
         sigma ~ 0 + factor(time)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      seed = 7,
      file = "fits/fit07.02")
```
Check the summary.
```{r}
print(fit7.2, digits = 3, robust = T)
```
There are a lot of exciting things going on in that output. We'll start with the bottom 4 rows in the `Population-Level Effects` section, which contain the summaries for our $\log (\sigma_{\epsilon j})$ parameters. To get them out of the log metric, we exponentiate. Here's a quick conversion.
```{r}
fixef(fit7.2)[5:8, -2] %>% exp()
```
Now let's address the new `Correlation Structures` section of the `print()` output. Just as **brms** decomposes the typical multilevel model level-2 variance/covariance matrix $\mathbf{\Sigma}$ as
$$
\begin{align*}
\mathbf \Sigma & = \mathbf D \mathbf \Omega \mathbf D, \text{where} \\
\mathbf D & = \begin{bmatrix} \sigma_0 & 0 \\ 0 & \sigma_1 \end{bmatrix} \text{and} \\
\mathbf \Omega & = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix},
\end{align*}
$$
the same kind of thing happens when we fit a model with an unstructured variance/covariance matrix with the `unstr()` function. The `print()` output returned posterior summaries for the elements of the correlation matrix $\mathbf \Omega$ in the `Correlation Structures` section, and it returned posterior summaries for the elements of the diagonal matrix of standard deviations $\mathbf D$ in the last four rows of the `Population-Level Effects` section. But notice that instead of $2 \times 2$ matrices like we got with our conventional growth model `fit7.1`, both $\mathbf D$ and $\mathbf \Omega$ are now $4 \times 4$ matrices right out of the gate. Thus if we use the posterior medians from the `print()` output as our point estimates, we can express the $\hat{\mathbf \Sigma}$ matrix from our unstructured `fit7.2` model as
$$
\begin{align*}
\mathbf \Sigma & = \mathbf D \mathbf \Omega \mathbf D \\
\hat{\mathbf D} & = \begin{bmatrix}
35.6 & 0 & 0 & 0 \\
0 & 31.6 & 0 & 0 \\
0 & 0 & 33.0 & 0 \\
0 & 0 & 0 & 33.7
\end{bmatrix} \\
\hat{\mathbf \Omega} & = \begin{bmatrix}
1 & .75 & .66 & .33 \\
.75 & 1 & .81 & .64 \\
.66 & .81 & 1 & .73 \\
.33 & .64 & .73 & 1
\end{bmatrix}.
\end{align*}
$$
```{r, warning = F, eval = F, echo = F}
as_draws_df(fit7.2) %>%
  transmute(`sigma[1]` = exp(b_sigma_factortime0),
            `sigma[2]` = exp(b_sigma_factortime1),
            `sigma[3]` = exp(b_sigma_factortime2),
            `sigma[4]` = exp(b_sigma_factortime3)) %>%
  pivot_longer(everything()) %>%
  group_by(name) %>%
  summarise(m = median(value) %>% round(digits = 3))

as_draws_df(fit7.2) %>%
  transmute(`rho[12]` = cortime__0__1,
            `rho[13]` = cortime__0__2,
            `rho[14]` = cortime__0__3,
            `rho[23]` = cortime__1__2,
            `rho[24]` = cortime__1__3,
            `rho[34]` = cortime__2__3,
            `rho[21]` = cortime__0__1,
            `rho[31]` = cortime__0__2,
            `rho[32]` = cortime__1__2,
            `rho[41]` = cortime__0__3,
            `rho[42]` = cortime__1__3,
            `rho[43]` = cortime__2__3) %>%
  pivot_longer(everything()) %>%
  group_by(name) %>%
  summarise(m = median(value) %>% round(digits = 3))
```
Here's how to compute and summarize the $\mathbf D$ and $\mathbf \Omega$ parameters with the `as_draws_df()` output.
```{r, warning = F}
# wrangle
sigma.us <-
  as_draws_df(fit7.2) %>%
  transmute(`sigma[1]` = exp(b_sigma_factortime0),
            `sigma[2]` = exp(b_sigma_factortime1),
            `sigma[3]` = exp(b_sigma_factortime2),
            `sigma[4]` = exp(b_sigma_factortime3),
            `rho[12]` = cortime__0__1,
            `rho[13]` = cortime__0__2,
            `rho[14]` = cortime__0__3,
            `rho[23]` = cortime__1__2,
            `rho[24]` = cortime__1__3,
            `rho[34]` = cortime__2__3,
            `rho[21]` = cortime__0__1,
            `rho[31]` = cortime__0__2,
            `rho[32]` = cortime__1__2,
            `rho[41]` = cortime__0__3,
            `rho[42]` = cortime__1__3,
            `rho[43]` = cortime__2__3)

# summarize
sigma.us %>%
  pivot_longer(everything()) %>%
  group_by(name) %>%
  median_qi(value) %>%
  mutate_if(is.double, round, digits = 2)
```
Since Singer and Willett preferred the variance/covariance parameterization for $\mathbf \Sigma$, we'll practice wrangling the posterior draws to transform our results into that metric, too.
```{r}
# transform
sigma.us <- sigma.us %>%
  transmute(`sigma[1]^2` = `sigma[1]`^2,
            `sigma[12]` = `sigma[1]` * `sigma[2]` * `rho[12]`,
            `sigma[13]` = `sigma[1]` * `sigma[3]` * `rho[13]`,
            `sigma[14]` = `sigma[1]` * `sigma[4]` * `rho[14]`,
            `sigma[21]` = `sigma[2]` * `sigma[1]` * `rho[21]`,
            `sigma[2]^2` = `sigma[2]`^2,
            `sigma[23]` = `sigma[2]` * `sigma[3]` * `rho[23]`,
            `sigma[24]` = `sigma[2]` * `sigma[4]` * `rho[24]`,
            `sigma[31]` = `sigma[3]` * `sigma[1]` * `rho[31]`,
            `sigma[32]` = `sigma[3]` * `sigma[2]` * `rho[32]`,
            `sigma[3]^2` = `sigma[3]`^2,
            `sigma[34]` = `sigma[3]` * `sigma[4]` * `rho[34]`,
            `sigma[41]` = `sigma[4]` * `sigma[1]` * `rho[41]`,
            `sigma[42]` = `sigma[4]` * `sigma[2]` * `rho[42]`,
            `sigma[43]` = `sigma[4]` * `sigma[3]` * `rho[43]`,
            `sigma[4]^2` = `sigma[4]`^2)

# summarize
sigma.us %>%
  pivot_longer(everything()) %>%
  group_by(name) %>%
  median_qi(value) %>%
  mutate_if(is.double, round, digits = 2)
```
But again, I suspect it will be easier to appreciate our posterior $\hat{\mathbf \Sigma}$ in a tile plot. Here's a summary using the posterior medians, similar to what Singer and Willett reported in the rightmost column of Table 7.3.
```{r, fig.width = 5, fig.height = 3.5}
levels <-
  c("sigma[1]^2", "sigma[12]", "sigma[13]", "sigma[14]",
    "sigma[21]", "sigma[2]^2", "sigma[23]", "sigma[24]",
    "sigma[31]", "sigma[32]", "sigma[3]^2", "sigma[34]",
    "sigma[41]", "sigma[42]", "sigma[43]", "sigma[4]^2")

sigma.us %>%
  pivot_longer(everything()) %>%
  mutate(name = factor(name, levels = levels)) %>%
  group_by(name) %>%
  median_qi(value) %>%
  mutate(label = round(value, digits = 0)) %>%
  ggplot(aes(x = 0, y = 0)) +
  geom_tile(aes(fill = value)) +
  geom_text(aes(label = label, color = value < 1),
            show.legend = F) +
  scale_fill_viridis_c("posterior\nmedian", option = "A", limits = c(0, NA)) +
  scale_color_manual(values = c("black", "white")) +
  scale_x_continuous(NULL, breaks = NULL, expand = c(0, 0)) +
  scale_y_continuous(NULL, breaks = NULL, expand = c(0, 0)) +
  labs(subtitle = expression(hat(Sigma)[italic(r)]*" for the unstructured model")) +
  theme(legend.text = element_text(hjust = 1)) +
  facet_wrap(~ name, labeller = label_parsed)
```
In case you're curious, here's the variance/covariance matrix from the sample data.
```{r}
fit7.2$data %>%
  select(id, time, opp) %>%
  pivot_wider(names_from = time, values_from = opp) %>%
  select(-id) %>%
  cov() %>%
  round(digits = 0)
```
The **brms** default priors are weakly regularizing, particularly the LKJ prior for the correlation matrix $\mathbf \Omega$, and I believe this is why the values from our model are systematically lower than the sample statistics. If you find this upsetting, collect more data, which will help the likelihood dominate the prior.
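If you'd like to see exactly which default priors **brms** chose for this model, the `get_prior()` function will list them without fitting anything:

```{r}
get_prior(data = opposites_pp,
          family = gaussian,
          bf(opp ~ 0 + Intercept + time + ccog + time:ccog + unstr(time = time, gr = id),
             sigma ~ 0 + factor(time)))
```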
### Compound symmetric error covariance matrix.
"A *compound symmetric* error covariance matrix requires just two parameters, labeled $\sigma^2$ and $\sigma_1^2$ in table 7.3" (p. 260). From the table, we see that matrix follows the form
$$
\begin{align*}
\mathbf{\Sigma}_r & = \begin{bmatrix}
\sigma^2 + \sigma_1^2 & \sigma_1^2 & \sigma_1^2 & \sigma_1^2 \\
\sigma_1^2 & \sigma^2 + \sigma_1^2 & \sigma_1^2 & \sigma_1^2 \\
\sigma_1^2 & \sigma_1^2 & \sigma^2 + \sigma_1^2 & \sigma_1^2 \\
\sigma_1^2 & \sigma_1^2 & \sigma_1^2 & \sigma^2 + \sigma_1^2 \end{bmatrix},
\end{align*}
$$
where $\sigma_1^2$ does not have the same meaning we've become accustomed to (i.e., the level-2 variance in linear change over time). The meaning of $\sigma^2$ might also be a little opaque. Happily, there's another way to express this matrix, which is a modification of the heterogeneous compound symmetric matrix we see listed in Table 7.3. That alternative is:
$$
\begin{align*}
\mathbf{\Sigma}_r & = \begin{bmatrix}
\sigma_\epsilon^2 & \sigma_\epsilon^2 \rho & \sigma_\epsilon^2 \rho & \sigma_\epsilon^2 \rho \\
\sigma_\epsilon^2 \rho & \sigma_\epsilon^2 & \sigma_\epsilon^2 \rho & \sigma_\epsilon^2 \rho \\
\sigma_\epsilon^2 \rho & \sigma_\epsilon^2 \rho & \sigma_\epsilon^2 & \sigma_\epsilon^2 \rho \\
\sigma_\epsilon^2 \rho & \sigma_\epsilon^2 \rho & \sigma_\epsilon^2 \rho & \sigma_\epsilon^2 \end{bmatrix},
\end{align*}
$$
where the term on the diagonal, $\sigma_\epsilon^2$, is the residual variance, which is constrained to equality across all four time points. In all cells in the off-diagonal, we see $\sigma_\epsilon^2$ multiplied by $\rho$. In this parameterization, $\rho$ is the correlation between time points, and that correlation is constrained to equality across all possible pairs of time points. Although this notation is a little different from the notation used in the text, I believe it will help us interpret our model. As we'll see, **brms** uses this alternative parameterization.
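Mapping between the two parameterizations is just a little algebra. Matching the diagonal and off-diagonal elements of the two matrices above gives

$$
\sigma^2 + \sigma_1^2 = \sigma_\epsilon^2, \quad \sigma_1^2 = \sigma_\epsilon^2 \rho, \quad \text{and thus} \quad \sigma^2 = \sigma_\epsilon^2 (1 - \rho).
$$

We'll lean on these identities when wrangling the posterior draws, below.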
To fit the compound symmetric model with **brms**, we use the `cosy()` function. Notice how, as with the unstructured model `fit7.2`, we've dropped the usual `(1 + time | id)` syntax. Instead, we impose compound symmetry *within persons* by setting `gr = id` within `cosy()`.
```{r fit7.3}
fit7.3 <-
  brm(data = opposites_pp,
      family = gaussian,
      opp ~ 0 + Intercept + time + ccog + time:ccog + cosy(gr = id),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      seed = 7,
      file = "fits/fit07.03")
```
Check the model summary.
```{r}
print(fit7.3, digits = 3)
```
See that new `cosy` row? That's $\rho$, the residual correlation among the time points. The `sigma` row at the bottom has its typical interpretation: it's the residual standard deviation, what we typically call $\sigma_\epsilon$. Square it and you'll have what we called $\sigma_\epsilon^2$ in the matrix above. Since our **brms** model is parameterized differently from what Singer and Willett reported in the text (see Table 7.3, p. 258), we'll wrangle the posterior draws a bit.
```{r, warning = F}
sigma.cs <-
  as_draws_df(fit7.3) %>%
  transmute(rho = cosy,
            sigma_e = sigma,
            `sigma^2 + sigma[1]^2` = sigma^2) %>%
  mutate(`sigma[1]^2` = rho * sigma_e^2) %>%
  mutate(`sigma^2` = `sigma^2 + sigma[1]^2` - `sigma[1]^2`)

# what did we do?
head(sigma.cs)
```
Here's the numeric summary.
```{r}
sigma.cs %>%
  pivot_longer(everything()) %>%
  group_by(name) %>%
  median_qi(value) %>%
  mutate_if(is.double, round, digits = 2)
```
To simplify, we might pull the posterior medians for $\sigma^2 + \sigma_1^2$ and $\sigma_1^2$. We'll call them `diagonal` and `off_diagonal`, respectively.
```{r}
diagonal <- median(sigma.cs$`sigma^2 + sigma[1]^2`)
off_diagonal <- median(sigma.cs$`sigma[1]^2`)
```
Now that we have them, we can make our colored version of the $\mathbf{\Sigma}_r$ matrix Singer and Willett reported in the rightmost column of Table 7.3.
```{r, fig.width = 4.5, fig.height = 2.5}
crossing(row = 1:4,
         col = factor(1:4)) %>%
  mutate(value = if_else(row == col, diagonal, off_diagonal)) %>%
  mutate(label = round(value, digits = 0),
         col = fct_rev(col)) %>%
  ggplot(aes(x = row, y = col)) +
  geom_tile(aes(fill = value)) +
  geom_text(aes(label = label)) +
  scale_fill_viridis_c("posterior\nmedian", option = "A", limits = c(0, NA)) +
  scale_x_continuous(NULL, breaks = NULL, position = "top", expand = c(0, 0)) +
  scale_y_discrete(NULL, breaks = NULL, expand = c(0, 0)) +
  labs(subtitle = expression(hat(Sigma)[italic(r)]*" for the compound symmetric model")) +
  theme(legend.text = element_text(hjust = 1))
```
### Heterogeneous compound symmetric error covariance matrix.
Now we extend the compound symmetric matrix by allowing the residual variances to vary across the time waves. Thus, instead of a single $\sigma_\epsilon^2$ parameter, we'll have $\sigma_1^2$ through $\sigma_4^2$. However, we still have a single correlation parameter $\rho$. We can express this as
$$
\begin{align*}
\mathbf{\Sigma}_r & = \begin{bmatrix}
\sigma_1^2 & \sigma_1 \sigma_2 \rho & \sigma_1 \sigma_3 \rho & \sigma_1 \sigma_4 \rho \\
\sigma_2 \sigma_1 \rho & \sigma_2^2 & \sigma_2 \sigma_3 \rho & \sigma_2 \sigma_4 \rho \\
\sigma_3 \sigma_1 \rho & \sigma_3 \sigma_2 \rho & \sigma_3^2 & \sigma_3 \sigma_4 \rho \\
\sigma_4 \sigma_1 \rho & \sigma_4 \sigma_2 \rho & \sigma_4 \sigma_3 \rho & \sigma_4^2 \end{bmatrix},
\end{align*}
$$
where, even though the correlation is the same in all cells, the covariances will differ because they are based on different combinations of the $\sigma$ parameters. To fit this model with **brms**, we will continue to use `cosy(gr = id)`. But now we wrap the entire model `formula` within the `bf()` function and allow the residual standard deviations to vary across the waves with the line `sigma ~ 0 + factor(time)`.
```{r fit7.4}
fit7.4 <-
  brm(data = opposites_pp,
      family = gaussian,
      bf(opp ~ 0 + Intercept + time + ccog + time:ccog + cosy(gr = id),
         sigma ~ 0 + factor(time)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      seed = 7,
      file = "fits/fit07.04")
```
Check the summary.
```{r}
print(fit7.4, digits = 3)
```
If you look at the `Links` line near the top of the output, you'll see that the **brms** default was to model $\log(\sigma_j)$. Thus, you'll have to exponentiate those posteriors to get them back into their natural metric. Here's a quick conversion.
```{r}
fixef(fit7.4)[5:8, -2] %>% exp()
```
To get the marginal posteriors for the full $\mathbf{\Sigma}_r$ matrix, we'll want to work directly with the output from `as_draws_df()`.
```{r, warning = F}
sigma.hcs <-
  as_draws_df(fit7.4) %>%
  transmute(`sigma[1]` = exp(b_sigma_factortime0),
            `sigma[2]` = exp(b_sigma_factortime1),
            `sigma[3]` = exp(b_sigma_factortime2),
            `sigma[4]` = exp(b_sigma_factortime3),
            rho = cosy,
            `sigma[1]^2` = exp(b_sigma_factortime0)^2,
            `sigma[2]^2` = exp(b_sigma_factortime1)^2,
            `sigma[3]^2` = exp(b_sigma_factortime2)^2,
            `sigma[4]^2` = exp(b_sigma_factortime3)^2) %>%
  mutate(`sigma[2]*sigma[1]*rho` = `sigma[2]` * `sigma[1]` * rho,
         `sigma[3]*sigma[1]*rho` = `sigma[3]` * `sigma[1]` * rho,
         `sigma[4]*sigma[1]*rho` = `sigma[4]` * `sigma[1]` * rho,
         `sigma[1]*sigma[2]*rho` = `sigma[1]` * `sigma[2]` * rho,
         `sigma[3]*sigma[2]*rho` = `sigma[3]` * `sigma[2]` * rho,
         `sigma[4]*sigma[2]*rho` = `sigma[4]` * `sigma[2]` * rho,
         `sigma[1]*sigma[3]*rho` = `sigma[1]` * `sigma[3]` * rho,
         `sigma[2]*sigma[3]*rho` = `sigma[2]` * `sigma[3]` * rho,
         `sigma[4]*sigma[3]*rho` = `sigma[4]` * `sigma[3]` * rho,
         `sigma[1]*sigma[4]*rho` = `sigma[1]` * `sigma[4]` * rho,
         `sigma[2]*sigma[4]*rho` = `sigma[2]` * `sigma[4]` * rho,
         `sigma[3]*sigma[4]*rho` = `sigma[3]` * `sigma[4]` * rho)

# what did we do?
glimpse(sigma.hcs)
```
Here's the numeric summary.
```{r}
sigma.hcs %>%
  pivot_longer(everything()) %>%
  group_by(name) %>%
  median_qi(value) %>%
  mutate_if(is.double, round, digits = 2)
```
That's a lot of information to wade through. Here we simplify the picture by making our plot version of the matrix Singer and Willett reported in the rightmost column of Table 7.3.
```{r, fig.width = 5, fig.height = 3.5}
# arrange the panels
levels <-
  c("sigma[1]^2", "sigma[1]*sigma[2]*rho", "sigma[1]*sigma[3]*rho", "sigma[1]*sigma[4]*rho",
    "sigma[2]*sigma[1]*rho", "sigma[2]^2", "sigma[2]*sigma[3]*rho", "sigma[2]*sigma[4]*rho",
    "sigma[3]*sigma[1]*rho", "sigma[3]*sigma[2]*rho", "sigma[3]^2", "sigma[3]*sigma[4]*rho",
    "sigma[4]*sigma[1]*rho", "sigma[4]*sigma[2]*rho", "sigma[4]*sigma[3]*rho", "sigma[4]^2")

sigma.hcs %>%
  select(`sigma[1]^2`:`sigma[3]*sigma[4]*rho`) %>%
  pivot_longer(everything()) %>%
  mutate(name = factor(name, levels = levels)) %>%
  group_by(name) %>%
  median_qi(value) %>%
  mutate(label = round(value, digits = 0)) %>%
  ggplot(aes(x = 0, y = 0)) +
  geom_tile(aes(fill = value)) +
  geom_text(aes(label = label)) +
  scale_fill_viridis_c("posterior\nmedian", option = "A", limits = c(0, NA)) +
  scale_x_continuous(NULL, breaks = NULL, expand = c(0, 0)) +
  scale_y_continuous(NULL, breaks = NULL, expand = c(0, 0)) +
  labs(subtitle = expression(hat(Sigma)[italic(r)]*" for the heterogeneous compound symmetric model")) +
  theme(legend.text = element_text(hjust = 1)) +
  facet_wrap(~ name, labeller = label_parsed)
```
### Autoregressive error covariance matrix.
The first-order autoregressive error covariance matrix has a strict "band-diagonal" structure governed by two parameters, which Singer and Willett called $\sigma^2$ and $\rho$. From Table 7.3 (p. 260), we see that matrix follows the form
$$
\begin{align*}
\mathbf{\Sigma}_r & = \begin{bmatrix}
\sigma^2 & \sigma^2 \rho & \sigma^2 \rho^2 & \sigma^2 \rho^3 \\
\sigma^2 \rho & \sigma^2 & \sigma^2 \rho & \sigma^2 \rho^2 \\
\sigma^2 \rho^2 & \sigma^2 \rho & \sigma^2 & \sigma^2 \rho \\
\sigma^2 \rho^3 & \sigma^2 \rho^2 & \sigma^2 \rho & \sigma^2 \end{bmatrix},
\end{align*}
$$
where $\rho$ is the correlation between residuals at adjacent time points, after conditioning on the linear model. In a similar way, $\rho^2$ is the correlation between time points separated by one intervening wave (e.g., time 1 with time 3), and $\rho^3$ is the correlation between the first and fourth time points. The other parameter, $\sigma^2$, is the residual variance after conditioning on the linear model.
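To build intuition for that band structure, here's a toy sketch of the implied correlation matrix, with a made-up $\rho$. The correlation between waves $j$ and $k$ is $\rho^{|j - k|}$:

```{r}
rho <- .8  # a made-up value

# outer(, , "-") computes all pairwise differences between wave numbers
rho^abs(outer(1:4, 1:4, "-"))
```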
One can fit this model with **brms** using a version of the `ar()` syntax. However, the model will follow a slightly different parameterization, of the form
$$
\begin{align*}
\mathbf{\Sigma}_r & = \begin{bmatrix}
\left (\sigma_\epsilon \Big / \sqrt{1 - \rho^2} \right )^2 & \left (\sigma_\epsilon \Big / \sqrt{1 - \rho^2} \right )^2 \rho & \left (\sigma_\epsilon \Big / \sqrt{1 - \rho^2} \right )^2 \rho^2 & \left (\sigma_\epsilon \Big / \sqrt{1 - \rho^2} \right )^2 \rho^3 \\
\left (\sigma_\epsilon \Big / \sqrt{1 - \rho^2} \right )^2 \rho & \left (\sigma_\epsilon \Big / \sqrt{1 - \rho^2} \right )^2 & \left (\sigma_\epsilon \Big / \sqrt{1 - \rho^2} \right )^2 \rho & \left (\sigma_\epsilon \Big / \sqrt{1 - \rho^2} \right )^2 \rho^2 \\
\left (\sigma_\epsilon \Big / \sqrt{1 - \rho^2} \right )^2 \rho^2 & \left (\sigma_\epsilon \Big / \sqrt{1 - \rho^2} \right )^2 \rho & \left (\sigma_\epsilon \Big / \sqrt{1 - \rho^2} \right )^2 & \left (\sigma_\epsilon \Big / \sqrt{1 - \rho^2} \right )^2 \rho \\
\left (\sigma_\epsilon \Big / \sqrt{1 - \rho^2} \right )^2 \rho^3 & \left (\sigma_\epsilon \Big / \sqrt{1 - \rho^2} \right )^2 \rho^2 & \left (\sigma_\epsilon \Big / \sqrt{1 - \rho^2} \right )^2 \rho & \left (\sigma_\epsilon \Big / \sqrt{1 - \rho^2} \right )^2
\end{bmatrix}.
\end{align*}
$$
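That is, in this parameterization $\sigma_\epsilon$ is the innovation standard deviation, and the marginal residual standard deviation is $\sigma_\epsilon \big / \sqrt{1 - \rho^2}$. Here's a quick sketch of the conversion with made-up values:

```{r}
sigma_e <- 20  # a made-up innovation SD
rho     <- .8  # a made-up autocorrelation

# the implied marginal residual SD and variance
sigma_e / sqrt(1 - rho^2)
(sigma_e / sqrt(1 - rho^2))^2
```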