-
Notifications
You must be signed in to change notification settings - Fork 14
/
Copy path15-RepeatedSurveys.Rmd
1150 lines (941 loc) · 68.9 KB
/
15-RepeatedSurveys.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
# Repeated sample surveys for monitoring population parameters {#RepeatedSurveys}
The previous chapters are all about sampling to estimate population parameters *at a given time*. The survey is done in a relatively short period of time, so that we can safely assume that the study variable has not changed during that period. This chapter is about repeating the sample survey two or more times, to estimate, for instance, a temporal change in a population parameter\index{Repeated surveys for monitoring}. Sampling locations are selected by probability sampling, by any design type. In most cases, sampling times are not selected randomly, but purposively. For instance, to monitor the carbon stock in the soil of a country, we may decide to repeat the survey after five years, in the same season of the year as the first survey.
## Space-time designs
An overview of space-time designs\index{Space-time design} is presented by @gru06. Five of these designs are schematically shown in Figure \@ref(fig:SpaceTimeDesigns). With repeated sampling in two-dimensional space, there are three dimensions: two spatial dimensions and one time dimension. Sampling locations shown in Figure \@ref(fig:SpaceTimeDesigns) are selected by simple random sampling, but this is not essential for the space-time designs.
In the static-synchronous (SS) design\index{Space-time design!static-synchronous design}, referred to as a pure panel\index{Pure panel} by @ful99, all sampling locations selected in the first survey are revisited in all subsequent surveys. On the contrary, in an independent synchronous (IS) design\index{Space-time design!independent-synchronous design}, a probability sample is selected in each survey independently from the samples selected in the previous surveys. The serially alternating (SA) design\index{Space-time design!serially alternating design} is a compromise between an SS and an IS design. The sample selected in the first survey is revisited in the third survey. The sample of the second survey is selected independently from the sample of the first survey, and these locations are revisited in the fourth survey. In this case, the period of revisits is two, i.e., two sampling intervals between consecutive surveys, but this can also be increased. For instance, with a period of three, three samples are selected, independently from each other, for the first three surveys, and these samples are revisited in subsequent surveys.
Two other compromise designs are a supplemented panel (SP) design\index{Space-time design!supplemented panel design} and a rotating panel (RP) design\index{Space-time design!rotating panel design}. In an SP design only a subset of the sampling locations of the first survey is revisited in the subsequent surveys. These are the permanent sampling locations observed in all subsequent surveys. The permanent sampling locations are supplemented by samples that are selected independently from the samples in the previous surveys. In Figure \@ref(fig:SpaceTimeDesigns), half of the sampling locations (ten locations) is permanent (panel a), i.e., revisited in all surveys, but the proportion of permanent sampling locations can be smaller or larger and, if prior information on the variation in space and time is available, even can be optimised for estimating the current mean. Also in an RP design, sampling units of the previous survey are partially replaced by new units. The difference with an SP design is that there are no permanent sampling units, i.e., no units observed in all surveys. All sampling units are sequentially rotated out and in again at the subsequent sampling times.
```{r SpaceTimeDesigns, echo = FALSE, out.width = "100%", fig.asp = 1, fig.cap = "Space-time designs for monitoring population parameters. The sampling locations in 2D are plotted in one dimension, along the horizontal axis. A selected unit along this axis actually represents a sampling location in 2D. Twenty sampling locations are selected by simple random sampling. SS: static-synchronous design; IS: independent synchronous design; SA: serially alternating design; SP: supplemented panel design; RP: rotating panel design."}
#pure panel
set.seed(314)
Space <- sample(1000, size = 20, replace = FALSE)
mysampleSS <- data.frame(
Space = rep(x = Space, times = 4),
Time = rep(x = c(1, 2, 3, 4), each = length(Space)),
Panel = rep("a", 4 * length(Space))
)
mysampleSS$design <- "SS"
#independent synchronous
Space <- sample(1000, size = 80, replace = FALSE)
mysampleIS <- data.frame(
Space,
Time = rep(x = c(1, 2, 3, 4), each = 20)
)
mysampleIS$Panel <- rep(x = c("a", "b", "c", "d"), each = 20)
mysampleIS$design <- "IS"
#serially alternating
# set periodicity
p <- 2
# construct data.frame
mysample1 <- NULL
for (i in seq_len(p)) {
mysample1 <-
rbind(mysample1,
data.frame(Space = sample(1000, size = 20, replace = FALSE), Time = i, Panel = letters[i]))
}
mysample2 <- mysample1
mysample2$Time <- mysample1$Time + p
mysample2$Panel <- mysample1$Panel
mysampleSA <- rbind(mysample1, mysample2)
mysampleSA$design <- "SA"
#supplemented panel
p <- 5 #number of panels
mysample1 <- NULL
for (i in seq_len(p)) {
mysample1 <- rbind(
mysample1,
data.frame(
Space = sample(1000, size = 10, replace = FALSE),
Time = (i == 1) * 1 + (i > 1) * (i - 1),
Panel = letters[i])
)
}
mysample2 <- NULL
mysample2$Space[1:10] <- mysample1$Space[1:10]
mysample2$Space[11:20] <- mysample1$Space[1:10]
mysample2$Space[21:30] <- mysample1$Space[1:10]
mysample2$Time <- rep(x = c(2, 3, 4), each = 10)
mysample2$Panel <- rep(x = "a", each = 30)
mysampleSP <- rbind(mysample1, mysample2)
mysampleSP$design <- "SP"
#rotating panel
p <- 5 #number of panels
mysample1 <- NULL
for (i in seq_len(p)) {
mysample1 <- rbind(
mysample1,
data.frame(
Space = sample(1000, size = 10, replace = FALSE),
Time = (i == 1) * 1 + (i > 1) * (i - 1),
Panel = letters[i])
)
}
mysample2 <- NULL
mysample2$Space[1:10] <- mysample1$Space[11:20]
mysample2$Space[11:20] <- mysample1$Space[21:30]
mysample2$Space[21:30] <- mysample1$Space[31:40]
mysample2$Time <- rep(x = c(2, 3, 4), each = 10)
mysample2$Panel <- rep(x = c("b", "c", "d"), each = 10)
mysampleRP <- rbind(mysample1, mysample2)
mysampleRP$design <- "RP"
mysample <- rbind(mysampleSS, mysampleIS, mysampleSA, mysampleSP, mysampleRP)
mysample$design <- factor(x = mysample$design, levels = c("SS", "IS", "SA", "SP", "RP"), ordered = TRUE)
# create plot
ggplot(data = mysample) +
geom_point(mapping = aes(x = Space, y = Time, shape = factor(Panel), colour = factor(Panel)), size = 2) +
scale_x_continuous(limits = c(0, 1000)) +
scale_y_continuous(breaks = 1:4) +
scale_shape(name = "Panel", solid = FALSE) +
scale_colour_viridis_d(name = "Panel") +
facet_wrap(~design, nrow = 3, ncol = 2)
```
In Figure \@ref(fig:SpaceTimeDesigns) the shape and colour of the symbols represent a panel. A panel is a group of sampling locations that is observed in the same surveys. In the SS design, there is only one panel. All locations are observed in all surveys, so all locations are in the same panel. In the IS design, there are as many panels as there are surveys. In the SA design with a period of two, the number of panels equals the number of surveys divided by two. In these three space-time designs (SS, IS, and SA), all sampling locations of a given survey are in the same panel. This is not the case in the SP and RP designs. In Figure \@ref(fig:SpaceTimeDesigns) in each survey, two panels are observed. In the SP sample, there is one panel of permanent sampling locations (pure panel part of sample) and another panel of swarming sampling locations observed in one survey only. In the RP sample of Figure \@ref(fig:SpaceTimeDesigns), the sampling locations are observed in two consecutive surveys; however, this number can be increased. For instance, in an 'in-for-three' rotational sample [@McLaren2001] the sampling locations stay in the sample for three consecutive surveys. The number of panels per sampling time is then three. Also, similar to an SA design, in an IS, SP, and RP design we may decide after several surveys to stop selecting new sampling locations and to revisit existing locations. The concept of panels is needed hereafter in estimating space-time population parameters.
## Space-time population parameters {#SpaceTimeparameters}
The data of repeated surveys can be used to estimate various parameters of the space-time universe. I will focus on the mean as a parameter, but estimation of a total or proportion is straightforward, think for instance of the total amount of greenhouse gas emissions from the soil in a given study area during some period. This chapter shows how to estimate the current mean, i.e., the population mean (spatial mean) in the last survey, the change of the mean between two surveys, the temporal trend of the mean, and the space-time mean. The current mean need not be defined here as only one survey (one sampling time) is involved in this parameter, so that the definition in Subsection \@ref(PopulationParameters) is also relevant here.
The change of the mean is defined as the spatial mean at a given survey minus this mean at an earlier survey\index{Change of spatial mean (total)}. For finite populations, the definition is
\begin{equation}
\bar{d}_{ab}=\frac{1}{N}\left(\sum_{k=1}^N z_k(t_b) -\sum_{k=1}^N z_k(t_a) \right)=\frac{1}{N}\sum_{k=1}^N d_{abk}\;,
(\#eq:ChangePopMeanFinite)
\end{equation}
with $d_{abk}$ the change of the study variable in the period between time $t_a$ and $t_b$ for unit $k$. For infinite populations, the sums are replaced by integrals:
\begin{equation}
\bar{d}_{ab}=\frac{1}{A}\left(\int_{\mathbf{s} \in \mathcal{A}} z(\mathbf{s},t_b) \;\mathrm{d}\mathbf{s}-\int_{\mathbf{s} \in \mathcal{A}} z(\mathbf{s},t_a) \;\mathrm{d}\mathbf{s}\right)=\frac{1}{A}\int_{\mathbf{s} \in \mathcal{A}}d_{ab}(\mathbf{s})\;,
(\#eq:ChangePopMean)
\end{equation}
with $d_{ab}(\mathbf{s})$ the change of the study variable in the period between time $t_a$ and $t_b$ at location $\mathbf{s}$.
With more than two surveys, an interesting population parameter is the *average change per time unit* of the mean, referred to as the temporal trend of the spatial mean\index{Temporal trend of spatial mean}. It is defined as a linear combination of the spatial means at the sampling times [@Breidt99]:
\begin{equation}
b=\sum_{j=1}^R w_j \bar{z}_j \;,
(\#eq:TrendofMean)
\end{equation}
with $R$ the number of sampling times, $\bar{z}_j$ the spatial mean at time $t_j$, and weights $w_j$ equal to
\begin{equation}
w_j = \frac{t_j-\bar{t}}{\sum_{j=1}^R(t_j-\bar{t})^2} \;,
(\#eq:weightsTrendofMean)
\end{equation}
with $\bar{t}$ the mean of the sampling times.
```{block2, type='rmdnote'}
The temporal trend is defined as a parameter of a space-time population, not as a parameter of a time-series model.
```
A space-time mean can be defined as the average of the spatial means at the sampling times. In this definition, the temporal universe is discrete and restricted to the sampling times. The target universe consists of a finite set of spatial populations:
\begin{equation}
\bar{\bar{z}}_{\mathcal{U}}=\frac{1}{R} \sum_{j=1}^R \bar{z_j}\;.
(\#eq:SpacetimeMeanDiscrete)
\end{equation}
Alternatively, a space-time mean for a continuous temporal universe $\mathcal{T}$ is defined as
\begin{equation}
\bar{\bar{z}}_{\mathcal{U}} = \frac{1}{T}\int_{t \in \mathcal{T}}\bar{z}_t\;,
(\#eq:SpacetimeMeanContinuous)
\end{equation}
with $T$ the length of the monitoring period and $\bar{z}_t$ the spatial mean at time $t$.
## Design-based generalised least squares estimation of spatial means
Rotational sampling has a long tradition in agronomy, forestry, and in social studies. Early papers on how sample data from previous times can be used to increase the precision of estimates of the current mean are @jessen1942, @patterson1950, @Ware1962, and @woodruff1963. @gurney1965 developed general theory for these estimators, which I will present now.
In overlapping samples such as an SP and an RP sample, we may define one estimate of a spatial mean per 'panel'. These panel-specific estimates of the mean at a given sampling time, based on observations at that time only, are referred to as elementary estimates\index{Elementary estimate}.
The essence of the estimation method described in this section is to estimate the spatial mean at a given time point as a weighted average of the elementary estimates *of all time points*, with the weights determined by the variances and covariances of the elementary estimates. Collecting all elementary estimates of the spatial means of the different sampling times in vector $\hat{\mathbf{z}}$, we can write
\begin{equation}
\hat{\mathbf{z}}=\mathbf{X} \mathbf{z} + \mathbf{e} \;,
(\#eq:zbf)
\end{equation}
with $\mathbf{z}$ the vector of true spatial means $\bar{z}(t_1), \dots ,\bar{z}(t_R)$ at the $R$ sampling times, $\mathbf{X}$ the ($P \times R$) design matrix with zeroes and ones that selects the appropriate elements from $\mathbf{z}$ ($P$ is the total number of elementary estimates), and $\mathbf{e}$ the $P$-vector of sampling errors with variance-covariance matrix $\mathbf{C}$. With unbiased elementary estimators, the expectation of $\mathbf{e}$ is a vector with zeroes.
With an SS, IS, and SA design, the design matrix $\mathbf{X}$ is the identity matrix of size $R$, i.e., an $R \times R$ square matrix with ones on the diagonal and zeroes in all off-diagonal entries. For the SP design of Figure \@ref(fig:SpaceTimeDesigns), the design matrix $\mathbf{X}$ is
\begin{equation}
\mathbf{X}=
\begin{bmatrix}
1 &0 &0 &0\\
0 &1 &0 &0\\
0 &0 &1 &0\\
0 &0 &0 &1\\
1 &0 &0 &0\\
0 &1 &0 &0\\
0 &0 &1 &0\\
0 &0 &0 &1
\end{bmatrix}
\;.
(\#eq:XSP)
\end{equation}
The first four rows of this matrix are associated with the elementary estimates of the spatial means at the four sampling times, estimated from panel a, the panel with permanent sampling locations. Hereafter, this panel is referred to as the static-synchronous subsample. The remaining rows correspond to the elementary estimates from the other four panels, the swarming locations, hereafter referred to as the independent-synchronous subsamples. For the RP design of Figure \@ref(fig:SpaceTimeDesigns), the design matrix equals
\begin{equation}
\mathbf{X}=
\begin{bmatrix}
1 &0 &0 &0\\
1 &0 &0 &0\\
0 &1 &0 &0\\
0 &1 &0 &0\\
0 &0 &1 &0\\
0 &0 &1 &0\\
0 &0 &0 &1\\
0 &0 &0 &1
\end{bmatrix}
\;.
(\#eq:XRP)
\end{equation}
The first two rows correspond to the two elementary estimates of the mean at time $t_1$ from panels a and b, the third and fourth rows correspond to the elementary estimate at time $t_2$ from panels b and c, respectively, etc.
The minimum variance linear unbiased estimator (MVLUE) of the spatial means at the different times is the design-based generalised least squares (GLS) estimator \index{Design-based generalised least squares estimator} [@bin88]:
\begin{equation}
\hat{\mathbf{z}}_{\mathrm{GLS}}=(\mathbf{X}^{\text{T}}\mathbf{C}^{-1}\mathbf{X})^{-1} \mathbf{X}^{\text{T}} \mathbf{C}^{-1} \hat{\mathbf{z}}\;.
(\#eq:DBGLS)
\end{equation}
To define matrix $\mathbf{C}$ for the SP design in Figure \@ref(fig:SpaceTimeDesigns), let $\hat{\bar{z}}_{jp}$ denote the estimated mean at time $t_j, j = 1,2,3,4$ in subsample $p, p \in (a,b,c,d,e)$, with panel $a$ the permanent sampling locations (SS subsample) and panels $b,c,d,e$ the swarming sampling locations (IS subsamples). If the eight elementary estimates in $\hat{\mathbf{z}}$ are ordered as ($\hat{\bar{z}}_{1a},\hat{\bar{z}}_{2a},\hat{\bar{z}}_{3a},\hat{\bar{z}}_{4a},\hat{\bar{z}}_{1b},\hat{\bar{z}}_{2c},\hat{\bar{z}}_{3d},\hat{\bar{z}}_{4e}$), the variance-covariance matrix $\mathbf{C}$ equals
\begin{equation}
\mathbf{C}=
\begin{bmatrix}
V_{1a} &C_{1,2} &C_{1,3} &C_{1,4} &0 &0 &0 &0 \\
C_{2,1} &V_{2a} &C_{2,3} &C_{2,4} &0 &0 &0 &0 \\
C_{3,1} &C_{3,2} &V_{3a} &C_{3,4} &0 &0 &0 &0 \\
C_{4,1} &C_{4,2} &C_{4,3} &V_{4a} &0 &0 &0 &0 \\
0 &0 &0 &0 &V_{1b} &0 &0 &0 \\
0 &0 &0 &0 &0 &V_{2c} &0 &0 \\
0 &0 &0 &0 &0 &0 &V_{3d} &0 \\
0 &0 &0 &0 &0 &0 &0 &V_{4e}
\end{bmatrix}
\;.
(\#eq:CmatrixSP)
\end{equation}
The covariances of the elementary estimates of the IS subsamples are zero because these are estimated from independently selected samples (off-diagonal elements in lower right (4 $\times$ 4) submatrix). Also the covariances of the elementary estimates from the SS subsample and an IS subsample are zero for the same reason (upper right submatrix and lower left submatrix). The covariances of the four elementary estimates from the SS subsample are not zero because these are estimated from the same set of sampling locations.
Ordering the elementary estimates of the RP sample by the sampling times, the variance-covariance matrix equals
\begin{equation}
\mathbf{C}=
\begin{bmatrix}
V_{1a} &0 &0 &0 &0 &0 &0 &0 \\
0 &V_{1b} &C_{1,2} &0 &0 &0 &0 &0 \\
0 &C_{2,1} &V_{2b} &0 &0 &0 &0 &0 \\
0 &0 &0 &V_{2c} &C_{2,3} &0 &0 &0 \\
0 &0 &0 &C_{3,2} &V_{3c} &0 &0 &0 \\
0 &0 &0 &0 &0 &V_{3d} &C_{3,4} &0 \\
0 &0 &0 &0 &0 &C_{4,3} &V_{4d} &0 \\
0 &0 &0 &0 &0 &0 &0 &V_{4e}
\end{bmatrix}
\;.
(\#eq:CmatrixRP)
\end{equation}
Only the elementary estimates of the same panel are correlated, for instance the elementary estimates of the spatial means at times $t_1$ and $t_2$, estimated from panel b.
For the SA design of Figure \@ref(fig:SpaceTimeDesigns) the variance-covariance matrix equals (there is only one estimate per time)
\begin{equation}
\mathbf{C}=
\begin{bmatrix}
V_{1} &0 &C_{1,3} &0 \\
0 &V_{2} &0 &C_{2,4} \\
C_{3,1} &0 &V_{3} &0 \\
0 &C_{4,2} &0 &V_{4}
\end{bmatrix}
\;.
(\#eq:CmatrixSA)
\end{equation}
In practice, matrix $\mathbf{C}$ in Equation \@ref(eq:DBGLS) is unknown and is replaced by a matrix with design-based estimates of the variances and covariances of the elementary estimators. With simple random sampling with replacement from finite populations and simple random sampling of infinite populations, the variance of an elementary estimator of the spatial mean at a given time can be estimated by Equation \@ref(eq:EstVarMeanSIR). The covariance of the elementary estimators of the spatial means at two sampling times, using the data of the same panel, can be estimated by
\begin{equation}
\widehat{C}_{ab} = \frac{\widehat{S^2}_{ab}}{m} \;,
(\#eq:covartwoelementaryestimates)
\end{equation}
with $m$ the number of sampling locations in the panel and $\widehat{S^2}_{ab}$ the estimated covariance of the study variable at times $t_a$ and $t_b$, estimated by
\begin{equation}
\widehat{S^2}_{ab} = \frac{1}{m-1}\sum_{k=1}^m (z_{apk}-\hat{\bar{z}}_{ap})(z_{bpk}-\hat{\bar{z}}_{bp}) \;,
(\#eq:S2ab)
\end{equation}
with $z_{apk}$ the study variable of unit $k$ in panel $p$ at time $t_a$ and $\hat{\bar{z}}_{ap}$ the spatial mean at time $t_a$ as estimated from panel $p$.
The variances and covariances of the GLS estimators of the spatial means at the $R$ sampling times can be estimated by
\begin{equation}
\widehat{\mathbf{C}}(\hat{\mathbf{z}}_{\mathrm{GLS}}) = (\mathbf{X}^{\text{T}}\widehat{\mathbf{C}}^{-1}\mathbf{X})^{-1}
(\#eq:CovGLS) \;.
\end{equation}
Given the design-based GLS estimates of the spatial means at the different times, it is an easy job to compute the estimated change of the mean between two surveys, the estimated temporal trend of the mean, and the estimated space-time mean.
### Current mean
As explained above, with the SS, IS, and SA designs, the design matrix $\mathbf{X}$ is the identity matrix of size $R$. From this it follows that for these space-time designs $\hat{\mathbf{z}}_{\mathrm{GLS}}=\hat{\mathbf{z}}$, see Equation \@ref(eq:DBGLS), and $\mbox{Cov}(\hat{\mathbf{z}}_{\mathrm{GLS}})=\mbox{Cov}(\hat{\mathbf{z}})=\mathbf{C}$. In words, the GLS estimator of the spatial mean at a given time equals the usual $\pi$ estimator of the mean at that time, and the variance-covariance matrix of the GLS estimators of the spatial means equals the variance-covariance matrix of the $\pi$ estimators.
With SP and RP sampling in space-time, there is partial overlap between the samples at the different times, and so the samples at the previous times can be used to increase the precision of the estimated mean at the last time (current mean). The estimated current mean is simply the last element in $\hat{\mathbf{z}}_{\mathrm{GLS}}$. The estimated variance of the estimator of the current mean is the element in the final row and final column of the variance-covariance matrix of the estimated spatial means (Equation \@ref(eq:CovGLS)). These two space-time designs with partial replacement of sampling locations may yield a more precise estimate of the current mean than the other three space-time designs.
<!--@coc77 presents a formula for the optimal matching proportion, but this formula is not for the GLS estimator of the current mean, and for two sampling times only, see also Equation (15.15) in @gru06.-->
### Change of the spatial mean
The change of the spatial mean between two sampling times can simply be estimated by subtracting the estimated spatial means at these two times. With the SS, IS, and SA designs, these means are estimated by the $\pi$ estimators. With the SP and RP designs, the two spatial means are estimated by the GLS estimators. The variance of the estimator of the change can be estimated by the sum of the estimated variances of the spatial mean estimators at the two sampling times, minus two times the estimated covariance of the two estimators. The covariance is maximal when all sampling locations are revisited, leading to the most precise estimate of the change of the spatial mean. With SP and RP sampling, the covariance of the two spatial mean estimators is smaller, and so the variance of the change estimator is larger.
<!--
Derivation of the covariance of the difference of two $\pi$ estimators of the mean from two SI samples with partial overlap. Let $\mathcal{S}_{1u}}$ and $\mathcal{S}_{2u}}$ be the unmatched sample of size $n-m$ at time 1 and 2, respectively, and $\mathcal{S}_{m}}$ the matched sample of size $m$ with observations at both times.
$$
\bar{z}_1 = \frac{\sum_{k \in \mathcal{S}_{1u}}z_{1k}}{n} + \frac{\sum_{k \in \mathcal{S}_{m}}z_{1k}}{n}\\
\bar{z}_2 = \frac{\sum_{k \in \mathcal{S}_{2u}}z_{2k}}{n} + \frac{\sum_{k \in \mathcal{S}_{m}}z_{2k}}{n}\\
\text{cov}(\bar{z}_1,\bar{z}_2) = \text{cov}\left(\frac{\sum_{k \in \mathcal{S}_{m}}z_{1k}}{n},\frac{\sum_{k \in \mathcal{S}_{m}}z_{2k}}{n}\right)=\frac{mS^2(z_1,z_2)}{n^2}
$$
-->
### Temporal trend of the spatial mean
The temporal trend of the mean is estimated by the weighted average of the (GLS) estimated means at $t_1,\dots ,t_R$, with weights equal to Equation \@ref(eq:weightsTrendofMean):
\begin{equation}
\hat{b}=\sum_{j=1}^R w_j \hat{\bar{z}}_{\mathrm{GLS},j} \;.
(\#eq:EstimatorTrendofMean)
\end{equation}
The variance of the estimator of the temporal trend can be estimated by
\begin{equation}
\widehat{V}(\hat{b}) = \mathbf{w}^{\mathrm{T}}\widehat{\mathbf{C}}(\hat{\mathbf{z}}_{\mathrm{GLS}}) \mathbf{w}\;.
(\#eq:VarEstimatorTrendofMean)
\end{equation}
@Brus2011 compared the space-time designs for estimating the temporal trend of the spatial means, under a first order autoregressive time-series model of the spatial means. The SS design performed best when the correlation is strong, say $>0.8$. What is the best design depends amongst others on the strength of the correlation and the number of sampling times. A safe choice is an SA design. With strong positive correlation, say $>0.9$, the SS design can be a good choice, but remarkably for weak positive correlation this design performed relatively poorly.
For an application of an SP design to estimate the temporal trend of the areal fractions of several vegetation types, see @Brus2014c. Sampling locations are selected by a design that spreads the locations evenly over the study area (systematic unaligned sampling, which is a modified version of systematic sampling).
### Space-time mean
The space-time mean, as defined in Equation \@ref(eq:SpacetimeMeanDiscrete), can be estimated by the average of the (GLS) estimated spatial means at the $R$ sampling times. The variance of this estimator can be estimated by Equation \@ref(eq:VarEstimatorTrendofMean) using constant weights, equal to $1/R$, for the $R$ estimators of the mean. An IS design yields the most precise estimate of the space-time mean compared to the other space-time designs when the correlation of collocated measurements of the study variable is positive [@coc77]. With the other space-time designs, there is redundant information due to the revisits.
For design-based estimation of the space-time mean of a continuous temporal universe as defined in Equation \@ref(eq:SpacetimeMeanContinuous), both the sampling locations and the sampling times must be selected by probability sampling. Figure \@ref(fig:SpaceTimeDesignProb) shows an SS sample and an IS sample of twenty locations and six times, where both locations and times are selected by simple random sampling.
```{r SpaceTimeDesignProb, echo=FALSE, out.width="100%", fig.asp= 0.5, fig.cap="Space-time designs for estimating the space-time mean of a continuous temporal universe. Both locations and times are selected by simple random sampling. SS: static-synchronous design; IS: independent synchronous design."}
#pure panel
set.seed(314)
Space <- sample(1000, size = 20, replace = FALSE)
Time <- sample(365, size = 6, replace = FALSE)
mysampleSS <- data.frame(
Space = rep(x = Space, times = length(Time)),
Time = rep(Time, each = length(Space)),
Panel = rep("a", length(Time) * length(Space))
)
mysampleSS$design <- "SS"
#independent synchronous
Space <- sample(1000, size = length(Space) * length(Time), replace = FALSE)
mysampleIS <- data.frame(
Space,
Time = rep(Time, each = length(Space) / length(Time))
)
mysampleIS$Panel <- rep(x = c("a", "b", "c", "d", "e", "f"), each = length(Space) / length(Time))
mysampleIS$design <- "IS"
mysample <- rbind(mysampleSS, mysampleIS)
mysample$design <- factor(x = mysample$design, levels = c("SS", "IS"), ordered = TRUE)
# create plot
ggplot(data = mysample) +
geom_point(mapping = aes(x = Space, y = Time, shape = factor(Panel), colour = factor(Panel)), size = 2) +
scale_x_continuous(limits = c(0, 1000)) +
scale_y_continuous(limits = c(0, 365)) +
scale_shape(name = "Panel", solid = FALSE) +
scale_colour_viridis_d(name = "Panel") +
facet_wrap(~design, nrow = 3, ncol = 2)
```
An IS sample in which both locations and times are selected by probability sampling can be considered as a two-stage cluster random sample from a space-time universe (Chapter \@ref(Twostage)). The primary sampling units (PSUs) are the spatial sections of that universe (horizontal lines in Figure \@ref(fig:SpaceTimeDesignProb)), the secondary sampling units (SSUs) are the sampling locations. The space-time mean can therefore be estimated by Equation \@ref(eq:EstMeanTwostage), and its variance by Equation \@ref(eq:VarEstMeanTwostage). For simple random sampling, both in space and in time, and a linear costs model $C = c_0 + c_1n + c_2nm$ Equations \@ref(eq:nopt) and \@ref(eq:mopt) can be used to optimise the number of sampling times (PSUs, $n$) and the number of sampling locations (SSUs, $m$) per time. The pooled within-unit variance, $S^2_\text{w}$, is in this case the time-averaged spatial variance of the study variable at a given time, and the between-unit variance, $S^2_\text{b}$, is the variance of the spatial means over time.
Estimation of the space-time mean from an SS sample in which both locations and times are selected by probability sampling is the same as for an IS sample. However, estimation of the variance of the space-time mean estimator is more complicated. For the variance of the estimator of the space-time mean with an SS space-time design and simple random sampling of both locations and times, see equation (15.8) in @gru06. Due to the two-fold alignment of the sampling units, no unbiased estimator of the variance is available. The variance estimator of two-stage cluster sampling can be used to approximate the variance, but this variance estimator does not account for a possible temporal correlation of the estimated spatial means, resulting in an underestimation of the variance.
For an application of an IS design, to estimate the space-time mean of nutrients (nitrogen and phosphorous) in surface waters, see @bru08b and @Knotters2010d. In both applications, sampling times are selected by stratified simple random sampling, with periods of two months as strata. In @bru08b, sampling locations are selected by stratified simple random sampling as well.
## Case study: annual mean daily temperature in Iberia
The space-time designs are illustrated with the annual mean air temperature at two metres above the earth surface (TAS) in Iberia for 2004, 2009, 2014, and 2019 (Subsection \@ref(TASIberia)).
Sampling locations are selected by simple random sampling. The sample size is 100 locations per survey; so with four surveys we have 400 observations in total, but not with all space-time designs at 400 different sampling locations. In the next sections, I will show how a space-time sample with a given space-time design can be selected, and how the population parameters described in Section \@ref(SpaceTimeparameters) can be estimated.
### Static-synchronous design
Selecting an SS space-time sample, using simple random sampling as a spatial design, is straightforward. We can simply select a single simple random sample of size $n=100$. The locations of this sample are observed at all times (Figure \@ref(fig:sampleIberiaSS)).
```{r}
grd <- grdIberia %>%
mutate(x = x / 1000,
y = y / 1000)
set.seed(314)
n <- 100
units <- sample(nrow(grd), size = n, replace = TRUE)
mysample_SS <- grd[units, ]
```
```{r sampleIberiaSS, echo = FALSE, out.width = '100%', fig.cap = "Static-synchronous space-time sample from Iberia. Locations are selected by simple random sampling."}
df_lf <- grd %>% pivot_longer(cols = c("TAS2004", "TAS2009", "TAS2014", "TAS2019"))
ggplot(df_lf) +
geom_raster(mapping = aes(x = x, y = y, fill = value)) +
geom_point(mysample_SS, mapping = aes(x = x, y = y)) +
scale_fill_viridis_c(name = "TAS") +
scale_x_continuous(name = "Easting (km)", breaks = seq(from = 2700, to = 3700, by = 200) )+
scale_y_continuous(name = "Northing (km)") +
facet_wrap(~ name, ncol = 2, nrow = 2) +
coord_fixed()
rm(df_lf)
```
The spatial means at the four times can be estimated by the sample means (Equation \@ref(eq:HTMeanSI)), the variances of the mean estimators by Equation \@ref(eq:EstVarMeanSIR). The covariances of the estimators of the means at two different times can be estimated by Equation \@ref(eq:covartwoelementaryestimates) with $m$ equal to the sample size $n$. The estimated current mean (the spatial mean at the fourth survey) and the estimated standard error of the estimator of the current mean can be simply extracted from the vector with estimated means and from the matrix with estimated variances and covariances.
```{r}
mz <- apply(mysample_SS[, -c(1, 2)], MARGIN = 2, FUN = mean)
C <- var(mysample_SS[, -c(1, 2)]) / n
#current mean
mz_cur_SS <- mz[4]
se_mz_cur_SS <- sqrt(C[4, 4])
```
The change of the spatial mean from the first to the fourth survey can simply be estimated by subtracting the estimated spatial mean of the first survey from the estimated mean of the fourth survey. The standard error of this estimator can be estimated by the sum of the estimated variances of the two spatial mean estimators, minus two times the estimated covariance of the two mean estimators, and finally taking the square root.
```{r}
d_mz_SS <- mz[4] - mz[1]
se_d_mz_SS <- sqrt(C[4, 4] + C[1, 1] - 2 * C[1, 4])
```
The same estimates are obtained by defining a weight vector with values -1 and 1 for the first and last element, respectively, and 0 for the other two elements.
```{r}
w <- c(-1, 0, 0, 1)
d_mz_SS <- t(w) %*% mz
se_d_mz_SS <- sqrt(t(w) %*% C %*% w)
```
<!--The estimated change equals `r formatC(d_mz_SS, 3, format = "f")` $^\circ$C, and the estimated standard error equals `r formatC(se_d_mz_SS, 4, format = "f")` $^\circ$C.-->
The temporal trend of the spatial means can be estimated much in the same way, but using a different vector with weights (Equation \@ref(eq:weightsTrendofMean)).
```{r}
t <- seq(from = 2004, to = 2019, by = 5)
w <- (t - mean(t)) / sum((t - mean(t))^2)
mz_trend_SS <- t(w) %*% mz
se_mz_trend_SS <- sqrt(t(w) %*% C %*% w)
```
The estimated temporal trend equals `r formatC(mz_trend_SS, 4, format = "f")`$^\circ$C $y^{-1}$, and the estimated standard error equals `r se_mz_trend_SS`$^\circ$C $y^{-1}$. Using `t = 1:4` yields the estimated average change in annual mean temperature *per five years*.
```{r}
t <- 1:4
w <- (t - mean(t)) / sum((t - mean(t))^2)
print(mz_trend_SS <- t(w) %*% mz)
```
Using a constant weight vector with values 1/4 yields the estimated space-time mean and the standard error of the space-time mean estimator.
```{r}
w <- rep(1 / 4, 4)
mz_st_SS <- t(w) %*% mz
se_mz_st_SS <- sqrt(t(w) %*% C %*% w)
```
<!-- The estimated space-time mean equals `r mz_st_SS` and the estimated standard error equals `r se_mz_st_SS`.-->
### Independent synchronous design
To select an IS space-time sample with four sampling times, we simply select $4n=400$ sampling locations. Figure \@ref(fig:sampleIberiaIS) shows the selected IS space-time sample. A variable identifying the panel of the selected sampling units is added to the data frame.
```{r}
units <- sample(nrow(grd), size = 4 * n, replace = TRUE)
mysample_IS <- grd[units, ]
mysample_IS$panel <- rep(c("a", "b", "c", "d"), each = n)
```
```{block2, type = 'rmdnote'}
The units of the finite population are selected by simple random sampling *with replacement*. As a consequence, there can be partial overlap between the samples at the different times, i.e., some units are observed at multiple times. However, this overlap is by chance; it is not coordinated as in an SP and RP design. By selecting the units with replacement, the estimators of the spatial means are independent (covariance of estimators equals zero). For infinite populations such as points in an area, there will be no overlap, so that the covariance of the estimators equals zero.
```
```{r sampleIberiaIS, echo = FALSE, fig.asp = 0.8, out.width = "100%", fig.cap = "Independent synchronous space-time sample from Iberia. Locations are selected by simple random sampling."}
plt1 <- ggplot(grd) +
geom_raster(mapping = aes(x = x, y = y, fill = TAS2004)) +
geom_point(mysample_IS[mysample_IS$panel == "a", ], mapping = aes(x = x, y = y), shape = 1) +
geom_text(data = NULL, x = 3600, y = 2400, label = "2004") +
scale_fill_viridis_c(name = "T2004", limits = c(-2, 20)) +
scale_x_continuous(name = "Easting (km)") +
scale_y_continuous(name = "Northing (km)") +
coord_fixed() +
theme(legend.position = "none")
plt2 <- ggplot(grd) +
geom_raster(mapping = aes(x = x, y = y, fill = TAS2009)) +
geom_point(mysample_IS[mysample_IS$panel == "b", ], mapping = aes(x = x, y = y), shape = 2) +
geom_text(data = NULL, x = 3600, y = 2400, label = "2009") +
scale_fill_viridis_c(name = "T2009", limits = c(-2, 20)) +
scale_x_continuous(name = "Easting (km)") +
scale_y_continuous(name = "Northing (km)") +
coord_fixed() +
theme(legend.position = "none")
plt3 <- ggplot(grd) +
geom_raster(mapping = aes(x = x, y = y, fill = TAS2014)) +
geom_point(mysample_IS[mysample_IS$panel == "c", ], mapping = aes(x = x, y = y), shape = 3) +
geom_text(data = NULL, x = 3600, y = 2400, label = "2014") +
scale_fill_viridis_c(name = "T2014", limits = c(-2, 20)) +
scale_x_continuous(name = "Easting (km)") +
scale_y_continuous(name = "Northing (km)") +
coord_fixed() +
theme(legend.position = "none")
plt4 <- ggplot(grd) +
geom_raster(mapping = aes(x = x, y = y, fill = TAS2019)) +
geom_point(mysample_IS[mysample_IS$panel == "d", ], mapping = aes(x = x, y = y), shape = 4) +
geom_text(data = NULL, x = 3600, y = 2400, label = "2019") +
scale_fill_viridis_c(name = "T2019", limits = c(-2, 20)) +
scale_x_continuous(name = "Easting (km)") +
scale_y_continuous(name = "Northing (km)") +
coord_fixed() +
theme(legend.position = "none")
grid.arrange(plt1, plt2, plt3, plt4, nrow = 2, ncol = 2)
rm(plt1, plt2, ply3, plt4)
```
The spatial means are estimated with the $\pi$ estimator as there is no partial overlap of the spatial samples. All covariances of the mean estimators are zero. Estimation of the space-time parameters is done as before with the SS space-time sample. Four data frames of $n$ rows are first made with the data observed in a specific panel. The variables of these four data frames are joined into a single data frame. The spatial means at the four times are then estimated by the sample means, computed with function `apply`.
```{r}
panel_a <- filter(mysample_IS, panel == "a") %>% dplyr::select(TAS2004)
panel_b <- filter(mysample_IS, panel == "b") %>% dplyr::select(TAS2009)
panel_c <- filter(mysample_IS, panel == "c") %>% dplyr::select(TAS2014)
panel_d <- filter(mysample_IS, panel == "d") %>% dplyr::select(TAS2019)
panel_abcd <- cbind(panel_a, panel_b, panel_c, panel_d)
mz <- apply(panel_abcd, MARGIN = 2, FUN = mean)
C <- var(panel_abcd) / n
C[row(C) != col(C)] <- 0
#current mean
mz_cur_IS <- mz[4]
se_mz_cur_IS <- sqrt(C[4, 4])
#change of mean
w <- c(-1, 0, 0, 1)
d_mz_IS <- t(w) %*% mz
se_d_mz_IS <- sqrt(t(w) %*% C %*% w)
#trend of mean
w <- (t - mean(t)) / sum((t - mean(t))^2)
mz_trend_IS <- t(w) %*% mz
se_mz_trend_IS <- sqrt(t(w) %*% C %*% w)
#space-time mean
w <- rep(1 / 4, 4)
mz_st_IS <- t(w) %*% mz
se_mz_st_IS <- sqrt(t(w) %*% C %*% w)
```
### Serially alternating design
To select an SA space-time sample with a period of two, $2n=200$ sampling locations are selected. A variable is added to `mysample_SA` indicating the panel of the sampling locations. Figure \@ref(fig:sampleIberiaSA) shows the selected SA space-time sample.
```{r}
units <- sample(nrow(grd), size = 2 * n, replace = TRUE)
mysample_SA <- grd[units, ]
mysample_SA$panel <- rep(c("a", "b"), each = n)
```
```{r sampleIberiaSA, echo = FALSE, fig.asp = 0.8, out.width = '100%', fig.cap = "Serially alternating space-time sample from Iberia. Locations are selected by simple random sampling."}
plt1 <- ggplot(grd) +
geom_raster(mapping = aes(x = x, y = y, fill = TAS2004)) +
geom_point(mysample_SA[mysample_SA$panel == "a", ], mapping = aes(x = x, y = y), shape = 1) +
geom_text(data = NULL, x = 3600, y = 2400, label = "2004") +
scale_fill_viridis_c(name = "T2004", limits = c(-2, 20)) +
scale_x_continuous(name = "Easting (km)") +
scale_y_continuous(name = "Northing (km)") +
coord_fixed() +
theme(legend.position = "none")
plt2 <- ggplot(grd) +
geom_raster(mapping = aes(x = x, y = y, fill = TAS2009)) +
geom_point(mysample_SA[mysample_SA$panel == "b", ], mapping = aes(x = x, y = y), shape = 2) +
geom_text(data = NULL, x = 3600, y = 2400, label = "2009") +
scale_fill_viridis_c(name = "T2009", limits = c(-2, 20)) +
scale_x_continuous(name = "Easting (km)") +
scale_y_continuous(name = "Northing (km)") +
coord_fixed() +
theme(legend.position = "none")
plt3 <- ggplot(grd) +
geom_raster(mapping = aes(x = x, y = y, fill = TAS2014)) +
geom_point(mysample_SA[mysample_SA$panel == "a", ], mapping = aes(x = x, y = y), shape = 1) +
geom_text(data = NULL, x = 3600, y = 2400, label = "2014") +
scale_fill_viridis_c(name = "T2014", limits = c(-2, 20)) +
scale_x_continuous(name = "Easting (km)") +
scale_y_continuous(name = "Northing (km)") +
coord_fixed() +
theme(legend.position = "none")
plt4 <- ggplot(grd) +
geom_raster(mapping = aes(x = x, y = y, fill = TAS2019)) +
geom_point(mysample_SA[mysample_SA$panel == "b", ], mapping = aes(x = x, y = y), shape = 2) +
geom_text(data = NULL, x = 3600, y = 2400, label = "2019") +
scale_fill_viridis_c(name = "T2019", limits = c(-2, 20)) +
scale_x_continuous(name = "Easting (km)") +
scale_y_continuous(name = "Northing (km)") +
coord_fixed() +
theme(legend.position = "none")
grid.arrange(plt1, plt2, plt3, plt4, nrow = 2)
rm(plt1, plt2, ply3, plt4)
```
In SA sampling there is no partial overlap of the spatial samples at different times: there is either no coordinated overlap or full overlap. Spatial means therefore can be estimated by the usual $\pi$ estimator. Two data frames of $n$ rows are first made with the data observed in a specific panel. The variables of these two data frames are joined into a single data frame. The spatial means at the four times are then estimated by the sample means, computed with function `apply`.
```{r}
panel_a <- filter(mysample_SA, panel == "a") %>% dplyr::select(TAS2004, TAS2014)
panel_b <- filter(mysample_SA, panel == "b") %>% dplyr::select(TAS2009, TAS2019)
panel_ab <- cbind(panel_a[1], panel_b[1], panel_a[2], panel_b[2])
mz <- apply(panel_ab, MARGIN = 2, FUN = mean)
```
To compute the matrix with estimated variances and covariances of the spatial mean estimators, first the full matrix is computed. Then the estimated covariances of spatial means of consecutive surveys are replaced by zeroes as the samples of these consecutive surveys are selected independently from each other, so that the two estimators are independent. Estimation of the space-time parameters is done as before with the SS space-time sample.
```{r}
C <- var(panel_ab) / n
odd <- c(1, 3)
C[row(C) %in% odd & !(col(C) %in% odd)] <- 0
C[!(row(C) %in% odd) & col(C) %in% odd] <- 0
#current mean
mz_cur_SA <- mz[4]
se_mz_cur_SA <- sqrt(C[4, 4] / n)
#change of mean from time 1 to time 4
w <- c(-1, 0, 0, 1)
d_mz_SA <- t(w) %*% mz
se_d_mz_SA <- sqrt(t(w) %*% C %*% w)
#trend of mean
w <- (t - mean(t)) / sum((t - mean(t))^2)
mz_trend_SA <- t(w) %*% mz
se_mz_trend_SA <- sqrt(t(w) %*% C %*% w)
#space-time mean
mz_st_SA <- mean(mz)
w <- rep(1 / 4, 4)
se_mz_st_SA <- sqrt(t(w) %*% C %*% w)
```
### Supplemented panel design
With SP sampling and four sampling times we have five panels: one panel with fixed sampling locations and four panels with swarming locations (Figure \@ref(fig:SpaceTimeDesigns)). Each panel consists of $n/2$ sampling locations, so in total $5n/2=250$ locations are selected. A variable indicating the panel is added to the data frame with the selected sampling locations. Figure \@ref(fig:sampleIberiaSP) shows the selected SP sample.
```{r}
units <- sample(nrow(grd), size = 5 * n / 2, replace = TRUE)
mysample_SP <- grd[units, ]
mysample_SP$panel <- rep(c("a", "b", "c", "d", "e"), each = n / 2)
```
```{r sampleIberiaSP, echo = FALSE, fig.asp = 0.8, out.width = "100%", fig.cap = "Supplemented panel space-time sample from Iberia. Locations are selected by simple random sampling. The permanent sampling locations are indicated by the filled dots, the swarming locations by the other symbols."}
plt1 <- ggplot(grd) +
geom_raster(mapping = aes(x = x, y = y, fill = TAS2004)) +
geom_point(mysample_SP[mysample_SP$panel == "a", ], mapping = aes(x = x, y = y)) +
geom_point(mysample_SP[mysample_SP$panel == "b", ], mapping = aes(x = x, y = y), shape = 2) +
geom_text(data = NULL, x = 3600, y = 2400, label = "2004") +
scale_fill_viridis_c(name = "T2004", limits = c(-2, 20)) +
scale_x_continuous(name = "Easting (km)") +
scale_y_continuous(name = "Northing (km)") +
coord_fixed() +
theme(legend.position = "none")
plt2 <- ggplot(grd) +
geom_raster(mapping = aes(x = x, y = y, fill = TAS2009)) +
geom_point(mysample_SP[mysample_SP$panel == "a", ], mapping = aes(x = x, y = y)) +
geom_point(mysample_SP[mysample_SP$panel == "c", ], mapping = aes(x = x, y = y), shape = 3) +
geom_text(data = NULL, x = 3600, y = 2400, label = "2009") +
scale_fill_viridis_c(name = "T2009", limits = c(-2, 20)) +
scale_x_continuous(name = "Easting (km)") +
scale_y_continuous(name = "Northing (km)") +
coord_fixed() +
theme(legend.position = "none")
plt3 <- ggplot(grd) +
geom_raster(mapping = aes(x = x, y = y, fill = TAS2014)) +
geom_point(mysample_SP[mysample_SP$panel == "a", ], mapping = aes(x = x, y = y)) +
geom_point(mysample_SP[mysample_SP$panel == "d", ], mapping = aes(x = x, y = y), shape = 4) +
geom_text(data = NULL, x = 3600, y = 2400, label = "2014") +
scale_fill_viridis_c(name = "T2014", limits = c(-2, 20)) +
scale_x_continuous(name = "Easting (km)") +
scale_y_continuous(name = "Northing (km)") +
coord_fixed() +
theme(legend.position = "none")
plt4 <- ggplot(grd) +
geom_raster(mapping = aes(x = x, y = y, fill = TAS2019)) +
geom_point(mysample_SP[mysample_SP$panel == "a", ], mapping = aes(x = x, y = y)) +
geom_point(mysample_SP[mysample_SP$panel == "e", ], mapping = aes(x = x, y = y), shape = 5) +
geom_text(data = NULL, x = 3600, y = 2400, label = "2019") +
scale_fill_viridis_c(name = "T2019", limits = c(-2, 20)) +
scale_x_continuous(name = "Easting (km)") +
scale_y_continuous(name = "Northing (km)") +
coord_fixed() +
theme(legend.position = "none")
grid.arrange(plt1, plt2, plt3, plt4, nrow = 2)
rm(plt1, plt2, ply3, plt4)
```
With SP sampling, the spatial means are estimated by the design-based GLS estimator (Equation \@ref(eq:DBGLS)). As a first step, the eight elementary estimates (two per sampling time) are computed and collected in a vector. Note the order of the elementary estimates: first the estimated spatial means of 2004, 2009, 2014, and 2019, estimated from the panel with fixed sampling locations, then the spatial means estimated from the panels with swarming locations. The design matrix $\mathbf{X}$ corresponding to this order of elementary estimates is constructed.
```{block2, type = 'rmdnote'}
Ordering the elementary estimates by sampling time is also fine, but then the design matrix $\mathbf{X}$ should be adapted to this order.
```
```{r}
mz_1 <- tapply(
mysample_SP$TAS2004, INDEX = mysample_SP$panel, FUN = mean)[c(1, 2)]
mz_2 <- tapply(
mysample_SP$TAS2009, INDEX = mysample_SP$panel, FUN = mean)[c(1, 3)]
mz_3 <- tapply(
mysample_SP$TAS2014, INDEX = mysample_SP$panel, FUN = mean)[c(1, 4)]
mz_4 <- tapply(
mysample_SP$TAS2019, INDEX = mysample_SP$panel, FUN = mean)[c(1, 5)]
mz_el <- c(mz_1["a"], mz_2["a"], mz_3["a"], mz_4["a"],
mz_1["b"], mz_2["c"], mz_3["d"], mz_4["e"])
X <- rbind(diag(4), diag(4))
```
The variances and covariances of the elementary estimates of the panel with fixed locations (SS subsample) are estimated, as well as the variances of the elementary estimates of the panels with swarming locations (IS subsample). The two variance-covariance matrices and a 4 $\times$ 4 submatrix with all zeroes are combined into a single matrix, see matrix \@ref(eq:CmatrixSP).
```{r}
C_SS <- filter(mysample_SP, panel == "a") %>%
dplyr::select(TAS2004, TAS2009, TAS2014, TAS2019) %>%
var() / (n / 2)
C_IS <- cbind(mysample_SP$TAS2004[mysample_SP$panel == "b"],
mysample_SP$TAS2009[mysample_SP$panel == "c"],
mysample_SP$TAS2014[mysample_SP$panel == "d"],
mysample_SP$TAS2019[mysample_SP$panel == "e"]) %>%
var() / (n / 2)
C_IS[row(C_IS) != col(C_IS)] <- 0
zeroes <- matrix(0, nrow = 4, ncol = 4)
C <- rbind(cbind(C_SS, zeroes), cbind(zeroes, C_IS))
```
The design-based GLS estimates of the spatial means can be computed as follows, see Equation \@ref(eq:DBGLS).
```{r}
XCXinv <- solve(crossprod(X, solve(C, X)))
XCz <- crossprod(X, solve(C, mz_el))
mz_GLS <- XCXinv %*% XCz
```
Computing the estimated space-time parameters from the design-based GLS estimated spatial means is straightforward.
```{r}
#current mean
mz_cur_SP <- mz_GLS[4]
se_mz_cur_SP <- sqrt(XCXinv[4, 4])
#change of mean
w <- c(-1, 0, 0, 1)
d_mz_SP <- t(w) %*% mz_GLS
se_d_mz_SP <- sqrt(t(w) %*% XCXinv %*% w)
#trend of mean
w <- (t - mean(t)) / sum((t - mean(t))^2)
mz_trend_SP <- t(w) %*% mz_GLS
se_mz_trend_SP <- sqrt(t(w) %*% XCXinv %*% w)
#space-time mean
w <- rep(1 / 4, 4)
mz_st_SP <- t(w) %*% mz_GLS
se_mz_st_SP <- sqrt(t(w) %*% XCXinv %*% w)
```
### Rotating panel design
Similar to the SP design, with an in-for-two RP design and four sampling times we have five panels, each consisting of $n/2$ sampling locations, so that in total $5n/2=250$ locations are selected. Figure \@ref(fig:sampleIberiaRP) shows the selected space-time sample.
```{r}
units <- sample(nrow(grd), size = 5 * n / 2, replace = TRUE)
mysample_RP <- grd[units, ]
mysample_RP$panel <- rep(c("a", "b", "c", "d", "e"), each = n / 2)
```
```{r sampleIberiaRP, echo = FALSE, fig.asp = 0.8, out.width = "100%", fig.cap = "In-for-two rotating panel space-time sample from Iberia. Locations are selected by simple random sampling. Half of the sampling locations observed in a given year are also observed in the consecutive survey."}
plt1 <- ggplot(grd) +
geom_raster(mapping = aes(x = x, y = y, fill = TAS2004)) +
geom_point(mysample_RP[mysample_RP$panel == "a", ], mapping = aes(x = x, y = y), shape = 1) +
geom_point(mysample_RP[mysample_RP$panel == "b", ], mapping = aes(x = x, y = y), shape = 2) +
geom_text(data = NULL, x = 3600, y = 2400, label = "2004") +
scale_fill_viridis_c(name = "T2004", limits = c(-2, 20)) +
scale_x_continuous(name = "Easting (km)") +
scale_y_continuous(name = "Northing (km)") +
coord_fixed() +
theme(legend.position = "none")
plt2 <- ggplot(grd) +
geom_raster(mapping = aes(x = x, y = y, fill = TAS2009)) +
geom_point(mysample_RP[mysample_RP$panel == "b", ], mapping = aes(x = x, y = y), shape = 2) +
geom_point(mysample_RP[mysample_RP$panel == "c", ], mapping = aes(x = x, y = y), shape = 3) +
geom_text(data = NULL, x = 3600, y = 2400, label = "2009") +
scale_fill_viridis_c(name = "T2009", limits = c(-2, 20)) +
scale_x_continuous(name = "Easting (km)") +
scale_y_continuous(name = "Northing (km)") +
coord_fixed() +
theme(legend.position = "none")
plt3 <- ggplot(grd) +
geom_raster(mapping = aes(x = x, y = y, fill = TAS2014)) +
geom_point(mysample_RP[mysample_RP$panel == "c", ], mapping = aes(x = x, y = y), shape = 3) +
geom_point(mysample_RP[mysample_RP$panel == "d", ], mapping = aes(x = x, y = y), shape = 4) +
geom_text(data = NULL, x = 3600, y = 2400, label = "2014") +
scale_fill_viridis_c(name = "T2014", limits = c(-2, 20)) +
scale_x_continuous(name = "Easting (km)") +
scale_y_continuous(name = "Northing (km)") +
coord_fixed() +
theme(legend.position = "none")
plt4 <- ggplot(grd) +
geom_raster(mapping = aes(x = x, y = y, fill = TAS2019)) +
geom_point(mysample_RP[mysample_RP$panel == "d", ], mapping = aes(x = x, y = y), shape = 4) +
geom_point(mysample_RP[mysample_RP$panel == "e", ], mapping = aes(x = x, y = y), shape = 5) +
geom_text(data = NULL, x = 3600, y = 2400, label = "2019") +
scale_fill_viridis_c(name = "T2019", limits = c(-2, 20)) +
scale_x_continuous(name = "Easting (km)") +
scale_y_continuous(name = "Northing (km)") +
coord_fixed() +
theme(legend.position = "none")
grid.arrange(plt1, plt2, plt3, plt4, nrow = 2)
rm(plt1, plt2, ply3, plt4)
```
Two elementary estimates per sampling time are computed and collected in a vector. Note that now the elementary estimates are ordered by sampling time. The design matrix corresponding to this order is constructed.
```{r}
mz_1 <- tapply(
mysample_RP$TAS2004, INDEX = mysample_RP$panel, FUN = mean)[c(1, 2)]
mz_2 <- tapply(
mysample_RP$TAS2009, INDEX = mysample_RP$panel, FUN = mean)[c(2, 3)]
mz_3 <- tapply(
mysample_RP$TAS2014, INDEX = mysample_RP$panel, FUN = mean)[c(3, 4)]
mz_4 <- tapply(
mysample_RP$TAS2019, INDEX = mysample_RP$panel, FUN = mean)[c(4, 5)]
mz_el <- c(mz_1, mz_2, mz_3, mz_4)
X <- matrix(c(rep(c(1, 0, 0, 0), 2),
rep(c(0, 1, 0, 0), 2),
rep(c(0, 0, 1, 0), 2),
rep(c(0, 0, 0, 1), 2)), nrow = 8, ncol = 4, byrow = TRUE)
```
To construct the variance-covariance matrix of the eight elementary estimates, first the 2 $\times$ 2 matrices with estimated variances and covariances are computed for panels b, c, and d. These three panels are used to estimate the spatial means of two consecutive years: the locations of panel b are used to estimate the spatial means for 2004 and 2009, panel c for 2009 and 2014, and panel d for 2014 and 2019 (Figure \@ref(fig:SpaceTimeDesigns)). These three matrices are copied into the 8 $\times$ 8 matrix, see matrix \@ref(eq:CmatrixRP). Finally, the data of panel a and e are used to compute the estimated variances of the estimators of the spatial means in 2004 and 2019, respectively, and copied to the upper left and lower right entry of the variance-covariance matrix, respectively.
```{r}
C_b <- filter(mysample_RP, panel == "b") %>% dplyr::select(TAS2004, TAS2009) %>%
var() / (n / 2)
C_c <- filter(mysample_RP, panel == "c") %>% dplyr::select(TAS2009, TAS2014) %>%
var() / (n / 2)
C_d <- filter(mysample_RP, panel == "d") %>% dplyr::select(TAS2014, TAS2019) %>%
var() / (n / 2)
C <- matrix(data = 0, ncol = 8, nrow = 8)
C[2:3, 2:3] <- C_b
C[4:5, 4:5] <- C_c
C[6:7, 6:7] <- C_d
C[1, 1] <- var(mysample_RP[mysample_RP$panel == "a", "TAS2004"]) / (n / 2)
C[8, 8] <- var(mysample_RP[mysample_RP$panel == "e", "TAS2019"]) / (n / 2)
```
The design-based GLS estimates of the spatial means for the four sampling times are computed as before, followed by computing the estimated space-time parameters.
```{r}
XCXinv <- solve(crossprod(X, solve(C, X)))
XCz <- crossprod(X, solve(C, mz_el))
mz_GLS <- XCXinv %*% XCz
#current mean
mz_cur_RP <- mz_GLS[4]
se_mz_cur_RP <- sqrt(XCXinv[4, 4])
#change of mean
w <- c(-1, 0, 0, 1)
d_mz_RP <- t(w) %*% mz_GLS
se_d_mz_RP <- sqrt(t(w) %*% XCXinv %*% w)
#trend of mean
w <- (t - mean(t)) / sum((t - mean(t))^2)
mz_trend_RP <- t(w) %*% mz_GLS
se_mz_trend_RP <- sqrt(t(w) %*% XCXinv %*% w)
#space-time mean
w <- rep(1 / 4, 4)
mz_st_RP <- t(w) %*% mz_GLS
se_mz_st_RP <- sqrt(t(w) %*% XCXinv %*% w)
```
```{r, echo = FALSE}
mz <- apply(grd[, -c(1, 2)], MARGIN = 2, FUN = mean)
mz_2019_pop <- mz[4]
d_mz_pop <- mz[4] - mz[1]
t <- 1:4
w <- (t - mean(t)) / sum((t - mean(t))^2)
mz_trend_pop <- t(w) %*% mz
mz_st <- mean(mz)
```
### Sampling experiment
The random sampling with the five space-time designs and estimation of the four space-time parameters is repeated 10,000 times. The standard deviations of the 10,000 estimates of a space-time parameter are shown in Table \@ref(tab:TableRepeatedEstimatesSpaceTimeParameters). Note that to determine the standard errors of the estimators of the space-time parameters for the SS, IS, and SA designs, a sampling experiment is not really needed. These can be computed without error because we have exhaustive knowledge of the study variable at all sampling times. However, for the SP and RP designs, a sampling experiment is needed to approximate the design-expectation of the standard error of the estimators of the space-time parameters. This is because the space-time parameters are derived from the GLS estimator of the spatial means, and the GLS estimator is a function of the *estimated* covariances of the elementary estimates (Equation \@ref(eq:DBGLS)). Using the true covariance of the elementary estimates in the GLS estimator, instead of the estimated covariance, leads to an underestimation of the standard error. Computing the true standard errors of the estimators of the space-time parameters for the various space-time designs is left as an exercise.
As a reference for the standard deviations in Table \@ref(tab:TableRepeatedEstimatesSpaceTimeParameters): the current mean (spatial mean air temperature in 2019) equals `r formatC(mz_2019_pop, 2, format = "f")`$^\circ$C, the change of the mean from 2004 to 2019 equals `r formatC(d_mz_pop, 3, format = "f")`$^\circ$C, the temporal trend equals `r formatC(mz_trend_pop, 3, format = "f")`$^\circ$C *per five years*, and the space-time mean equals `r formatC(mz_st, 2, format = "f")`$^\circ$C.
```{r, echo = FALSE, eval = FALSE}
R <- 10000
mz_cur_SS <- d_mz_SS <- mz_trend_SS <- mz_st_SS <- mz_cur_SA <- d_mz_SA <- mz_trend_SA <- mz_st_SA <- mz_cur_SP <- d_mz_SP <- mz_trend_SP <- mz_st_SP <- mz_cur_RP <- d_mz_RP <- mz_trend_RP <- mz_st_RP <- mz_cur_IS <- d_mz_IS <- mz_trend_IS <- mz_st_IS <- d_mz_SP_HT <- numeric(length = R)
odd <- c(1, 3)
set.seed(314)
n <- 100
for (i in 1:R) {
#Design SS
units <- sample(nrow(grd), size = n, replace = TRUE)
mysample_SS <- grd[units, ]
mz <- apply(mysample_SS[, -c(1, 2)], MARGIN = 2, FUN = mean)
C <- var(mysample_SS[, -c(1, 2)]) / n
#current mean
mz_cur_SS[i] <- mz[4]
#change of mean from time 1 to time 4
w <- c(-1, 0, 0, 1)
d_mz_SS[i] <- t(w) %*% mz
#trend of mean
t <- 1:4
w <- (t - mean(t)) / sum((t - mean(t))^2)
mz_trend_SS[i] <- t(w) %*% mz
#space-time mean
mz_st_SS[i] <- mean(mz)
#Design IS
units <- sample(nrow(grd), size = 4 * n, replace = TRUE)
mysample_IS <- grd[units, ]
mysample_IS$panel <- rep(c("a", "b", "c", "d"), each = n)
panel_a <- filter(mysample_IS, panel == "a")[, c("TAS2004")]
panel_b <- filter(mysample_IS, panel == "b")[, c("TAS2009")]
panel_c <- filter(mysample_IS, panel == "c")[, c("TAS2014")]
panel_d <- filter(mysample_IS, panel == "d")[, c("TAS2019")]
panel_abcd <- cbind(panel_a, panel_b, panel_c, panel_d)
mz <- apply(panel_abcd, MARGIN = 2, FUN = mean)
C <- var(panel_abcd) / n
C[row(C) != col(C)] <- 0
#current mean
mz_cur_IS[i] <- mz[4]
#change of mean
d_mz_IS[i] <- mz[4] - mz[1]
#trend of mean
t <- 1:4
w <- (t - mean(t)) / sum((t - mean(t))^2)
mz_trend_IS[i] <- t(w) %*% mz
#space-time mean
mz_st_IS[i] <- mean(mz)
#Design SA
units <- sample(nrow(grd), size = 2 * n, replace = TRUE)
mysample_SA <- grd[units, ]
mysample_SA$panel <- rep(c("a", "b"), each = n)
panel_a <- filter(mysample_SA, panel == "a")[, c("TAS2004", "TAS2014")]
panel_b <- filter(mysample_SA, panel == "b")[, c("TAS2009", "TAS2019")]
panel_ab <- cbind(panel_a[1], panel_b[1], panel_a[2], panel_b[2])
mz <- apply(panel_ab, MARGIN = 2, FUN = mean)
C <- var(panel_ab) / n
C[row(C) %in% odd & !(col(C) %in% odd)] <- 0
C[!(row(C) %in% odd) & col(C) %in% odd] <- 0
#current mean
mz_cur_SA[i] <- mz[4]
#change of mean from time 1 to time 4
w <- c(-1, 0, 0, 1)
d_mz_SA[i] <- t(w) %*% mz
#trend of mean
t <- 1:4
w <- (t - mean(t)) / sum((t - mean(t))^2)
mz_trend_SA[i] <- t(w) %*% mz
#space-time mean
mz_st_SA[i] <- mean(mz)
#Design SP
units <- sample(nrow(grd), size = 5 * n / 2, replace = TRUE)
mysample_SP <- grd[units, ]
mysample_SP$panel <- rep(c("a", "b", "c", "d", "e"), each = n / 2)
#compute eight elementary estimates