-
Notifications
You must be signed in to change notification settings - Fork 4
/
Copy pathcomplex_workflow.Rmd
222 lines (171 loc) · 10.8 KB
/
complex_workflow.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
---
title: "A Not so Simple Workflow"
author: "Simon Goring, Socorro Dominguez Vidaña"
date: "`r Sys.Date()`"
output:
html_document:
code_folding: show
fig_caption: yes
keep_md: yes
self_contained: yes
theme: readable
toc: yes
toc_float: yes
css: "text.css"
pdf_document:
pandoc_args: "-V geometry:vmargin=1in -V geometry:hmargin=1in"
dev: svg
highlight: tango
---
## Building New Chronologies
This RMarkdown document will walk you through the process of:
1. Downloading a single record
2. Examining the chronologies for that record and associated chronological controls
3. Creating a new chronology for the record
4. Adding the chronology to the record
5. Switching between default chronologies
This approach is focused on a single record, but much of what is done here can be extended to multiple records using functions.
## Load Libraries
For this workshop element we only need four packages, `neotoma2`, `dplyr`, `ggplot2` and `Bchron`. We'll be loading a record from Neotoma, building a new chronology for the record, and then adding the chronology back to the record.
We'll be using the R package `pacman` here (so really, we need five packages), to automatically load and install packages:
```{r setup}
pacman::p_load(neotoma2, dplyr, ggplot2, Bchron)
```
## Loading Datasets
We worked through the process for finding and downloading records using `neotoma2` in the [previous workshop](https://open.neotomadb.org/EPD_binder/simple_workflow.html). Assuming we found a record that we were interested in, we can go back and pull a single record using its `datasetid`. In this case, the dataset is for [Stará Boleslav](https://data.neotomadb.org/24238). Let's start by pulling in the record and using the `chronologies()` helper function to look at the chronologies associated with the record:
```{r getStara, message = FALSE}
stara <- get_downloads(24238)
stara_chron <- chronologies(stara)
stara_chron %>% as.data.frame() %>%
DT::datatable(data = .,
options = list(scrollX = "100%"))
```
There are three chronologies here, but for whatever reason we've decided not to use any of them. We want to build a new one with the function `Bchronology()` from the [`Bchron` package](https://cran.r-project.org/web/packages/Bchron/vignettes/Bchron.html). First we want to see what chroncontrols we have for the prior chronologies. We're going to select the chronologies used for chronology `14591` as our template.
### Extract `chroncontrols`
```{r buildChronControl, message = FALSE}
controls <- chroncontrols(stara) %>%
dplyr::filter(chronologyid == 14591) %>%
arrange(depth)
controls %>% DT::datatable(data = .,
options = list(scrollX = "100%"))
```
We can look at other tools to decided how we want to manage the chroncontrols, for example, saving them and editing them using Excel or another spreadsheet program. We could add a new date by adding a new row. In this example we're just going to modify the existing ages to provide better constraints at the core top. We are setting the core top to *0 calibrated years BP*, and assuming an uncertainty of 2 years, and a thickness of 1cm.
To do these assignments we're just directly modifying cells within the `controls` `data.frame`:
```{r modifyControls, message = FALSE}
controls$chroncontrolage[1] <- 0
controls$agelimityounger[1] <- -2
controls$agelimitolder[1] <- 2
controls$thickness[1] <- 1
controls %>% DT::datatable(data = .,
options = list(scrollX = "100%"))
```
### Extract Depth & Analysis Unit IDs
Once our `chroncontrols` table is updated, we extract the `depth`s and `analysisunitid`s from the dataset `samples()`. Pulling in both `depth`s and `analysisunitid`s is important because a single collection unit may have multiple datasets, which may have non-overlapping depth sequences. So, when adding sample ages back to a record we use the `analysisunitid` to make sure we are providing the correct assignment since depth may be specific to a single dataset.
```{r predictDepths, message = FALSE, results="hide"}
# Get a two column data.frame with columns depth and analysisunitid.
# Sort the table by depth from top to bottom for "Bchronology"
predictDepths <- samples(stara) %>%
select(depth, analysisunitid) %>%
unique() %>%
arrange(depth)
# Pass the values from `controls`. We're assuming the difference between
# chroncontrolage and the agelimityounger is 1 SD.
newChron <- Bchron::Bchronology(ages = controls$chroncontrolage,
ageSds = abs(controls$agelimityounger -
controls$chroncontrolage),
calCurves = c("normal", rep("intcal20", 4)),
positionThicknesses = controls$thickness,
positions = controls$depth,
allowOutside = TRUE,
ids = controls$chroncontrolid)
# Predict ages at each depth for which we have samples. Returns a matrix.
newpredictions <- predict(newChron, predictDepths$depth)
```
```{r chronologyPlot, fig.cap="Age-depth model for Stará Boleslav, with probability distributions superimposed on the figure at each chronology control depth."}
plot(newChron) +
ggplot2::labs(
xlab = "Age (cal years BP)",
ylab = "Depth (cm)"
)
```
### Creating the New `chronology` and `contact` objects
Given the new chronology, we want to add it to the `sites` object so that it becomes the default for any calls to `samples()`. To create the metadata for the new chronology, we use `set_chronology()` using the properties from the [`chronology` table in Neotoma](https://open.neotomadb.org/dbschema/tables/chronologies.html):
```{r createChronology, message = FALSE}
creators <- c(set_contact(givennames = "Simon James",
familyname = "Goring",
ORCID = "0000-0002-2700-4605"),
set_contact(givennames = "Socorro",
familyname = "Dominguez Vidaña",
ORCID = "0000-0002-7926-4935"))
newChronStara <- set_chronology(agemodel = "Bchron model",
contact = creators,
isdefault = 1,
ageboundolder = max(newpredictions),
ageboundyounger = min(newpredictions),
dateprepared = lubridate::today(),
modelagetype = "Calibrated radiocarbon years BP",
chronologyname = "Simon's example chronology",
chroncontrols = controls)
newChronStara$notes <- 'newChron <- Bchron::Bchronology(ages = controls$chroncontrolage,
ageSds = abs(controls$agelimityounger -
controls$chroncontrolage),
calCurves = c("normal", rep("intcal20", 4)),
positionThicknesses = controls$thickness,
positions = controls$depth,
allowOutside = TRUE,
ids = controls$chroncontrolid,
predictPositions = predictDepths)'
```
### Adding the `chronology` to the `collectionunit`
Once we've created the chronology we need to apply it back to the collectionunit. We also need to add the predicted dates into the samples for each dataset associated with the collectionunit.
So:
1. we have a collectionunit in `stara` that is accessible at `stara[[1]]$collunits`.
2. We can use the function `add_chronology()`, which takes the chronology object and a `data.frame()` of sample ages.
3. The predicted dates associated with the new chronology need to be transferred to each `samples` object within the `collectionunit`.
This is all bound up in the `add_chronology()` function, which takes the `collectionunit`, modifys it, and returns the newly updated `collectionunit`.
```{r addChronology, message = FALSE}
newSampleAges <- data.frame(predictDepths,
age = colMeans(newpredictions),
ageolder = colMeans(newpredictions) +
apply(newpredictions, 2, sd),
ageyounger = colMeans(newpredictions) -
apply(newpredictions, 2, sd),
agetype = "Calibrated radiocarbon years")
stara[[1]]$collunits[[1]] <- add_chronology(stara[[1]]$collunits[[1]],
newChronStara,
newSampleAges)
```
With this, we now have the updated collectionunit. Lets take a look at how this affects the age model overal. To pull the ages from the prior chronologies, we use the `set_default()` function to change the default chronology, and then extract ages, depths & analysisunits:
```{r getAgesfromChronologies}
# The new chronology is currently the default chronology.
newages <- samples(stara) %>%
select(depth, analysisunitid, age) %>%
unique() %>%
arrange(depth) %>%
mutate(agecat = "new")
stara[[1]]$collunits[[1]]$chronologies <- set_default(stara[[1]]$collunits[[1]]$chronologies,
14591)
plotforages <- samples(stara) %>%
select(depth, analysisunitid, age) %>%
unique() %>%
arrange(depth) %>%
mutate(agecat = "old") %>%
bind_rows(newages)
```
And we can look at the difference visually:
```{r plotAgeDifferences, fig.cap="Differences in age representation between chronologies between existing chronologies and the new Bchron chronology."}
ggplot(plotforages, aes(x = depth, y = age)) +
geom_path(aes(color = agecat)) +
theme_bw() +
xlab("Depth (cm)") +
ylab("Calibrated Years BP")
```
So we can see the impact of the new chronology on the age model for the record, and we can make choices as to which model we want to use going forward. We can use this approach to create multiple new chronologies for a single record, tuning parameters within `Bchronology()`, or using Bacon and different parameters. Because the `chronology` is an R object we can save the objects for use in future sessions, and associate them with existin records, or we can re-run the models again.
## Summary
From this notebook we have learned how to:
1. Download a single record (the Stara record using `get_downloads()`)
2. Examining the chronologies for the record (using `chronologies()` and associated chronological controls (using `chroncontrols()`)
3. Creating a new chronology for the record (using `set_chronology()`)
4. Adding the chronology to the record (using `add_chronology()`)
5. Switching between default chronologies (using `set_default()`)
This approach is focused on a single record, but much of what is done here can be extended to multiple records using functions. We hope it's been helpful!