---
execute:
freeze: auto
---
# Static branching {#static}
```{r, message = FALSE, warning = FALSE, echo = FALSE}
knitr::opts_knit$set(root.dir = fs::dir_create(tempfile()))
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", eval = TRUE)
```
```{r, message = FALSE, warning = FALSE, echo = FALSE, eval = TRUE}
library(targets)
library(tarchetypes)
library(tidyverse)
```
## Branching
:::{.callout-tip}
## Performance
Branched pipelines can be computationally demanding. See the [performance chapter](#performance) for options, settings, and other choices to optimize and monitor large pipelines.
:::
Sometimes, a pipeline contains more targets than a user can comfortably type by hand. For projects with hundreds of targets, branching can make the `_targets.R` file more concise and easier to read and maintain.
`targets` supports two types of branching: dynamic branching and [static branching](#static). Some projects are better suited to dynamic branching, while others benefit more from [static branching](#static) or a combination of both. Here is a short list of tradeoffs.
Dynamic | Static
---|---
Pipeline creates new targets at runtime. | All targets defined in advance.
Cryptic target names. | Friendly target names.
Scales to hundreds of branches. | Does not scale as easily, e.g. for `tar_visnetwork()`.
No metaprogramming required. | Familiarity with metaprogramming is helpful.
## When to use static branching
Static branching is the act of defining a group of targets in bulk before the pipeline starts. Dynamic branching uses last-minute dependency data to define the branches, whereas static branching uses metaprogramming to modify the code of the pipeline up front. Dynamic branching excels at creating a large number of very similar targets, while static branching is most useful for a smaller number of heterogeneous targets. Some users find static branching more convenient because they can use `tar_manifest()` and `tar_visnetwork()` to check its correctness before launching the pipeline.
## Map
[`tar_map()`](https://docs.ropensci.org/tarchetypes/reference/tar_map.html) from the [`tarchetypes`](https://github.com/ropensci/tarchetypes) package creates copies of existing target objects, where each new command is a variation on the original. In the example below, we have a data analysis workflow that iterates over datasets and analysis methods. The `values` data frame has the operational parameters of each data analysis, and `tar_map()` creates one new target per row.
```{r, echo = TRUE, eval = FALSE}
# _targets.R file:
library(targets)
library(tarchetypes)
library(tibble)
values <- tibble(
method_function = rlang::syms(c("method1", "method2")),
data_source = c("NIH", "NIAID")
)
targets <- tar_map(
values = values,
tar_target(analysis, method_function(data_source, reps = 10)),
tar_target(summary, summarize_analysis(analysis, data_source))
)
list(targets)
```
```{r, echo = FALSE, eval = TRUE}
tar_script({
library(targets)
library(tarchetypes)
library(tibble)
values <- tibble(
method_function = rlang::syms(c("method1", "method2")),
data_source = c("NIH", "NIAID")
)
targets <- tar_map(
values = values,
tar_target(analysis, method_function(data_source, reps = 10)),
tar_target(summary, summarize_analysis(analysis, data_source))
)
list(targets)
})
```
```{r, paged.print = FALSE, eval = TRUE}
tar_manifest()
```
```{r, eval = TRUE}
tar_visnetwork(targets_only = TRUE)
```
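To build intuition for what `tar_map()` generates, the sketch below writes out by hand the targets that the example above is expected to produce. It assumes the naming pattern reported by `tar_manifest()` (the original target name plus a suffix built from the row of `values`), and `method1()`, `method2()`, and `summarize_analysis()` are still user-defined functions that are assumed to exist. The list name `hand_written` is only for illustration.

```{r, eval = FALSE, echo = TRUE}
# Hand-written sketch of the targets tar_map() is expected to generate
# for the two rows of `values` in the example above.
library(targets)
hand_written <- list(
  tar_target(analysis_method1_NIH, method1("NIH", reps = 10)),
  tar_target(summary_method1_NIH, summarize_analysis(analysis_method1_NIH, "NIH")),
  tar_target(analysis_method2_NIAID, method2("NIAID", reps = 10)),
  tar_target(summary_method2_NIAID, summarize_analysis(analysis_method2_NIAID, "NIAID"))
)
```

Typing four targets by hand is manageable, but the same pattern with dozens of rows is where `tar_map()` saves effort.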
For shorter target names, use the `names` argument of `tar_map()`. For more combinations of settings, use `tidyr::expand_grid()` to construct `values`.
```{r, eval = FALSE, echo = TRUE}
# _targets.R file:
library(targets)
library(tarchetypes)
library(tidyr)
values <- expand_grid( # Use all possible combinations of input settings.
method_function = rlang::syms(c("method1", "method2")),
data_source = c("NIH", "NIAID")
)
targets <- tar_map(
values = values,
names = "data_source", # Select columns from `values` for target names.
tar_target(analysis, method_function(data_source, reps = 10)),
tar_target(summary, summarize_analysis(analysis, data_source))
)
list(targets)
```
```{r, eval = TRUE, echo = FALSE}
tar_script({
library(targets)
library(tarchetypes)
library(tidyr)
values <- expand_grid(
method_function = rlang::syms(c("method1", "method2")),
data_source = c("NIH", "NIAID")
)
targets <- tar_map(
values = values,
names = "data_source",
tar_target(analysis, method_function(data_source, reps = 10)),
tar_target(summary, summarize_analysis(analysis, data_source))
)
list(targets)
})
```
It is especially important to run `tar_manifest()` to check that `tar_map()` generates the right R code for the targets. The metaprogramming may not produce the desired commands on your first try.
```{r, paged.print = FALSE, eval = TRUE}
tar_manifest()
```
And of course, check the dependency graph to ensure the pipeline is properly connected. If `tar_map()` generates a lot of targets, the graph may render slowly or look too cumbersome. If that happens, choose a small subset of rows of `values` for `tar_map()` and then try again on the smaller pipeline.
```{r, eval = TRUE}
# You may need to zoom out on this interactive graph to see all 8 targets.
tar_visnetwork(targets_only = TRUE)
```
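One possible way to shrink the pipeline while debugging is to keep only a few rows of `values`. The following is a minimal sketch; the name `values_small` is hypothetical and not part of `targets` or `tarchetypes`.

```{r, eval = FALSE, echo = TRUE}
library(tidyr)
values <- expand_grid(
  method_function = rlang::syms(c("method1", "method2")),
  data_source = c("NIH", "NIAID")
)
# Keep only the first two rows while checking the generated targets.
values_small <- head(values, 2)
# Next, pass `values = values_small` to tar_map() in _targets.R and
# rerun tar_manifest() and tar_visnetwork() on the smaller pipeline.
```

Once the small pipeline looks right, switch back to the full `values` data frame.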
### Limitations
[`tar_map()`](https://docs.ropensci.org/tarchetypes/reference/tar_map.html) generates [R expressions](https://adv-r.hadley.nz/expressions.html) to serve as commands in other targets. When it substitutes an element from `values`, it needs a way to transform the element into valid R code. For elements even a little bit complicated, especially nested data frames and objects with attributes, this is not always possible. For these complicated elements, it is best to use `quote()` to work with the underlying [expressions](https://adv-r.hadley.nz/expressions.html) instead of the objects themselves. See <https://github.com/ropensci/tarchetypes/discussions/105> for an example.
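The difference between substituting an object and substituting an expression can be illustrated with base R alone. The sketch below is only an analogy built on `substitute()`, not the internals of `tar_map()`, and `analyze()` is a hypothetical function that is never called.

```{r, eval = FALSE, echo = TRUE}
# Template command with a placeholder symbol `dataset`.
template <- quote(analyze(dataset))

# Substituting the object itself embeds an entire data frame in the command.
by_object <- do.call(substitute, list(template, list(dataset = head(mtcars, 2))))

# Substituting a quoted expression keeps the command short and readable.
by_expression <- do.call(
  substitute,
  list(template, list(dataset = quote(head(mtcars, 2))))
)

deparse(by_expression)
length(deparse(by_object)) # The deparsed code spans multiple lines.
```

The quoted version defers the creation of the data frame until the command actually runs, which is why `quote()` tends to be the safer choice for complicated elements of `values`.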
## Dynamic-within-static branching
You can combine static and dynamic branching. The static `tar_map()` is an excellent outer layer on top of targets with patterns. The following is a sketch of a pipeline that runs each of two data analysis methods 10 times, once per random seed. Static branching iterates over the method functions, while dynamic branching iterates over the seeds. `tar_map()` creates new patterns as well as new commands, so below, the summary targets map over the analysis targets both statically and dynamically.
```{r, eval = FALSE, echo = TRUE}
# _targets.R file:
library(targets)
library(tarchetypes)
library(tibble)
random_seed_target <- tar_target(random_seed, seq_len(10))
targets <- tar_map(
values = tibble(method_function = rlang::syms(c("method1", "method2"))),
tar_target(
analysis,
method_function("NIH", seed = random_seed),
pattern = map(random_seed)
),
tar_target(
summary,
summarize_analysis(analysis),
pattern = map(analysis)
)
)
list(random_seed_target, targets)
```
```{r, echo = FALSE, eval = TRUE}
tar_script({
library(targets)
library(tarchetypes)
library(tibble)
random_seed_target <- tar_target(random_seed, seq_len(10))
targets <- tar_map(
values = tibble(method_function = rlang::syms(c("method1", "method2"))),
tar_target(
analysis,
method_function("NIH", seed = random_seed),
pattern = map(random_seed)
),
tar_target(
summary,
summarize_analysis(analysis),
pattern = map(analysis)
)
)
list(random_seed_target, targets)
})
```
```{r, eval = TRUE, paged.print = FALSE}
tar_manifest()
```
```{r, eval = TRUE, paged.print = FALSE}
tar_visnetwork(targets_only = TRUE)
```
## Combine
[`tar_combine()`](https://docs.ropensci.org/tarchetypes/reference/tar_combine.html) from the [`tarchetypes`](https://github.com/ropensci/tarchetypes) package creates a new target to aggregate the results of upstream targets. In the example below, the combined target aggregates the rows returned from two other targets.
```{r, eval = FALSE, echo = TRUE}
# _targets.R file:
library(targets)
library(tarchetypes)
library(tibble)
options(crayon.enabled = FALSE)
target1 <- tar_target(head_mtcars, head(mtcars, 1))
target2 <- tar_target(tail_mtcars, tail(mtcars, 1))
target3 <- tar_combine(combined_target, target1, target2)
list(target1, target2, target3)
```
```{r, echo = FALSE, eval = TRUE}
tar_script({
library(targets)
library(tarchetypes)
library(tibble)
options(crayon.enabled = FALSE)
target1 <- tar_target(head_mtcars, head(mtcars, 1))
target2 <- tar_target(tail_mtcars, tail(mtcars, 1))
target3 <- tar_combine(combined_target, target1, target2)
list(target1, target2, target3)
})
```
```{r, eval = TRUE}
tar_manifest()
```
```{r, eval = TRUE}
tar_visnetwork(targets_only = TRUE)
```
```{r, eval = TRUE}
tar_make()
```
```{r, eval = TRUE}
tar_read(combined_target)
```
To use `tar_combine()` and `tar_map()` together in more complicated situations, you may need to supply `unlist = FALSE` to `tar_map()`. That way, `tar_map()` returns a nested list of target objects, and you can combine the ones you want. The pipeline below extends our previous `tar_map()` example by combining just the summaries and omitting the analyses from `tar_combine()`. Also note the use of `bind_rows(!!!.x)`. This is how you supply custom code to combine the return values of other targets. `.x` is a placeholder for the return values, and `!!!` is the "unquote-splice" operator from the `rlang` package.
```{r, eval = FALSE, echo = TRUE}
# _targets.R file:
library(targets)
library(tarchetypes)
library(tibble)
random_seed <- tar_target(random_seed, seq_len(10))
mapped <- tar_map(
unlist = FALSE, # Return a nested list from tar_map()
values = tibble(method_function = rlang::syms(c("method1", "method2"))),
tar_target(
analysis,
method_function("NIH", seed = random_seed),
pattern = map(random_seed)
),
tar_target(
summary,
summarize_analysis(analysis),
pattern = map(analysis)
)
)
combined <- tar_combine(
combined_summaries,
mapped[["summary"]],
command = dplyr::bind_rows(!!!.x, .id = "method")
)
list(random_seed, mapped, combined)
```
```{r, echo = FALSE, eval = TRUE}
tar_script({
library(targets)
library(tarchetypes)
library(tibble)
random_seed <- tar_target(random_seed, seq_len(10))
mapped <- tar_map(
unlist = FALSE, # Return a nested list from tar_map()
values = tibble(method_function = rlang::syms(c("method1", "method2"))),
tar_target(
analysis,
method_function("NIH", seed = random_seed),
pattern = map(random_seed)
),
tar_target(
summary,
summarize_analysis(analysis),
pattern = map(analysis)
)
)
combined <- tar_combine(
combined_summaries,
mapped[["summary"]],
command = dplyr::bind_rows(!!!.x, .id = "method")
)
list(random_seed, mapped, combined)
})
```
```{r, paged.print = FALSE, eval = TRUE}
tar_manifest()
```
```{r, eval = TRUE}
tar_visnetwork(targets_only = TRUE)
```
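To see what the unquote-splice operator does on its own, the sketch below builds a similar command with `rlang::expr()`. It is an illustration under the assumption that `.x` stands for a named list of upstream target symbols; the variable names `upstream` and `command` are hypothetical.

```{r, eval = FALSE, echo = TRUE}
library(rlang)
# A named list of symbols standing in for the upstream summary targets.
upstream <- syms(c("summary_method1", "summary_method2"))
names(upstream) <- c("summary_method1", "summary_method2")
# `!!!` splices each element of the list in as a separate named argument.
command <- expr(dplyr::bind_rows(!!!upstream, .id = "method"))
print(command)
```

Because the spliced arguments are named, `bind_rows()` can use those names to fill the `method` column of the combined data frame.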
## Metaprogramming
Custom metaprogramming is a more flexible alternative to [`tar_map()`](https://docs.ropensci.org/tarchetypes/reference/tar_map.html) and [`tar_combine()`](https://docs.ropensci.org/tarchetypes/reference/tar_combine.html). [`tar_eval()`](https://docs.ropensci.org/tarchetypes/reference/tar_eval.html) from [`tarchetypes`](https://github.com/ropensci/tarchetypes) accepts an arbitrary expression and iteratively plugs in symbols. Below, we use it to branch over datasets.
```{r, eval = FALSE, echo = TRUE}
# _targets.R
library(rlang)
library(targets)
library(tarchetypes)
string <- c("gapminder", "who", "imf")
symbol <- syms(string)
tar_eval(
tar_target(symbol, get_data(string)),
values = list(string = string, symbol = symbol)
)
```
```{r, echo = FALSE, eval = TRUE}
tar_script({
library(rlang)
library(tarchetypes)
string <- c("gapminder", "who", "imf")
symbol <- syms(string)
tar_eval(
tar_target(symbol, get_data(string)),
values = list(string = string, symbol = symbol)
)
})
```
[`tar_eval()`](https://docs.ropensci.org/tarchetypes/reference/tar_eval.html) has fewer guardrails than [`tar_map()`](https://docs.ropensci.org/tarchetypes/reference/tar_map.html) or [`tar_combine()`](https://docs.ropensci.org/tarchetypes/reference/tar_combine.html), so [`tar_manifest()`](https://docs.ropensci.org/targets/reference/tar_manifest.html) is especially important for checking the correctness of your metaprogramming.
```{r, eval = TRUE}
tar_manifest(fields = command)
```
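The "plug in symbols" idea can be sketched in base R with `substitute()`. This is a rough analogy under the assumption that each element of `values` replaces the symbol of the same name; it is not how `tar_eval()` is implemented, and the expressions are only built here, never evaluated.

```{r, eval = FALSE, echo = TRUE}
string <- c("gapminder", "who", "imf")
symbol <- lapply(string, as.symbol)
# Template expression with placeholders `symbol` and `string`.
template <- quote(tar_target(symbol, get_data(string)))
# Substitute one pair of values at a time into the template.
expanded <- Map(
  function(string, symbol) {
    do.call(substitute, list(template, list(string = string, symbol = symbol)))
  },
  string,
  symbol
)
commands <- vapply(expanded, deparse, character(1), width.cutoff = 500L)
unname(commands)
```

Each element of `commands` is the text of one `tar_target()` call, with the dataset name appearing once as a target name and once as a string argument.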
## Hooks
Hooks are supported in `tarchetypes` version 0.2.0 and above, and they allow you to prepend or wrap code in multiple targets at a time. For example, `tar_hook_before()` is a robust way to invoke the [`conflicted`](https://conflicted.r-lib.org) package to resolve namespace conflicts that works with [distributed computing](#hpc) and does not require a project-level `.Rprofile` file.
```{r, echo = FALSE, eval = TRUE}
tar_script({
library(tarchetypes)
library(magrittr)
tar_option_set(packages = c("conflicted", "dplyr"))
list(
tar_target(data, get_time_series_data()),
tar_target(analysis1, analyze_months(data)),
tar_target(analysis2, analyze_weeks(data))
) %>%
tar_hook_before(
hook = conflict_prefer("filter", "dplyr"),
names = starts_with("analysis")
)
})
```
```{r, echo = TRUE, eval = FALSE}
# _targets.R file
library(tarchetypes)
library(magrittr)
tar_option_set(packages = c("conflicted", "dplyr"))
source("R/functions.R")
list(
tar_target(data, get_time_series_data()),
tar_target(analysis1, analyze_months(data)),
tar_target(analysis2, analyze_weeks(data))
) %>%
tar_hook_before(
hook = conflict_prefer("filter", "dplyr"),
names = starts_with("analysis")
)
```
```{r, eval = TRUE}
# R console
targets::tar_manifest(fields = command)
```
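For comparison, here is a hand-written sketch of roughly what the hook produces: the hook code is prepended inside the command of each selected target. It assumes the hook is `conflicted::conflict_prefer()` and that `get_time_series_data()`, `analyze_months()`, and `analyze_weeks()` are user-defined functions; the list name `hand_written` is only for illustration.

```{r, eval = FALSE, echo = TRUE}
# Hand-written approximation of the pipeline after tar_hook_before().
library(targets)
tar_option_set(packages = c("conflicted", "dplyr"))
hand_written <- list(
  # `data` does not match starts_with("analysis"), so it is unchanged.
  tar_target(data, get_time_series_data()),
  tar_target(analysis1, {
    conflict_prefer("filter", "dplyr")
    analyze_months(data)
  }),
  tar_target(analysis2, {
    conflict_prefer("filter", "dplyr")
    analyze_weeks(data)
  })
)
```

The hook avoids repeating the same line in every target, which matters more as the number of matching targets grows.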
Similarly, `tar_hook_outer()` wraps expressions around target commands, and `tar_hook_inner()` wraps expressions around target dependencies. These hooks could potentially help encrypt targets before storage in `_targets/` and decrypt targets before retrieval, as demonstrated in the sketch below.
Data security is the sole responsibility of the user and not the responsibility of `targets`, `tarchetypes`, or related pipeline packages. You as the user are responsible for validating your own target specifications and custom code and applying additional security precautions as appropriate for the situation.
```{r, echo = FALSE, eval = TRUE}
tar_script({
library(tarchetypes)
library(magrittr)
list(
tar_target(data1, get_data1()),
tar_target(data2, get_data2()),
tar_target(analysis, analyze(data1, data2))
) %>%
tar_hook_outer(encrypt(.x, threads = 2)) %>%
tar_hook_inner(decrypt(.x))
})
```
```{r, echo = TRUE, eval = FALSE}
# _targets.R file
library(tarchetypes)
library(magrittr)
list(
tar_target(data1, get_data1()),
tar_target(data2, get_data2()),
tar_target(analysis, analyze(data1, data2))
) %>%
tar_hook_outer(encrypt(.x, threads = 2)) %>%
tar_hook_inner(decrypt(.x))
```
```{r, eval = TRUE}
# R console
targets::tar_manifest(fields = command)
```
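The wrapping performed by the two hooks can be mimicked in base R to make the order of operations explicit. This is a sketch of the idea only, assuming `.x` is replaced by each dependency symbol (inner hook) or by the whole command (outer hook); `encrypt()`, `decrypt()`, and `analyze()` are hypothetical and never called.

```{r, eval = FALSE, echo = TRUE}
outer_hook <- quote(encrypt(.x, threads = 2))
inner_hook <- quote(decrypt(.x))
command <- quote(analyze(data1, data2))
dependencies <- c("data1", "data2")

# Inner hook: wrap each dependency symbol that appears in the command.
inner_values <- lapply(dependencies, function(name) {
  do.call(substitute, list(inner_hook, list(.x = as.symbol(name))))
})
names(inner_values) <- dependencies
with_inner <- do.call(substitute, list(command, inner_values))

# Outer hook: wrap the whole command, including the inner wrapping.
with_outer <- do.call(substitute, list(outer_hook, list(.x = with_inner)))
result <- paste(deparse(with_outer, width.cutoff = 500L), collapse = " ")
result
```

Dependencies are decrypted on the way in and the return value is encrypted on the way out, which is the shape to look for in the `tar_manifest()` output.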