-
Notifications
You must be signed in to change notification settings - Fork 29
/
Copy pathpackages.qmd
79 lines (52 loc) · 6.36 KB
/
packages.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
---
execute:
freeze: auto
---
```{r, message = FALSE, warning = FALSE, echo = FALSE}
knitr::opts_knit$set(root.dir = fs::dir_create(tempfile()))
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", eval = TRUE)
```
```{r, message = FALSE, warning = FALSE, echo = FALSE}
library(targets)
library(tarchetypes)
```
# Packages {#packages}
This chapter describes the recommended roles of R packages in `targets` pipelines and how to manage them in different situations.
## Loading and configuring R packages
For most pipelines, it is straightforward to load the R packages that your targets need in order to run. You can either:
1. Call `library()` at the top of the target script file (default: `_targets.R`) to load each package the conventional way, or
2. Name the required packages using the `packages` argument of `tar_option_set()`.
2\. is often faster, especially for utilities like `tar_visnetwork()`, because it avoids loading packages unless absolutely necessary.
Some package management workflows are more complicated. If your use special configuration with [conflicted](https://github.com/r-lib/conflicted), [`box`](https://klmr.me/box/), [`import`](https://import.rticulate.org/), or similar utility, please do your configuration inside a project-level `.Rprofile` file instead of the target script file (default: `_targets.R`). In addition, if you use distributed workers inside external containers (Docker, Singularity, AWS AMI, etc.) make sure each container has a copy of this same `.Rprofile` file where the R worker process spawns. This approach is ensures that all [remote workers](#hpc) are configured the same way as the local main process.
## Package management with `renv`
`targets` and `renv` work extremely well together as an overall reproducibility solution for data analysis pipelines. `targets` makes sure your results are up to date, and `renv` keeps track of the packages you use.
If you use [`renv`](https://rstudio.github.io/renv/), then overhead from project initialization could slow down pipelines and [workers](#hpc). If you experience slowness, please make sure your [`renv`](https://rstudio.github.io/renv/) library is on a fast file system. (For example, slow network drives can severely reduce performance.) In addition, you can disable the slowest initialization checks. After confirming at <https://rstudio.github.io/renv/reference/config.html> that you can safely disable these checks, you can write the following lines in your user-level `.Renviron` file:
```
RENV_CONFIG_SANDBOX_ENABLED=false
RENV_CONFIG_SYNCHRONIZED_CHECK=false
```
If you disable the synchronization check, remember to call `renv::status()` periodically to check the health of your `renv` project library.
## R packages as projects
It is good practice to organize the files of a `targets` project similar to a research compendium or R package. However, unless have a specific reason to do so, it is usually not necessary to literally implement your `targets` pipeline as an installable R package with its own `DESCRIPTION` file. A research compendium backed by a `renv` library and Git-backed version control is enough reproducibility for most projects.
## Target Factories
To make specific `targets` pipelines reusable, it is usually better to create a package with specialized target factories tailored to your use case. Packages [`stantargets`](https://docs.ropensci.org/stantargets/) and [`jagstargets`](https://wlandau.github.io/jagstargets/) are examples, and you can find more information on the broader R Targetopia at <https://wlandau.github.io/targetopia/>.
## Package-based invalidation
Still, it is sometimes desirable to treat functions and objects from a package as dependencies when it comes to deciding which targets to rerun and which targets to skip. `targets` does not track package functions by default because this is not a common need. Usually, local package libraries do not need to change very often, and it is best to maintain a reproducible project library using [`renv`](https://rstudio.github.io/renv/articles/renv.html).
However, if you are developing a package alongside a `targets` pipeline that uses it, you may wish to invalidate certain targets as you make changes to your package. For example, if you are working on a novel statistical method, it is good practice to implement the method itself as an R package and perform the computation for the research paper in one or more `targets` pipelines.
To track the contents of packages `package1` and `package2`, you must
1. Fully install these packages with `install.packages()` or equivalent. `devtools::load_all()` is insufficient because it does not make the packages available to [parallel workers](#hpc).
2. Write the following in your target script file (default: `_targets.R`):
```{r, eval = FALSE}
# _targets.R
library(targets)
library(tarchetypes)
tar_option_set(
packages = c("package1", "package2", ...), # `...` is for other packages.
imports = c("package1", "package2")
)
list(
tar_target(name = output, command = function_from_package1())
)
```
`packages = c("package1", "package2", ...)` tells `targets` to call `library(package1)`, `library(package2)`, etc. before running each target. `imports = c("package1", "package2")` tells `targets` to dive into the environments of `package1` and `package2` and reproducibly track all the objects and datasets as if they were part of the global environment. For example, if you define a function `function_from_package1()` in `package1`, then you should see a function node for `function_from_package1()` in the graph produced by `tar_visnetwork(targets_only = FALSE)`, and targets downstream of `function_from_package1()` will invalidate if you install an update to `package1` with a new version of `function_from_package1()`. The next time you call `tar_make()`, those invalidated targets will automatically rerun.
One limitation is that entire namespaced calls like `package1::function_from_package1()` are not registered in the dependency graph because of the limitations of the static code analysis capabilities of `targets` (powered by `codetools::findGlobals()`). `tar_target(name = output, command = function_from_package1())` has a dependency `function_from_package1()`, but `tar_target(name = output, command = package1::function_from_package1())` does not have a dependency `package1::function_from_package1()`. This is because `::` is treated as a function with arguments `package` and `function_from_package1`.