static.qmd

---
execute:
  freeze: auto
---

# Static branching {#static}

```{r, message = FALSE, warning = FALSE, echo = FALSE}
knitr::opts_knit$set(root.dir = fs::dir_create(tempfile()))
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", eval = TRUE)
```

```{r, message = FALSE, warning = FALSE, echo = FALSE, eval = TRUE}
library(targets)
library(tarchetypes)
library(tidyverse)
```

## Branching

:::{.callout-tip}
## Performance

Branched pipelines can be computationally demanding. See the [performance chapter](#performance) for options, settings, and other choices to optimize and monitor large pipelines.
:::

Sometimes, a pipeline contains more targets than a user can comfortably type by hand. For projects with hundreds of targets, branching can make the `_targets.R` file more concise and easier to read and maintain. 

`targets` supports two types of branching: dynamic branching and [static branching](#static). Some projects are better suited to dynamic branching, while others benefit more from [static branching](#static) or a combination of both. Here is a short list of tradeoffs.

Dynamic | Static
---|---
Pipeline creates new targets at runtime. | All targets defined in advance.
Cryptic target names. | Friendly target names.
Scales to hundreds of branches. | Does not scale as easily for `tar_visnetwork()` etc.
No metaprogramming required. | Familiarity with metaprogramming is helpful.

## When to use static branching

Static branching is the act of defining a group of targets in bulk before the pipeline starts. Whereas dynamic branching uses last-minute dependency data to define the branches, static branching uses metaprogramming to modify the code of the pipeline up front. Whereas dynamic branching excels at creating a large number of very similar targets, static branching is most useful for smaller number of heterogeneous targets. Some users find it more convenient because they can use `tar_manifest()` and `tar_visnetwork()` to check the correctness of static branching before launching the pipeline.

## Map

[`tar_map()`](https://docs.ropensci.org/tarchetypes/reference/tar_map.html) from the [`tarchetypes`](https://github.com/ropensci/tarchetypes) package creates copies of existing target objects, where each new command is a variation on the original. In the example below, we have a data analysis workflow that iterates over datasets and analysis methods. The `values` data frame has the operational parameters of each data analysis, and `tar_map()` creates one new target per row.

```{r, echo = TRUE, eval = FALSE}
# _targets.R file:
library(targets)
library(tarchetypes)
library(tibble)
values <- tibble(
  method_function = rlang::syms(c("method1", "method2")),
  data_source = c("NIH", "NIAID")
)
targets <- tar_map(
  values = values,
  tar_target(analysis, method_function(data_source, reps = 10)),
  tar_target(summary, summarize_analysis(analysis, data_source))
)
list(targets)
```

```{r, echo = FALSE, eval = TRUE}
tar_script({
  library(targets)
  library(tarchetypes)
  library(tibble)
  values <- tibble(
    method_function = rlang::syms(c("method1", "method2")),
    data_source = c("NIH", "NIAID")
  )
  targets <- tar_map(
    values = values,
    tar_target(analysis, method_function(data_source, reps = 10)),
    tar_target(summary, summarize_analysis(analysis, data_source))
  )
  list(targets)
})
```

```{r, paged.print = FALSE, eval = TRUE}
tar_manifest()
```

```{r, eval = TRUE}
tar_visnetwork(targets_only = TRUE)
```

For shorter target names, use the `names` argument of `tar_map()`. And for more combinations of settings, use `tidyr::expand_grid()` on `values`.

```{r, eval = FALSE, echo = TRUE}
# _targets.R file:
library(targets)
library(tarchetypes)
library(tidyr)
values <- expand_grid( # Use all possible combinations of input settings.
  method_function = rlang::syms(c("method1", "method2")),
  data_source = c("NIH", "NIAID")
)
targets <- tar_map(
  values = values,
  names = "data_source", # Select columns from `values` for target names.
  tar_target(analysis, method_function(data_source, reps = 10)),
  tar_target(summary, summarize_analysis(analysis, data_source))
)
list(targets)
```

```{r, eval = TRUE, echo = FALSE}
tar_script({
  library(targets)
  library(tarchetypes)
  library(tidyr)
  values <- expand_grid(
    method_function = rlang::syms(c("method1", "method2")),
    data_source = c("NIH", "NIAID")
  )
  targets <- tar_map(
    values = values,
    names = "data_source", 
    tar_target(analysis, method_function(data_source, reps = 10)),
    tar_target(summary, summarize_analysis(analysis, data_source))
  )
  list(targets)
})
```

It is extra important to run `tar_manifest()` to check that `tar_map()` generates the right R code for the targets. Sometimes, the metaprogramming may not produce the desired commands on your first try.

```{r, paged.print = FALSE, eval = TRUE}
tar_manifest()
```

And of course, check the dependency graph to ensure the pipeline is properly connected. If  `tar_map()` generates a lot of targets, the graph may render slowly or look too cumbersome. If that happens, choose a small subset of rows of `values` for `tar_map()` and then try again on the smaller pipeline.

```{r, eval = TRUE}
# You may need to zoom out on this interactive graph to see all 8 targets.
tar_visnetwork(targets_only = TRUE)
```

### Limitations

[`tar_map()`](https://docs.ropensci.org/tarchetypes/reference/tar_map.html) generates [R expressions](https://adv-r.hadley.nz/expressions.html) to serve as commands in other targets. When it substitutes an element from `values`, it needs a way to transform the element into valid R code. For elements even a little bit complicated, especially nested data frames and objects with attributes, this is not always possible. For these complicated elements, it is best to use `quote()` to work with the underlying [expressions](https://adv-r.hadley.nz/expressions.html) instead of the objects themselves. See <https://github.com/ropensci/tarchetypes/discussions/105> for an example.

## Dynamic-within-static branching

You can even combine together static and dynamic branching. The static `tar_map()` is an excellent outer layer on top of targets with patterns. The following is a sketch of a pipeline that runs each of two data analysis methods 10 times, once per random seed. Static branching iterates over the method functions, while dynamic branching iterates over the seeds. `tar_map()` creates new patterns as well as new commands. So below, the summary methods map over the analysis methods both statically and dynamically.

```{r, eval = FALSE, echo = TRUE}
# _targets.R file:
library(targets)
library(tarchetypes)
library(tibble)
random_seed_target <- tar_target(random_seed, seq_len(10))
targets <- tar_map(
  values = tibble(method_function = rlang::syms(c("method1", "method2"))),
  tar_target(
    analysis,
    method_function("NIH", seed = random_seed),
    pattern = map(random_seed)
  ),
  tar_target(
    summary,
    summarize_analysis(analysis),
    pattern = map(analysis)
  )
)
list(random_seed_target, targets)
```

```{r, echo = FALSE, eval = TRUE}
tar_script({
  library(targets)
  library(tarchetypes)
  library(tibble)
  random_seed_target <- tar_target(random_seed, seq_len(10))
  targets <- tar_map(
    values = tibble(method_function = rlang::syms(c("method1", "method2"))),
    tar_target(
      analysis,
      method_function("NIH", seed = random_seed),
      pattern = map(random_seed)
    ),
    tar_target(
      summary,
      summarize_analysis(analysis),
      pattern = map(analysis)
    )
  )
  list(random_seed_target, targets)
})
```

```{r, eval = TRUE, paged.print = FALSE}
tar_manifest()
```

```{r, eval = TRUE, paged.print = FALSE}
tar_visnetwork(targets_only = TRUE)
```

## Combine

[`tar_combine()`](https://docs.ropensci.org/tarchetypes/reference/tar_combine.html) from the [`tarchetypes`](https://github.com/ropensci/tarchetypes) package creates a new target to aggregate the results of upstream targets. In the simple example below, our combined target simply aggregates the rows returned from two other targets.

```{r, eval = FALSE, echo = TRUE}
# _targets.R file:
library(targets)
library(tarchetypes)
library(tibble)
options(crayon.enabled = FALSE)
target1 <- tar_target(head, head(mtcars, 1))
target2 <- tar_target(tail, tail(mtcars, 1))
target3 <- tar_combine(combined_target, target1, target2)
list(target1, target2, target3)
```

```{r, echo = FALSE, eval = TRUE}
tar_script({
  library(targets)
  library(tarchetypes)
  library(tibble)
  options(crayon.enabled = FALSE)
  target1 <- tar_target(head_mtcars, head(mtcars, 1))
  target2 <- tar_target(tail_mtcars, tail(mtcars, 1))
  target3 <- tar_combine(combined_target, target1, target2)
  list(target1, target2, target3)
})
```

```{r, eval = TRUE}
tar_manifest()
```

```{r, eval = TRUE}
tar_visnetwork(targets_only = TRUE)
```

```{r, eval = TRUE}
tar_make()
```

```{r, eval = TRUE}
tar_read(combined_target)
```

To use `tar_combine()` and `tar_map()` together in more complicated situations, you may need to supply `unlist = FALSE` to `tar_map()`. That way, `tar_map()` will return a nested list of target objects, and you can combine the ones you want. The pipeline extends our previous `tar_map()` example by combining just the summaries, omitting the analyses from `tar_combine()`. Also note the use of `bind_rows(!!!.x)` below. This is how you supply custom code to combine the return values of other targets. `.x` is a placeholder for the return values, and `!!!` is the "unquote-splice" operator from the `rlang` package.

```{r, eval = FALSE, echo = TRUE}
# _targets.R file:
library(targets)
library(tarchetypes)
library(tibble)
random_seed <- tar_target(random_seed, seq_len(10))
mapped <- tar_map(
  unlist = FALSE, # Return a nested list from tar_map()
  values = tibble(method_function = rlang::syms(c("method1", "method2"))),
  tar_target(
    analysis,
    method_function("NIH", seed = random_seed),
    pattern = map(random_seed)
  ),
  tar_target(
    summary,
    summarize_analysis(analysis),
    pattern = map(analysis)
  )
)
combined <- tar_combine(
  combined_summaries,
  mapped[["summary"]],
  command = dplyr::bind_rows(!!!.x, .id = "method")
)
list(random_seed, mapped, combined)
```

```{r, echo = FALSE, eval = TRUE}
tar_script({
  library(targets)
  library(tarchetypes)
  library(tibble)
  random_seed <- tar_target(random_seed, seq_len(10))
  mapped <- tar_map(
    unlist = FALSE, # Return a nested list from tar_map()
    values = tibble(method_function = rlang::syms(c("method1", "method2"))),
    tar_target(
      analysis,
      method_function("NIH", seed = random_seed),
      pattern = map(random_seed)
    ),
    tar_target(
      summary,
      summarize_analysis(analysis),
      pattern = map(analysis)
    )
  )
  combined <- tar_combine(
    combined_summaries,
    mapped[["summary"]],
    command = dplyr::bind_rows(!!!.x, .id = "method")
  )
  list(random_seed, mapped, combined)
})
```

```{r, paged.print = FALSE, eval = TRUE}
tar_manifest()
```

```{r, eval = TRUE}
tar_visnetwork(targets_only = TRUE)
```

## Metaprogramming

Custom metaprogramming is a more flexible alternative to [`tar_map()`](https://docs.ropensci.org/tarchetypes/reference/tar_map.html) and [`tar_combine()`](https://docs.ropensci.org/tarchetypes/reference/tar_combine.html). [`tar_eval()`](https://docs.ropensci.org/tarchetypes/reference/tar_eval.html) from [`tarchetypes`](https://github.com/ropensci/tarchetypes) accepts an arbitrary expression and iteratively plugs in symbols. Below, we use it to branch over datasets. 

```{r, eval = FALSE, echo = TRUE}
# _targets.R
library(rlang)
library(targets)
library(tarchetypes)
string <- c("gapminder", "who", "imf")
symbol <- syms(string)
tar_eval(
  tar_target(symbol, get_data(string)),
  values = list(string = string, symbol = symbol)
)
```

```{r, echo = FALSE, eval = TRUE}
tar_script({
  library(rlang)
  library(tarchetypes)
  string <- c("gapminder", "who", "imf")
  symbol <- syms(string)
  tar_eval(
    tar_target(symbol, get_data(string)),
    values = list(string = string, symbol = symbol)
  )
})
```

[`tar_eval()`](https://docs.ropensci.org/tarchetypes/reference/tar_eval.html) has fewer guardrails than [`tar_map()`](https://docs.ropensci.org/tarchetypes/reference/tar_map.html) or [`tar_combine()`](https://docs.ropensci.org/tarchetypes/reference/tar_combine.html), so [`tar_manifest()`](https://docs.ropensci.org/targets/reference/tar_manifest.html) is especially important for checking the correctness of your metaprogramming.

```{r, eval = TRUE}
tar_manifest(fields = command)
```

## Hooks

Hooks are supported in `tarchetypes` version 0.2.0 and above, and they allow you to prepend or wrap code in multiple targets at a time. For example, `tar_hook_before()` is a robust way to invoke the [`conflicted`](https://conflicted.r-lib.org) package to resolve namespace conflicts that works with [distributed computing](#hpc) and does not require a project-level `.Rprofile` file.

```{r, echo = FALSE, eval = TRUE}
tar_script({
  library(tarchetypes)
  library(magrittr)
  tar_option_set(packages = c("conflicted", "dplyr"))
  list(
    tar_target(data, get_time_series_data()),
    tar_target(analysis1, analyze(data)),
    tar_target(analysis2, analyze(data))
  ) %>%
    tar_hook_before(
      hook = conflicted_prefer("filter", "dplyr"),
      names = starts_with("analysis")
    )
})
```

```{r, echo = TRUE, eval = FALSE}
# _targets.R file
library(tarchetypes)
library(magrittr)
tar_option_set(packages = c("conflicted", "dplyr"))
source("R/functions.R")
list(
  tar_target(data, get_time_series_data()),
  tar_target(analysis1, analyze_months(data)),
  tar_target(analysis2, analyze_weeks(data))
) %>%
  tar_hook_before(
    hook = conflicted_prefer("filter", "dplyr"),
    names = starts_with("analysis")
  )
```

```{r, eval = TRUE}
# R console
targets::tar_manifest(fields = command)
```

Similarly, `tar_hook_outer()` wraps expressions around target commands, and `tar_hook_inner()` wraps expressions around target dependencies. These hooks could potentially help encrypt targets before storage in `_targets/` and decrypt targets before retrieval, as demonstrated in the sketch below. 

Data security is the sole responsibility of the user and not the responsibility of `targets`, `tarchetypes`, or related pipeline packages. You as the user are responsible for validating your own target specifications and custom code and applying additional security precautions as appropriate for the situation.

```{r, echo = FALSE, eval = TRUE}
tar_script({
  library(tarchetypes)
  library(magrittr)
  list(
    tar_target(data1, get_data1()),
    tar_target(data2, get_data2()),
    tar_target(analysis, analyze(data1, data2))
  ) %>%
    tar_hook_outer(encrypt(.x, threads = 2)) %>%
    tar_hook_inner(decrypt(.x))
})
```

```{r, echo = TRUE, eval = FALSE}
# _targets.R file
library(tarchetypes)
library(magrittr)
list(
  tar_target(data1, get_data1()),
  tar_target(data2, get_data2()),
  tar_target(analysis, analyze(data1, data2))
) %>%
  tar_hook_outer(encrypt(.x, threads = 2)) %>%
  tar_hook_inner(decrypt(.x))
```

```{r, eval = TRUE}
# R console
targets::tar_manifest(fields = command)
```