---
execute:
freeze: auto
---
# Literate programming {#literate-programming}
```{r, message = FALSE, warning = FALSE, echo = FALSE, eval = TRUE}
knitr::opts_knit$set(root.dir = fs::dir_create(tempfile()))
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", eval = TRUE)
options(crayon.enabled = FALSE)
Sys.setenv(TAR_WARN = "false")
```
Literate programming is the practice of mixing code and descriptive writing in order to execute and explain a data analysis simultaneously in the same document. The `targets` package supports literate programming through tight integration with [Quarto](https://quarto.org/), [R Markdown](https://rmarkdown.rstudio.com/), and [`knitr`](https://yihui.org/knitr/). We recommend learning at least one of these three tools before proceeding.
## Scope
There are two kinds of literate programming in `targets`:
1. A literate programming source document (or Quarto project) that renders inside an individual target. Here, you define a special kind of target that runs a lightweight R Markdown or Quarto report which depends on upstream targets.
2. Target Markdown, an overarching system in which one or more [Quarto](https://quarto.org/) or R Markdown files write the `_targets.R` file and encapsulate the pipeline.
We recommend (1) in order to fully embrace pipelines as a paradigm, and that is where this chapter will focus. However, (2) is still supported, and we include it [in an appendix](#markdown).
## R Markdown targets
Here, literate programming serves to display, summarize, and annotate results from upstream in the `targets` pipeline. The document(s) have little to no computation of their own, and they make heavy use of `tar_read()` and `tar_load()` to leverage output from other targets.
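To make the distinction between the two functions concrete, here is a minimal sketch of the R code that might sit inside a code chunk of such a report. The target names `fit` and `hist` are placeholders for targets defined elsewhere in the pipeline, and the sketch assumes the pipeline has already run so that the `_targets/` data store exists.

```{r, eval = FALSE}
# Inside a code chunk of a report such as report.Rmd
library(targets)
tar_read(fit)  # Return the stored value of target `fit` so it prints in the report.
tar_load(hist) # Assign the stored value of target `hist` to a variable named `hist`.
hist           # Print the loaded object (assumed here to be a plot).
```

`tar_read()` returns a value without creating a variable, whereas `tar_load()` creates a variable with the same name as the target, so the choice between them is mostly a matter of how the chunk uses the result.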
As an example, let us extend the [walkthrough example](#walkthrough) chapter with the following [R Markdown](https://rmarkdown.rstudio.com/) source file `report.Rmd`.
![](./man/figures/knitr-source.png)
This document depends on targets `fit` and `hist`. If we previously ran the pipeline and the data store `_targets/` exists, then `tar_read()` and `tar_load()` will read those targets and show them in the rendered HTML output `report.html`.
![](./man/figures/knitr-ide.png)
With the `tar_render()` function in [`tarchetypes`](https://github.com/ropensci/tarchetypes), we can go a step further and include `report.Rmd` as a *target* in the pipeline. This new target re-renders `report.Rmd` whenever `fit` or `hist` changes, which means `tar_make()` brings the output file `report.html` up to date.
```{r, echo = FALSE, eval = TRUE}
lines <- c(
  "---",
  "output: html_document",
  "---",
  "",
  "```{r}",
  "tar_read(fit)",
  "tar_load(hist)",
  "```"
)
writeLines(lines, "report.Rmd")
```
```{r, eval = TRUE}
library(targets)
library(tarchetypes)
target <- tar_render(report, "report.Rmd") # Just defines a target object.
target$command$expr[[1]]
```
`tar_render()` is like `tar_target()`, except that you supply the file path to the R Markdown report instead of an R command. Here it is at the bottom of the example `_targets.R` file below:
```{r, eval = FALSE}
# _targets.R
library(targets)
library(tarchetypes)
source("R/functions.R")
list(
  tar_target(
    raw_data_file,
    "data/raw_data.csv",
    format = "file"
  ),
  tar_target(
    raw_data,
    read_csv(raw_data_file, col_types = cols())
  ),
  tar_target(
    data,
    raw_data %>%
      mutate(Ozone = replace_na(Ozone, mean(Ozone, na.rm = TRUE)))
  ),
  tar_target(hist, create_plot(data)),
  tar_target(fit, biglm(Ozone ~ Wind + Temp, data)),
  tar_render(report, "report.Rmd") # Here is our call to tar_render().
)
```
When we visualize the pipeline, we see that our `report` target depends on targets `fit` and `hist`. `tar_render()` automatically detects these upstream dependencies by statically analyzing `report.Rmd` for calls to `tar_load()` and `tar_read()`.
```{r, eval = FALSE}
# R console
tar_visnetwork()
```
![](./man/figures/knitr-graph.png)
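To check which dependencies the static analysis finds without building the whole graph, one option is the `tar_knitr_deps()` helper in `tarchetypes`. The sketch below assumes `report.Rmd` is in the current working directory.

```{r, eval = FALSE}
# R console
library(tarchetypes)
# Scan the code chunks of the report for tar_load() and tar_read() calls
# and return the names of the targets they mention as a character vector.
tar_knitr_deps("report.Rmd")
```

If a target you expect is missing from the result, a likely cause is that the report refers to it in a way static analysis cannot see, for example through a target name stored in a variable.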
## Quarto targets
`tarchetypes` >= 0.6.0.9000 supports a `tar_quarto()` function, which is like `tar_render()`, but for Quarto. For an individual source document, `tar_quarto()` works exactly the same way as `tar_render()`. However, `tar_quarto()` is more powerful: you can supply the path to an entire Quarto project, such as a book, blog, or website. `tar_quarto()` looks for target dependencies in all the source documents (e.g. listed in `_quarto.yml`), and it tracks the important files in the project for changes (run `tar_quarto_files()` to see which ones).
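As a rough sketch of the project use case, the `_targets.R` file below renders an entire Quarto project as one target. The directory name `my_site` and the target name `website` are hypothetical, and the sketch assumes `my_site/` contains a `_quarto.yml` file.

```{r, eval = FALSE}
# _targets.R
library(targets)
library(tarchetypes)
list(
  tar_target(data, data.frame(x = seq_len(26), y = letters)),
  # Render the whole (hypothetical) Quarto project in my_site/.
  # Any tar_read() or tar_load() call in its source documents
  # becomes an upstream dependency of the website target.
  tar_quarto(website, path = "my_site")
)
```

To inspect the files that the target tracks, run the following in the R console:

```{r, eval = FALSE}
# R console
tarchetypes::tar_quarto_files("my_site")
```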
## Parameterized documents
[`tarchetypes`](https://docs.ropensci.org/tarchetypes) functions make it straightforward to use [parameterized R Markdown](https://rmarkdown.rstudio.com/developer_parameterized_reports.html) and [parameterized Quarto](https://quarto.org/docs/computations/parameters.html) in a `targets` pipeline. The next two subsections walk through the major use cases.
### Single parameter set
In this scenario, the pipeline renders your parameterized report one time using a single set of parameters. These parameters can be upstream targets, global objects, or fixed values. Simply pass a `params` argument to [`tar_render()`](https://docs.ropensci.org/tarchetypes/reference/tar_render.html) or an `execute_params` argument to [`tar_quarto()`](https://docs.ropensci.org/tarchetypes/reference/tar_quarto.html). Example:
```{r, eval = FALSE}
# _targets.R
library(targets)
library(tarchetypes)
list(
  tar_target(data, data.frame(x = seq_len(26), y = letters)),
  tar_quarto(report, "report.qmd", execute_params = list(your_param = data))
)
```
Internally, the `report` target runs:
```{r, eval = FALSE}
# R console
quarto::quarto_render("report.qmd", execute_params = list(your_param = data))
```
where `report.qmd` looks like this:
![](./man/figures/param-qmd.png)
See [`tar_quarto()` examples](https://docs.ropensci.org/tarchetypes/reference/tar_quarto.html#examples) and [`tar_render()` examples](https://docs.ropensci.org/tarchetypes/reference/tar_render.html#examples) for more.
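For comparison, an R Markdown version of the same pipeline might look like the sketch below. It assumes a hypothetical `report.Rmd` whose YAML front matter declares `your_param` under `params:`. The `params` argument is forwarded to `rmarkdown::render()`.

```{r, eval = FALSE}
# _targets.R
library(targets)
library(tarchetypes)
list(
  tar_target(data, data.frame(x = seq_len(26), y = letters)),
  # The upstream target `data` is passed in as the value of params$your_param.
  tar_render(report, "report.Rmd", params = list(your_param = data))
)
```

Because `data` appears in `params`, the `report` target depends on `data` and re-renders when `data` changes.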
### Multiple parameter sets
In this scenario, you still have a single report, but you render it multiple times over a grid of parameters. This time, use [`tar_quarto_rep()`](https://docs.ropensci.org/tarchetypes/reference/tar_quarto_rep.html) or [`tar_render_rep()`](https://docs.ropensci.org/tarchetypes/reference/tar_render_rep.html). Each of these functions takes as input a grid of parameters with one column per parameter and one row per parameter set, where each parameter set is used to render an instance of the document. In other words, the number of rows in the parameter grid is the number of output documents you will produce. Below is an example `_targets.R` file using `tar_render_rep()`. Usage with `tar_quarto_rep()` is the same^[except the parameter grid argument is called `execute_params` in `tar_quarto_rep()`.].
```{r, eval = FALSE}
# _targets.R
library(targets)
library(tarchetypes)
tar_option_set(packages = "tibble")
list(
  tar_target(x, "value_of_x"),
  tar_render_rep(
    report,
    "report.Rmd",
    params = tibble(
      par = c("par_val_1", "par_val_2", "par_val_3", "par_val_4"),
      output_file = c("f1.html", "f2.html", "f3.html", "f4.html")
    ),
    batches = 2
  )
)
```
where `report.Rmd` has the following YAML front matter:
```
title: report
output: html_document
params:
  par: "default value"
```
and the following R code chunk:
```{r, eval = FALSE}
print(params$par)
print(tar_read(x))
```
`tar_render_rep()` creates a target for the parameter grid and uses [dynamic branching](#dynamic) to render the output reports in batches. In this case, we have two batches (dynamic branches) that each produce two reports (four output reports total).
```{r, eval = FALSE}
# R console
tar_make()
#> ● run target x
#> ● run target report_params
#> ● run branch report_9e7470a1
#> ● run branch report_457829de
#> ● end pipeline
```
The third output file `f3.html` is below, and the rest look similar.
![](./man/figures/dynamic-rmarkdown-params.png)
For more information, see [these examples](https://docs.ropensci.org/tarchetypes/reference/tar_render_rep.html#examples).
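The Quarto counterpart of the pipeline above can be sketched as follows. It assumes a hypothetical `report.qmd` that declares a parameter named `par` and calls `tar_read(x)`, mirroring `report.Rmd`. The only change to the pipeline is that the parameter grid goes in `execute_params` rather than `params`.

```{r, eval = FALSE}
# _targets.R
library(targets)
library(tarchetypes)
tar_option_set(packages = "tibble")
list(
  tar_target(x, "value_of_x"),
  tar_quarto_rep(
    report,
    "report.qmd", # Hypothetical Quarto source with a `par` parameter.
    execute_params = tibble(
      par = c("par_val_1", "par_val_2", "par_val_3", "par_val_4"),
      output_file = c("f1.html", "f2.html", "f3.html", "f4.html")
    ),
    batches = 2 # Two dynamic branches, each rendering two reports.
  )
)
```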