-
Notifications
You must be signed in to change notification settings - Fork 29
/
Copy pathindex.qmd
38 lines (24 loc) · 3.7 KB
/
index.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
---
execute:
freeze: auto
---
```{r, message = FALSE, warning = FALSE, echo = FALSE}
knitr::opts_knit$set(root.dir = fs::dir_create(tempfile()))
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", eval = TRUE)
```
# Introduction {#intro}
Pipeline tools coordinate the pieces of computationally demanding analysis projects. The [`targets`](https://docs.ropensci.org/targets/) package is a [Make](https://www.gnu.org/software/make/)-like pipeline tool for statistics and data science in R. The package skips costly runtime for tasks that are already up to date, orchestrates the necessary computation with implicit parallel computing, and abstracts files as R objects. If all the current output matches the current upstream code and data, then the whole pipeline is up to date, and the results are more trustworthy than otherwise.
## Motivation
Data analysis can be slow. A round of scientific computation can take several minutes, hours, or even days to complete. After it finishes, if you update your code or data, your hard-earned results may no longer be valid. Unchecked, this invalidation creates chronic [Sisyphean](https://en.wikipedia.org/wiki/Sisyphus) loop:
1. Launch the code.
2. Wait while it runs.
3. Discover an issue.
4. Restart from scratch.
## Pipeline tools
[Pipeline tools](https://github.com/pditommaso/awesome-pipeline) like [GNU Make](https://www.gnu.org/software/make/) break the cycle. They watch the dependency graph of the whole workflow and skip steps, or "targets", whose code, data, and upstream dependencies have not changed since the last run of the pipeline. When all targets are up to date, this is evidence that the results match the underlying code and data, which helps us trust the results and confirm the computation is reproducible.
## The `targets` package
Unlike most [pipeline tools](https://github.com/pditommaso/awesome-pipeline), which are language agnostic or Python-focused, the [`targets`](https://docs.ropensci.org/targets/) package allows data scientists and researchers to work entirely within R. [`targets`](https://docs.ropensci.org/targets/) implicitly nudges users toward a clean, function-oriented programming style that fits the intent of the R language and helps practitioners maintain their data analysis projects.
## About this manual
This manual is a step-by-step user guide to [`targets`](https://docs.ropensci.org/targets/). The most important chapters are the [walkthrough](#walkthrough), [help guide](#help), and [debugging guide](#debugging). Subsequent chapters explain how to [write code](#functions), [manage projects](#projects), utilize [high-performance computing](#hpc), [transition from `drake`](#drake), and more. See the [documentation website](https://docs.ropensci.org/targets/index.html) for most other major resources, including [installation instructions](https://docs.ropensci.org/targets/index.html#installation), links to [example projects](https://docs.ropensci.org/targets/index.html#examples), and a [reference page with all user-side functions](https://docs.ropensci.org/targets/reference/index.html).
## What about `drake`?
The [`drake`](https://github.com/ropensci/drake) is an older R-focused pipeline tool, and [`targets`](https://docs.ropensci.org/targets/) is [`drake`](https://github.com/ropensci/drake)'s long-term successor. There is a [special chapter](#drake) to explain why [`targets`](https://docs.ropensci.org/targets/) was created, what this means for [`drake`](https://github.com/ropensci/drake)'s future, advice for [`drake`](https://github.com/ropensci/drake) users transitioning to [`targets`](https://docs.ropensci.org/targets/), and the main technical advantages of [`targets`](https://docs.ropensci.org/targets/) over [`drake`](https://github.com/ropensci/drake).