forked from NESCent/popgenInfo
-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathCONTRIBUTING_WITH_GIT2R.Rmd
618 lines (475 loc) · 22.8 KB
/
CONTRIBUTING_WITH_GIT2R.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
---
title: "Contributing with git2r"
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Introduction
## Background/Motivation
The **popgenInfo** project was designed to be an educational tool that can move
at the same pace as method development by allowing people skilled in population
genetics analysis to submit workflows that demonstrate the strengths of any
given analysis in R. We use [git](https://git-scm.com/) and
[GitHub](https://github.com) to accept contributions via pull requests using
[GitHub flow](https://guides.github.com/introduction/flow/).
> **What is git?**
>
> Git is a *version control* program. It allows you to keep track of all changes
that happen when you are writing, analyzing data, or coding. You can think of it
as a sort of electronic lab notebook. Just like you would record any weights or
measures carefully in your bench lab notebook, so should you record any analyses
you do on the computer. With git, you "add" and "commit" changes that you make.
These are analogous to writing down the values of your measurements and noting
how you took them.
The basic steps of the process are:
i. **Fork** the [popgenInfo repository](https://github.com/NESCent/popgenInfo)\*
ii. **Clone** your fork to your local machine.\*
1. Checkout your master branch, **fetch** the changes from NESCent and **merge**
them into your fork\^.
2. Create a new **branch** and then **add** and **commit** new changes or content.
3. **Push** your changes to your fork.
4. Open a **pull request** from your fork to NESCent.
\* These should only be done once.
\^ Start here if you already have cloned your fork.
This particular tutorial will show you the basics of contributing to the
popgenInfo website using the R package
[*git2r*](https://cran.r-project.org/package=git2r), which was developed by the
[rOpenSci project](https://ropensci.org/). This package allows for a nearly
seamless and uniform interface for all contributors regardless of operating
system. A companion set of videos to go along with this tutorial can be found at
https://www.youtube.com/playlist?list=PLSFzyC3wp8-doiOcjDLlryIojK29qX2NL.
## Requirements
We assume that you have already done a few things:
1. Sign up for a [GitHub account](https://github.com)
2. Install [R](https://cran.r-project.org)
3. Install [Rstudio](https://rstudio.com/desktop/download)
If you have also installed [git](https://git-scm.com/), you can set up Rstudio
to work with git to help you commit changes, pull, and push. For the purposes of
this tutorial, these things are not necessary.
# Step i: Forking the repository (DO ONCE)
> If you have already forked and cloned the repository, go to Step 1.
Go to
[https://github.com/NESCent/popgenInfo](https://github.com/NESCent/popgenInfo)
and [click on the "Fork"
button](https://help.github.com/articles/fork-a-repo/#fork-an-example-repository).
This will create a copy of the NESCent popgenInfo repository to your account.
After you've forked the repository, you never have to fork it again!
# Step ii: Clone your fork to your computer (DO ONCE)
Now that you have your repository forked to your account, you will need to
**clone** it to your computer. Just like forking, you only have to clone your
fork to your computer once. It will live there until you decide to remove it. We
will need two things to clone our repository:
1. a place you want to store this repostiory (eg. a folder called `forks` in your `Documents`)
2. git2r
```{r, tmpdir_setup, echo = FALSE}
fork_dir <- tempdir()
```
## Create your forks folder
First, we need to make sure that we are working in the folder where we want to
set up our fork. In this tutorial, we will use `setwd()` to do this via the R
console, but it is possible to do this via Rstudio.
If you are using Windows:
```{r, eval = FALSE}
fork_dir <- "~/forks"
```
If you are using OSX:
```{r, eval = FALSE}
fork_dir <- "~/Documents/forks"
```
Now, we create the directory and then move there:
```{r, eval = FALSE}
dir.create(fork_dir)
```
We can use the function `list.files()` to see what's inside the directory:
```{r showdir, eval = FALSE}
list.files()
```
```{r showshadow, echo = FALSE}
list.files(fork_dir)
```
Currently, there's nothing here. We will use the `clone()` function from *git2r*
to make a copy of our repository. If you type `help("clone", "git2r")` in your
R console, you can see documentation about the function. We need two things
from this function:
1. The URL of the git repository
2. The path to our folder where we want to store it
The URL for the git repository is simply just the URL for your fork with `.git`
at the end of it. For mine, it's `https://github.com/zkamvar/popgenInfo.git`. To
keep things simple, we will name the folder we want to put the repository in
"popgenInfo".
```{r clone, eval = FALSE}
library("git2r")
clone(url = "https://github.com/zkamvar/popgenInfo.git",
local_path = "popgenInfo")
```
```{r shadow_clone, echo = FALSE}
library("git2r")
clone(url = "https://github.com/zkamvar/popgenInfo.git",
local_path = paste0(fork_dir, "/popgenInfo"))
```
Now when we look at our directory, we can see that we have one folder called
"popgenInfo":
```{r show_clone, eval = FALSE}
list.files()
```
```{r show_shadow, echo = FALSE}
list.files(fork_dir)
```
We can move into the folder with `setwd()` and look at all the files in there.
```{r go_clone, eval = FALSE}
setwd("popgenInfo")
list.files()
```
```{r go_shadow, echo = FALSE}
fork_pgi <- paste0(fork_dir, "/popgenInfo")
list.files(fork_pgi)
```
Now we have successfully cloned our repository into our computer using *git2r*.
Next, we will set up our credentials and keep our master branch up to date.
## Setting up your clone
To access the repository, all you have to do is open it by double-clicking on
the "popgenInfo.Rproj" file. This will open Rstudio to this folder and set it as
the working directory. After that, in your R console, type:
```{r repository_show, eval = FALSE}
library("git2r") # in case you're starting from here
repo <- repository()
repo
```
```{r repository_shadow, echo = FALSE}
library("git2r") # in case you're starting from here
repo <- repository(fork_pgi)
repo
```
This tells R that you have a repository in this folder and that you are
referring to it as `repo`. The output of `repo` is a summary of your repository:
- **Local** This is the branch (master) and the path where the copy of your repository is.
- **Remote** This shows you the branch and the url of the remote repository
- **Head** This is a summary of the last *commit* containing the date and a description.
In order to be able to synch your clone with your fork, GitHub needs to know
that you are really who you say you are. This means that you need to associate
your name, email, and a secret token with this repository. The first two items
are easily done within R. You can use the function `config()` to set and confirm
your name and email:
```{r, config}
config(repo,
user.name = "Zhian Kamvar",
user.email = "[email protected]")
config(repo)
```
Next, we need to set our secret token. Known as a [**Personal Access Token
(PAT)**](https://help.github.com/articles/creating-an-access-token-for-command-line-use/),
this is a 40 character cryptographic token that is like a long and complicated
password. They both allow access to your GitHub repository, but the difference
between a PAT and a password is that a PAT can be stored as text on your
computer (which you should NEVER do with passwords) because it is easy to
generate and (most importantly) easy to remove. We can create a PAT from GitHub
and then we will store this in a file called `.Renviron` that is read every time
you restart your R session, allowing you to have this token whenever you need to
push changes to your fork on GitHub. [The instructions to generate your PAT are
here](https://help.github.com/articles/creating-an-access-token-for-command-line-use/).
Here's an example of what a `.Renviron` file with a PAT will look like on the
inside:
```
GITHUB_PAT="97cc6bf86c31a42fca2de32884cd1f1c4b1102ba"
```
Once you have created your PAT, make a text file called `.Renviron` inside of
the repository and paste your PAT in that file as shown above. This will not be
tracked by git, and will only exist on your computer.
```{r renviron_shadow, echo = FALSE}
cat(paste0("GITHUB_PAT=\"¯\\_(ツ)_/¯\""), file = paste0(fork_pgi, "/.Renviron"))
```
The next time you open R from `popgenInfo.Rproj`, this token will be available
for GitHub to ensure that you are who you say you are. It is important to note
that, if you change or delete that PAT, you must replace it here.
In the next step, we will ensure that our fork is up to date with NESCent. In
order to do that, we need to tell our local copy that it can take updates from
NESCent. We do this by adding NESCent as a **remote** called 'upstream' using
the function `remote_add()` and checking to make sure it worked with the
function `remotes()`.
```{r remote_add}
remote_add(repo,
name = "upstream",
url = "https://github.com/NESCent/popgenInfo.git")
# Check our remotes
remotes(repo)
```
With all of this information set, we can begin working on our repository.
# Step 1: Keeping your fork up to date
The first thing we need to do is make sure that our master branch is up to date.
Since other people are also contributing to this project, it is possible for
changes to be made even within seconds of you forking the repository. In this
section we will show you how you can keep your master branch up to date in your
local repository and your fork on github.
Let's assume that you've just opened a fresh Rstudio session with the
`popgenInfo.Rproj` file. First, let's load *git2r* and access our repository.
```{r git2r_show, eval = FALSE}
library("git2r")
repo <- repository()
```
Now that we have our repository loaded, we should make sure that we are on the
**master branch**. We can do this using the `checkout()` function from *git2r*.
```{r checkout_master}
checkout(repo, "master")
repo
```
Now, whether or not we suspect changes from NESCent, it's always good form to
update your local repository before creating any new branches. We can do it this
by first **fetching** ALL changes that have happened on the NESCent popgenInfo
repository, and then **merging** the NESCent master branch into our master
branch. We will use the *git2r* functions `fetch()` and `merge()` to accomplish
this.
First, we fetch all of the changes that have happened. If there are no changes,
nothing will appear, but if there is new content on the master branch or new
branches on the repository, they will be downloaded and each branch will be
printed to your screen. For our purposes, we are only concerned with the changes
on the **master** branch, which will look something like this:
`## [new] b62a78c1fcd384a3ea23 refs/remotes/upstream/master`.
```{r update_master_fetch}
fetch(repo, name = "upstream")
```
It's important to note that our local repository is still unchanged at this
point as we have just downloaded the updates from NESCent. To update our local
repository, we should merge the changes from NESCent's master branch into our
master branch:
```{r update_master_merge}
merge(repo, "upstream/master")
repo
```
If there were any changes, the **Head** field will have a more current commit
than when we intialized it above.
Since we will never work directly on the master branch, there should be no
conflicts from the NESCent branch and we now know that our branch is up to date.
Assuming everything worked, we can push this to our fork on GitHub. We will use
*git2r*'s `push()` function to do this. We also need to supply our PAT in order
for this to work. To do so, we will use the *git2r* function `cred_token()`:
```{r push_repo, eval = FALSE}
cred <- cred_token()
push(repo, credentials = cred)
```
Now, when you check your fork, it should be updated with the most recent
changes. From here, we can checkout a new branch and then proceed with adding
our changes or new workflow.
# Step 2: Creating a New Branch
When you add contributions, the best practice is to create a new branch off of
the master. A **branch** is a term that indicates a sort of sandbox for a
repository that can become permanent. By default, git repositories all have a
"master" branch. When you want to experiment with something else, add new
content, or simply just fix a typo, but you don't want to disturb the original
copy, you create a new branch. Good practice is to name the branch in a way that
succinctly describes what you are doing with the branch. In this step, we will
create a branch, push it to our fork, and track it.
## SETUP
Most importantly we need to make sure that we are on the master branch and it is
up to date. To double plus make sure that you are on the master branch, check it
out:
```{r chchcheckitout}
checkout(repo, "master")
repo
```
For reasons that will reveal themselves later, please run these function
definitions:
```{r fun}
my_branch <- function(x) paste0("refs/heads/", x)
my_origin <- function(x) paste0("origin/", x)
```
And, of course, don't forget your credentials:
```{r cred}
cred <- cred_token()
```
## Creation
Assuming you've updated your master branch as shown above, we can create a
branch from the master using the function `checkout()`. Let's assume that we
want to create a vignette for analyzing spatial statistics. I'm going to lay out
a few steps here all at once because these are the steps you want to take when
you create a brand new branch and make sure it exists on your computer and on
your fork:
1. save the branch name as a variable
2. checkout the branch
3. push the branch to the fork
4. set your local repository to track the fork
In the first step, we want to name the branch. Since we are creating a new
vignette, it's ideal for the branch name to be the same as the vignette. As
shown in the [Best Practices](CONTRIBUTING.html#best-practices) guidelines, you
should name this vignette with the date and the subject. Since we are
contributing a vignette on spatial stats and committing for the first time on
2015-12-16, the new branch and vignette will be called "2015-12-16-spatial-
stats".
In code, the steps above would look like this:
```{r checkout_branch_flow, eval = FALSE}
BRANCH <- "2015-12-16-spatial-stats" # 1
checkout(repo, BRANCH, create = TRUE) # 2
push(repo, name = "origin", refspec = my_branch(BRANCH), credentials = cred) # 3
branch_set_upstream(head(repo), my_origin(BRANCH)) # 4
```
Let's examine these one by one. First, you create a variable that will hold the
name of the branch you want to create. We are using this convention to allow us
to use this name multiple times. After that, we create that branch from our
master branch. We can take a look at what our branch looks like at that point.
```{r checkout_branch, eval = FALSE}
BRANCH <- "2015-12-16-spatial-stats" # 1
checkout(repo, BRANCH, create = TRUE) # 2
repo
```
```{r checkout_branch_shadow, echo = FALSE}
BRANCH <- "2015-12-16-spatial-stats"
# Thu Dec 17 09:57:31 2015 ------------------------------
# I have this branch created on my fork, but if I check it out normally, it will
# automatically track the fork. Instead, I am creating this branch from the
# nescent head so that it appears seamless when the user does this.
branch_create(commits(repo)[[1]], BRANCH)
checkout(repo, BRANCH, create = TRUE)
repo
branch_set_upstream(head(repo), my_origin(BRANCH))
```
Notice here that you now only have two lines in the output, **Local** and
**Head**. This is because the branch doesn't exist on your GitHub fork. This is
where the next two steps come in. We will push that branch and then make sure
that our branch is tracking the copy on the fork.
```{r push_branch, eval = FALSE}
push(repo, name = "origin", refspec = my_branch(BRANCH), credentials = cred) # 3
branch_set_upstream(head(repo), my_origin(BRANCH)) # 4
repo
```
```{r push_branch_shadow, echo = FALSE}
repo
```
We can now see from the output that we have a **Remote** set up. Once we have
this, we can start making changes! If you are adding a new vignette, please copy
[`TEMPLATE.Rmd`](https://github.com/nescent/popgenInfo/tree/master/TEMPLATE.Rmd)
to the **use/** directory and give it a new name that is meaningful to your
contribution (good practice is to name it the same as your branch). If you are
including any data, place it in the **data/** directory. Once you add these
files, you need to to **add** these files and then you should **commit**, which
we will show you how to do below.
# Step 3: Adding content, committing, and pushing
If you are using Rstudio, you can use it to integrate with git. This allows you
to do things like commit, push, and pull. Hadley Wickham, chief scientist at
Rstudio has put up [this helpful tutorial on using git through
Rstudio](http://r-pkgs.had.co.nz/git.html#git-status). If you aren't using
Rstudio, but you don't want to access the terminal (command line), this section
will show you basic git commands with *git2r* that will allow you to work on
your vignette and keep it up to date.
Recall that using git is analogous to keeping a good, detailed lab notebook.
When you make any changes, you record those changes (`git add`) and then say
*why* you made the changes (`git commit`).
Let's say you've already copied the `TEMPLATE.Rmd` file to the **use/**
directory and renamed it to `2015-12-16-spatial-stats.Rmd` and added some data
called `spatial_data.csv`. When you place a file in your repository, git will
not pay attention to any changes you do until you specifically add it. Until
then it is "Untracked". You can see this by running `status()`:
```{r add_stuff_shadow, echo = FALSE}
x <- file.copy(paste0(fork_pgi, "/TEMPLATE.Rmd"),
paste0(fork_pgi, "/use/2015-12-16-spatial-stats.Rmd"))
x <- file.create(paste0(fork_pgi, "/data/spatial_data.csv"))
set.seed(999)
m <- matrix(rnorm(20), nrow = 5, ncol = 2, dimnames = list(NULL, c("x", "y")))
write.table(m, file = paste0(fork_pgi, "/data/spatial_data.csv"), sep = ",",
row.names = FALSE, col.names = TRUE)
```
```{r status}
status(repo)
```
Since we've just added the files, we can use `add()` to tell git we want to
stage them for committing.
```{r add}
add(repo, "*") # adding all new or changed files
status(repo)
```
When we have done this, we can commit to our changes by using `commit()`. A
commit is basically placing a record of what you did and, importantly, why you
did it. Your commit message should record this. Commit messages should be able
to fit on a single line. Optionally, if you want to be able to add more detail,
you can enter more lines below the message:
```{r commit}
msg <- " started spatial stats vignette and added data
I copied the template to the use folder and placed
spatial data in the data folder.
"
commit(repo, msg, session = TRUE)
status(repo) # should return nothing
repo
```
> Note that we put `session = TRUE` in the commit function. This automatically
puts your R session information in the commit, which gives more information
about package versions you were working with when you were editing the vignette.
You can see a summary of what you just did by looking at the summary of your
commit:
```{r summarycommit}
summary(commits(repo)[[1]])
```
Now you can see that your commit message was recorded in the repository. Since
we see that there is a **Remote** associated with this fork, we can simply use
`push()` to push our changes up to the fork.
```{r pushit, eval = FALSE}
push(repo, credentials = cred_token())
```
# Step 4: Pull Request and Peer Review
Now that you've successfully set up git, created a new branch, and created a new
vignette, it's time to fill it with content. You can work in your vignette like
you would write in any other Rmarkdown document and keep track of your changes
like we showed above by **adding** and **committing** your changes followed by
**pushing** these up to your fork. Once you are finished with all of your
changes, you can create a pull request from your GitHub fork to NESCent.
## Creating a pull request
### Through R
You can tell R to open a browser for you to create your pull request. The URL
for this is in the form of
`https://github.com/USER/popgenInfo/pull/new/BRANCH`. We can create this
URL by using the function `paste()` to glue the pieces together and then use the
`utils::browseURL()` function to open up a window. Note we are using the
variable called `BRANCH` that we named above.
```{r make_pull}
(my_account <- dirname(remote_url(repo, "origin")))
(pull_request_url <- paste(my_account, "popgenInfo/pull/new", BRANCH, sep = "/"))
```
Now that you have the URL, you can browse to it.
```{r browse_pull, eval = FALSE}
utils::browseURL(pull_request_url)
```
### Manually
If you don't want to do this from R, you can do it through your web browser.
1. Navigate to your fork on GitHub
2. Click on the dropdown menu on the left that says **"Branch: master"** and
switch to the branch you want to create the pull request from.
3. Click on the green button that says **"New Pull Request"**.
## Peer Review
Your pull request begins a process of open peer review where content, accuracy,
and style are considered. If changes are suggested, you should revise them by
making changes on your fork and repeating the process in **Step 3**. Your pull
request will be *automatically updated* once you push the changes to your fork.
Finally, when all questions and concerns have been addressed, the pull request
may be merged into the NESCent repository as long as two of the maintainers
have signaled their approval.
# Conclusions
From this tutorial, you should have learned how to:
- fork a repository on GitHub
- clone that fork
- set up credentials for *git2r*
- update your master branch from NESCent
- create a new branch
- add and commit changes
- push changes
- create a pull request
The skills demonstrated in this tutorial are not exclusive for **popgenInfo**,
but they can be used when writing up your own reports or analyses. Using git may
not feel comfortable at this moment, but just like it takes practice to learn
how to keep a lab notebook, with practice, you will become comfortable using git
to keep track of your workflows, making your science more reproducible.
# What's Next?
Once your pull request passes peer review and is published, you can update your
fork by starting from **Step 1**: checkout your master branch, fetch the changes
from NESCent, merge those changes, and push them up to the master branch on your
fork to make it even with NESCent.
# Contributors
- Zhian N. Kamvar (Author)
- Hilmar Lapp (Reviewer)
- Margarita M. Lopéz (Reviewer)
- Stéphanie Manel (Reviewer)
# Session Information
This shows us useful information for reproducibility. Of particular importance
are the versions of R and the packages used to create this workflow. It is
considered good practice to record this information with every analysis.
```{r, sessioninfo}
options(width = 100)
devtools::session_info()
```