Commit

cleaned up strt-seq
dbrg77 committed Mar 7, 2024
1 parent b3e7d3f commit 0dd9952
Showing 11 changed files with 16 additions and 16 deletions.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes
File renamed without changes
20 changes: 10 additions & 10 deletions docs/source/ge/STRT-seq.md
Original file line number Diff line number Diff line change
@@ -193,17 +193,17 @@ where the authors developed those methods for the first time.

### The Original Version

The raw data for the original version can be found from [__the PRJNA140307 ENA page__](https://www.ebi.ac.uk/ena/browser/view/PRJNA140307?show=reads). I have prepared the read information, and you can [__download here__](https://teichlab.github.io/scg_lib_structs/data/filereport_read_run_PRJNA140307.tsv). The authors already demultiplexed for us. They were using single-end sequencing mode, so there is one file per cell. To mimic what we get directly from the machine, we could merge all of them into one file.
The raw data for the original version can be found from [__the PRJNA140307 ENA page__](https://www.ebi.ac.uk/ena/browser/view/PRJNA140307?show=reads). I have prepared the read information, and you can [__download here__](https://teichlab.github.io/scg_lib_structs/data/STRT-seq_family/filereport_read_run_PRJNA140307.tsv). The authors already demultiplexed for us. They were using single-end sequencing mode, so there is one file per cell. To mimic what we get directly from the machine, we could merge all of them into one file.

```bash
# get individual fastq files and merge into one file
mkdir -p strt-seq/data
wget -P strt-seq/data https://teichlab.github.io/scg_lib_structs/data/filereport_read_run_PRJNA140307.tsv
wget -P strt-seq/data https://teichlab.github.io/scg_lib_structs/data/STRT-seq_family/filereport_read_run_PRJNA140307.tsv
wget -i <(cut -f 8 strt-seq/data/filereport_read_run_PRJNA140307.tsv | tail -n +2 | awk '{print "ftp://" $0}') \
-O /dev/stdout >> strt-seq/data/STRT-seq.fastq.gz
```

Now we need to demultiplex the `fastq` file into individual files based on the first 6 bp. In this way, each cell has one file. Here, we use `cutadapt`. The cell barcode information can be found in this [__Supplementary Information__](https://teichlab.github.io/scg_lib_structs/data/STRT_GenomeRes_2011_SI.pdf) from the Genome Res. paper. We need the barcode in `fasta` format:
Now we need to demultiplex the `fastq` file into individual files based on the first 6 bp. In this way, each cell has one file. Here, we use `cutadapt`. The cell barcode information can be found in this [__Supplementary Information__](https://teichlab.github.io/scg_lib_structs/data/STRT-seq_family/STRT_GenomeRes_2011_SI.pdf) from the Genome Res. paper. We need the barcode in `fasta` format:

```
>bc01
@@ -219,10 +219,10 @@ TTGGAC
. . .
```

I have already prepared the `fasta` file and you can [__download from here__](https://teichlab.github.io/scg_lib_structs/data/STRT_bc.fa), and pass the `fasta` to `cutadapt`:
I have already prepared the `fasta` file and you can [__download from here__](https://teichlab.github.io/scg_lib_structs/data/STRT-seq_family/STRT_bc.fa), and pass the `fasta` to `cutadapt`:

```console
wget -P strt-seq/data https://teichlab.github.io/scg_lib_structs/data/STRT_bc.fa
wget -P strt-seq/data https://teichlab.github.io/scg_lib_structs/data/STRT-seq_family/STRT_bc.fa
cutadapt -j 4 -g ^file:strt-seq/data/STRT_bc.fa \
--no-indels \
-o "strt-seq/data/demul-{name}.fastq.gz" \
@@ -233,12 +233,12 @@ It should finish without any problem, and we should have 97 more files under `st

### The C1 Version

The raw data for the C1 version can be found from [__the PRJNA203208 ENA page__](https://www.ebi.ac.uk/ena/browser/view/PRJNA203208?show=reads). I have prepared the read information as a TSV file including the barcode as the last column, and you can [__download here__](https://teichlab.github.io/scg_lib_structs/data/filereport_read_run_PRJNA203208.tsv). Again, the authors already demultiplexed for us. They were using single-end sequencing mode, so there is one file per cell.
The raw data for the C1 version can be found from [__the PRJNA203208 ENA page__](https://www.ebi.ac.uk/ena/browser/view/PRJNA203208?show=reads). I have prepared the read information as a TSV file including the barcode as the last column, and you can [__download here__](https://teichlab.github.io/scg_lib_structs/data/STRT-seq_family/filereport_read_run_PRJNA203208.tsv). Again, the authors already demultiplexed for us. They were using single-end sequencing mode, so there is one file per cell.

```bash
mkdir -p strt-seq-c1/data
wget -P strt-seq-c1/data \
https://teichlab.github.io/scg_lib_structs/data/filereport_read_run_PRJNA203208.tsv
https://teichlab.github.io/scg_lib_structs/data/STRT-seq_family/filereport_read_run_PRJNA203208.tsv

# there are two types of libraries
# the one with the string "single" in the cell name is the regular one
@@ -283,7 +283,7 @@ done > islam2011_manifest.tsv

### The C1 Version

In this version, cDNA from individual cells is tagmented by barcoded Tn5 separately. The Tn5 barcode serves as the cell barcode. You can find the full sequences in [__Supplementary Table 2__](https://teichlab.github.io/scg_lib_structs/data/41592_2014_BFnmeth2772_MOESM268_ESM.xlsx) from the Islam2014 paper in Nature Methods. There are 96 different 8-bp Tn5 barcodes:
In this version, cDNA from individual cells is tagmented by barcoded Tn5 separately. The Tn5 barcode serves as the cell barcode. You can find the full sequences in [__Supplementary Table 2__](https://teichlab.github.io/scg_lib_structs/data/STRT-seq_family/41592_2014_BFnmeth2772_MOESM268_ESM.xlsx) from the Islam2014 paper in Nature Methods. There are 96 different 8-bp Tn5 barcodes:

| Name | Sequence | Reverse complement |
|-----------|----------|--------------------|
@@ -386,13 +386,13 @@ In this version, cDNA from individual cells are tagmented by barcoded Tn5 separa

I have prepared the full tables in `csv` format for you to download:

[STRT-seq_C1_bc.csv](https://teichlab.github.io/scg_lib_structs/data/STRT-seq_C1_bc.csv)
[STRT-seq_C1_bc.csv](https://teichlab.github.io/scg_lib_structs/data/STRT-seq_family/STRT-seq_C1_bc.csv)

If we carefully check the oligo orientation on the [__STRT-seq C1 GitHub page__](https://teichlab.github.io/scg_lib_structs/methods_html/STRT-seq_family.html#STRT-seq-C1), we can see that the Tn5 barcodes are sequenced using the bottom strand as the template. Therefore, the barcode reads are actually the reverse complement of the primer sequence. We should use the reverse complement as the whitelist:

```console
wget -P strt-seq-c1/data \
https://teichlab.github.io/scg_lib_structs/data/STRT-seq_C1_bc.csv
https://teichlab.github.io/scg_lib_structs/data/STRT-seq_family/STRT-seq_C1_bc.csv

tail -n +2 strt-seq-c1/data/STRT-seq_C1_bc.csv | \
cut -f 3 -d, > strt-seq-c1/data/whitelist.txt
8 changes: 4 additions & 4 deletions methods_html/STRT-seq_family.html
@@ -14,7 +14,7 @@ <h1><a href="#STRT-seq" target="_self">STRT-seq</a>
<br>

<h1><a href="http://genome.cshlp.org/content/21/7/1160.full" target="_blank" name="STRT-seq"><span style="color:red">STRT-seq</span></a></h1>
<p><span style="font-size:1.1em">STRT-seq was originally published in Genome Res. 21: 1160-1167 (2011). One year after that, the authors published the detailed protocol in Nature Protocols 7, 813828 (2012), which is only slightly different from the original paper. The workflow shown here is based on the protocol in Nature Protocols 7, 813828 (2012).</span></p>
<p><info>STRT-seq was originally published in Genome Res. 21: 1160-1167 (2011). One year after that, the authors published the detailed protocol in Nature Protocols 7, 813-828 (2012), which is only slightly different from the original paper. The workflow shown here is based on the protocol in Nature Protocols 7, 813-828 (2012).</info></p>

<h2>Adapter and primer sequences:</h2>
<seq>
@@ -224,7 +224,7 @@ <h3>(4) cDNA amplification using single C1-P1-PCR-2 primer:<a href="http://www.n
</pre>

<h3>(5) Homemade Tn5 homodimers using C1-Tn5_top/bottom tagmentation on amplified cDNA (will create 9-bp gap):</h3>
<img src="../data/tn5_strt_homodimer.svg" alt="Tn5 strt homodimer" style="width:800px;height:450px;">
<img src="../data/STRT-seq_family/tn5_strt_homodimer.svg" alt="Tn5 strt homodimer" style="width:800px;height:450px;">
<pre>
<seq>
<i>Product 1 (5'-end of cDNA):</i>
@@ -298,7 +298,7 @@ <h3>(2) Add index sequencing primer to sequence cell barcodes (bottom strand as
<br>

<h1><a href="http://www.biorxiv.org/content/early/2017/04/20/126268" target="_blank" name="STRT-seq-2i"><span style="color:red;">STRT-seq-2i</span></a></h1>
<p><span style="font-size:1.1em">This method is similar to STRT-seq-C1, but with nanowell capture and a different oligo design, and there is probably a mistake in the sequence of DI-Read1-Seq at this time of the preprint (29-08-2017), where there should be only five Ns in DI-Read1-Seq. There are currently six Ns in it.</span></p>
<p><info>This method is similar to STRT-seq-C1, but with nanowell capture and a different oligo design, and there is probably a mistake in the sequence of DI-Read1-Seq at this time of the preprint (29-08-2017), where there should be only five Ns in DI-Read1-Seq. There are currently six Ns in it.</info></p>

<br>

@@ -368,7 +368,7 @@ <h3>(5) Amplified double stranded and indexed cDNA looks like this:</h3>
</pre>

<h3>(6) Homemade Tn5 homodimers using annealed Barcoded STRT-Tn5-Idx[1-96] top/bottom tagmentation on amplified cDNA (will create 9-bp gap):</h3>
<img src="../data/tn5_strt_seq_2i.svg" alt="Tn5 strt seq 2i" style="width:800px;height:450px;">
<img src="../data/STRT-seq_family/tn5_strt_seq_2i.svg" alt="Tn5 strt seq 2i" style="width:800px;height:450px;">
<pre>
<align class="long">
<i>Product 1 (5'-end of cDNA):</i>
4 changes: 2 additions & 2 deletions methods_html/sci-RNA-seq_family.html
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ <h1><a href="#sci-RNA-seq" target="_self">sci-RNA-seq</a>

<br>

<h1><a href="http://science.sciencemag.org/content/357/6352/661" name="sci-RNA-seq" target="_blank">sci-RNA-seq</a></h1>
<h1><a href="http://science.sciencemag.org/content/357/6352/661" name="sci-RNA-seq" target="_blank"><span style="color:red">sci-RNA-seq</span></a></h1>

<p><info>The sci-RNA-seq uses the combinatorial indexing to identify single cells without single cell isolation. Two-level indexing (RT barcode + PCR barcodes (i5 + i7)) or three-level indexing (RT barcode + PCR barcodes (i5 + i7) + Tn5 barcodes) can be used. Three-level indexing is a bit more difficult since you need to assemble many indexed Tn5 transposomes. Here, two-level indexing strategy is demonstrated.</info></p>

@@ -158,7 +158,7 @@ <h3>(4) Add Read 2 sequencing primer to sequence the second read (top strand as

<br>

<h1><a href="https://www.nature.com/articles/s41586-019-0969-x" name="sci-RNA-seq3" target="_blank">sci-RNA-seq3</a></h1>
<h1><a href="https://www.nature.com/articles/s41586-019-0969-x" name="sci-RNA-seq3" target="_blank"><span style="color:red">sci-RNA-seq3</span></a></h1>

<p><info>The sci-RNA-seq3 is an updated version of <a href="#sci-RNA-seq" target="_self">sci-RNA-seq</a>. The major improvements are:</info></p>
<p><info>(1) nuclei are extracted directly from fresh tissues without enzymatic treatment;</info></p>
