cleaned up strt-seq

Teichlab · Mar 7, 2024 · 0dd9952
1 parent b3e7d3f
commit 0dd9952

Showing 11 changed files with 16 additions and 16 deletions.
diff --git a/.../41592_2014_BFnmeth2772_MOESM268_ESM.xlsx b/.../41592_2014_BFnmeth2772_MOESM268_ESM.xlsx
diff --git a/data/STRT-seq_C1_bc.csv b/data/STRT-seq_family/STRT-seq_C1_bc.csv
diff --git a/data/STRT_GenomeRes_2011_SI.pdf b/...TRT-seq_family/STRT_GenomeRes_2011_SI.pdf
diff --git a/data/STRT_bc.fa b/data/STRT-seq_family/STRT_bc.fa
diff --git a/data/filereport_read_run_PRJNA140307.tsv b/...amily/filereport_read_run_PRJNA140307.tsv
diff --git a/data/filereport_read_run_PRJNA203208.tsv b/...amily/filereport_read_run_PRJNA203208.tsv
diff --git a/data/tn5_strt_homodimer.svg b/data/STRT-seq_family/tn5_strt_homodimer.svg
diff --git a/data/tn5_strt_seq_2i.svg b/data/STRT-seq_family/tn5_strt_seq_2i.svg
diff --git a/docs/source/ge/STRT-seq.md b/docs/source/ge/STRT-seq.md
@@ -193,17 +193,17 @@ where the authors developed those methods for the first time.
 
 ### The Original Version
 
-The raw data for the original version can be found from [__the PRJNA140307 ENA page__](https://www.ebi.ac.uk/ena/browser/view/PRJNA140307?show=reads). I have prepared the read information, and you can [__download here__](https://teichlab.github.io/scg_lib_structs/data/filereport_read_run_PRJNA140307.tsv). The authors already demultiplexed for us. They were using single-end sequencing mode, so there is one file per cell. To mimic what we get directly from the machine, we could merge all of them into one file.
+The raw data for the original version can be found from [__the PRJNA140307 ENA page__](https://www.ebi.ac.uk/ena/browser/view/PRJNA140307?show=reads). I have prepared the read information, and you can [__download here__](https://teichlab.github.io/scg_lib_structs/data/STRT-seq_family/filereport_read_run_PRJNA140307.tsv). The authors already demultiplexed for us. They were using single-end sequencing mode, so there is one file per cell. To mimic what we get directly from the machine, we could merge all of them into one file.
 
 ```bash
 # get individual fastq files and merge into one file
 mkdir -p strt-seq/data
-wget -P strt-seq/data https://teichlab.github.io/scg_lib_structs/data/filereport_read_run_PRJNA140307.tsv
+wget -P strt-seq/data https://teichlab.github.io/scg_lib_structs/data/STRT-seq_family/filereport_read_run_PRJNA140307.tsv
 wget -i <(cut -f 8 strt-seq/data/filereport_read_run_PRJNA140307.tsv | tail -n +2 | awk '{print "ftp://" $0}') \
      -O /dev/stdout >> trt-seq/data/STRT-seq.fastq.gz
 ```
 
-Now we need to demultiplex the `fastq` file into individual files based on the first 6 bp. In this way, each cell has one file. Here, we use `cutadapt`. The cell barcode information can be found in this [__Supplementary Information__](https://teichlab.github.io/scg_lib_structs/data/STRT_GenomeRes_2011_SI.pdf) from the Genome Res. paper. We need the barcode in `fasta` format:
+Now we need to demultiplex the `fastq` file into individual files based on the first 6 bp. In this way, each cell has one file. Here, we use `cutadapt`. The cell barcode information can be found in this [__Supplementary Information__](https://teichlab.github.io/scg_lib_structs/data/STRT-seq_family/STRT_GenomeRes_2011_SI.pdf) from the Genome Res. paper. We need the barcode in `fasta` format:
 
 ```
 >bc01
@@ -219,10 +219,10 @@ TTGGAC
 . . .
 ```
 
-I have already prepared the `fasta` file and you can [__download from here__](https://teichlab.github.io/scg_lib_structs/data/STRT_bc.fa), and pass the `fasta` to `cutadapt`:
+I have already prepared the `fasta` file and you can [__download from here__](https://teichlab.github.io/scg_lib_structs/data/STRT-seq_family/STRT_bc.fa), and pass the `fasta` to `cutadapt`:
 
 ```console
-wget -P strt-seq/data https://teichlab.github.io/scg_lib_structs/data/STRT_bc.fa
+wget -P strt-seq/data https://teichlab.github.io/scg_lib_structs/data/STRT-seq_family/STRT_bc.fa
 cutadapt -j 4 -g ^file:strt-seq/data/STRT_bc.fa \
          --no-indels \
          -o "strt-seq/data/demul-{name}.fastq.gz" \
@@ -233,12 +233,12 @@ It should finish without any problem, and we should have 97 more files under `st
 
 ### The C1 Version
 
-The raw data for the C1 version can be found from [__the PRJNA203208 ENA page__](https://www.ebi.ac.uk/ena/browser/view/PRJNA203208?show=reads). I have prepared the read information as a TSV file including the barcode as the last column, and you can [__download here__](https://teichlab.github.io/scg_lib_structs/data/filereport_read_run_PRJNA203208.tsv). Again, the authors already demultiplexed for us. They were using single-end sequencing mode, so there is one file per cell.
+The raw data for the C1 version can be found from [__the PRJNA203208 ENA page__](https://www.ebi.ac.uk/ena/browser/view/PRJNA203208?show=reads). I have prepared the read information as a TSV file including the barcode as the last column, and you can [__download here__](https://teichlab.github.io/scg_lib_structs/data/STRT-seq_family/filereport_read_run_PRJNA203208.tsv). Again, the authors already demultiplexed for us. They were using single-end sequencing mode, so there is one file per cell.
 
 ```bash
 mkdir -p strt-seq-c1/data
 wget -P strt-seq-c1/data \
-    https://teichlab.github.io/scg_lib_structs/data/filereport_read_run_PRJNA203208.tsv
+    https://teichlab.github.io/scg_lib_structs/data/STRT-seq_family/filereport_read_run_PRJNA203208.tsv
 
 # there are two types of libraries
 # the one with the string "single" in the cell name is the regular one
@@ -283,7 +283,7 @@ done > islam2011_manifest.tsv
 
 ### The C1 Version
 
-In this version, cDNA from individual cells are tagmented by barcoded Tn5 separately. The Tn5 barcode serves as the cell barcode. You can find the full sequence from the [__Supplementary Table 2__](https://teichlab.github.io/scg_lib_structs/data/41592_2014_BFnmeth2772_MOESM268_ESM.xlsx) from the Isalm2014 paper in Nature Methods. There are 96 different 8-bp Tn5 barcodes:
+In this version, cDNA from individual cells are tagmented by barcoded Tn5 separately. The Tn5 barcode serves as the cell barcode. You can find the full sequence from the [__Supplementary Table 2__](https://teichlab.github.io/scg_lib_structs/data/STRT-seq_family/41592_2014_BFnmeth2772_MOESM268_ESM.xlsx) from the Isalm2014 paper in Nature Methods. There are 96 different 8-bp Tn5 barcodes:
 
 | Name      | Sequence | Reverse complement |
 |-----------|----------|--------------------|
@@ -386,13 +386,13 @@ In this version, cDNA from individual cells are tagmented by barcoded Tn5 separa
 
 I have prepared the full tables in `csv` format for you to download:
 
-[STRT-seq_C1_bc.csv](https://teichlab.github.io/scg_lib_structs/data/STRT-seq_C1_bc.csv)  
+[STRT-seq_C1_bc.csv](https://teichlab.github.io/scg_lib_structs/data/STRT-seq_family/STRT-seq_C1_bc.csv)  
 
 If we check carefully about the oligo orientation in the [__STRT-seq C1 GitHub page__](https://teichlab.github.io/scg_lib_structs/methods_html/STRT-seq_family.html#STRT-seq-C1), we can see that the Tn5 barcodes are sequenced using the bottom strand as the template. Therefore, the barcode reads are actually reverse complement to the primer sequence. We should use the reverse complement as the whitelist:
 
 ```console
 wget -P strt-seq-c1/data \
-    https://teichlab.github.io/scg_lib_structs/data/STRT-seq_C1_bc.csv
+    https://teichlab.github.io/scg_lib_structs/data/STRT-seq_family/STRT-seq_C1_bc.csv
 
 tail -n +2 strt-seq-c1/data/STRT-seq_C1_bc.csv | \
     cut -f 3 -d, > strt-seq-c1/data/whitelist.txt

diff --git a/methods_html/STRT-seq_family.html b/methods_html/STRT-seq_family.html
@@ -14,7 +14,7 @@ <h1><a href="#STRT-seq" target="_self">STRT-seq</a>
 <br>
 
 <h1><a href="http://genome.cshlp.org/content/21/7/1160.full" target="_blank" name="STRT-seq"><span style="color:red">STRT-seq</span></a></h1>
-<p><span style="font-size:1.1em">STRT-seq was originally pulished in Genome Res. 21: 1160-1167 (2011). One year after that, the authors published the detailed protocol in Nature Protocols 7, 813–828 (2012), which is only slightly different from the orginal paper. The workflow shown here is based on the protocol in Nature Protocols 7, 813–828 (2012).</span></p>
+<p><info>STRT-seq was originally pulished in Genome Res. 21: 1160-1167 (2011). One year after that, the authors published the detailed protocol in Nature Protocols 7, 813-828 (2012), which is only slightly different from the orginal paper. The workflow shown here is based on the protocol in Nature Protocols 7, 813-828 (2012).</info></p>
 
 <h2>Adapter and primer sequences:</h2>
 <seq>
@@ -224,7 +224,7 @@ <h3>(4) cDNA amplification using single C1-P1-PCR-2 primer:<a href="http://www.n
 </pre>
 
 <h3>(5) Homemade Tn5 homodimers using C1-Tn5_top/bottom tagmentation on amplified cDNA (will create 9-bp gap):</h3>
-<img src="../data/tn5_strt_homodimer.svg" alt="Tn5 strt homodimer" style="width:800px;height:450px;">
+<img src="../data/STRT-seq_family/tn5_strt_homodimer.svg" alt="Tn5 strt homodimer" style="width:800px;height:450px;">
 <pre>
 <seq>
 <i>Product 1 (5'-end of cDNA):</i>
@@ -298,7 +298,7 @@ <h3>(2) Add index sequencing primer to sequence cell barcodes (bottom strand as
 <br>
 
 <h1><a href="http://www.biorxiv.org/content/early/2017/04/20/126268" target="_blank" name="STRT-seq-2i"><span style="color:red;">STRT-seq-2i</span></a></h1>
-<p><span style="font-size:1.1em">This method is similar to STRT-seq-C1, but with nanowell capture and different oigo design, and there probably a mistake in the sequence of DI-Read1-Seq at this time of the preprint (29-08-2017), where there should be only five Ns in DI-Read1-Seq. There are currently six Ns in it.</span></p>
+<p><info>This method is similar to STRT-seq-C1, but with nanowell capture and different oigo design, and there probably a mistake in the sequence of DI-Read1-Seq at this time of the preprint (29-08-2017), where there should be only five Ns in DI-Read1-Seq. There are currently six Ns in it.</info></p>
 
 <br>
 
@@ -368,7 +368,7 @@ <h3>(5) Amplified double stranded and indexed cDNA looks like this:</h3>
 </pre>
 
 <h3>(6) Homemade Tn5 homodimers using annealed Barcoded STRT-Tn5-Idx[1-96] top/bottom tagmentation on amplified cDNA (will create 9-bp gap):</h3>
-<img src="../data/tn5_strt_seq_2i.svg" alt="Tn5 strt seq 2i" style="width:800px;height:450px;">
+<img src="../data/STRT-seq_family/tn5_strt_seq_2i.svg" alt="Tn5 strt seq 2i" style="width:800px;height:450px;">
 <pre>
 <align class="long">
 <i>Product 1 (5'-end of cDNA):</i>

diff --git a/methods_html/sci-RNA-seq_family.html b/methods_html/sci-RNA-seq_family.html
@@ -13,7 +13,7 @@ <h1><a href="#sci-RNA-seq" target="_self">sci-RNA-seq</a>
 
 <br>
 
-<h1><a href="http://science.sciencemag.org/content/357/6352/661" name="sci-RNA-seq" target="_blank">sci-RNA-seq</a></h1>
+<h1><a href="http://science.sciencemag.org/content/357/6352/661" name="sci-RNA-seq" target="_blank"><span style="color:red">sci-RNA-seq</span></a></h1>
 
 <p><info>The sci-RNA-seq uses the combinatorial indexing to identify single cells without single cell isolation. Two-level indexing (RT barcode + PCR barcodes (i5 + i7)) or three-level indexing (RT barcode + PCR barcodes (i5 + i7) + Tn5 barcodes) can be used. Three-level indexing is a bit more difficult since you need to assemble many indexed Tn5 transposomes. Here, two-level indexing strategy is demonstrated.</info></p>
 
@@ -158,7 +158,7 @@ <h3>(4) Add Read 2 sequencing primer to sequence the second read (top strand as
 
 <br>
 
-<h1><a href="https://www.nature.com/articles/s41586-019-0969-x" name="sci-RNA-seq3" target="_blank">sci-RNA-seq3</a></h1>
+<h1><a href="https://www.nature.com/articles/s41586-019-0969-x" name="sci-RNA-seq3" target="_blank"><span style="color:red">sci-RNA-seq3</span></a></h1>
 
 <p><info>The sci-RNA-seq3 is an updated version of <a href="#sci-RNA-seq" target="_self">sci-RNA-seq</a>. The major improvements are:</info></p>
 <p><info>(1) nuclei are extracted directly from fresh tissues without enzymatic treatment;</info></p>