diff --git a/data/41592_2014_BFnmeth2772_MOESM268_ESM.xlsx b/data/STRT-seq_family/41592_2014_BFnmeth2772_MOESM268_ESM.xlsx
similarity index 100%
rename from data/41592_2014_BFnmeth2772_MOESM268_ESM.xlsx
rename to data/STRT-seq_family/41592_2014_BFnmeth2772_MOESM268_ESM.xlsx
diff --git a/data/STRT-seq_C1_bc.csv b/data/STRT-seq_family/STRT-seq_C1_bc.csv
similarity index 100%
rename from data/STRT-seq_C1_bc.csv
rename to data/STRT-seq_family/STRT-seq_C1_bc.csv
diff --git a/data/STRT_GenomeRes_2011_SI.pdf b/data/STRT-seq_family/STRT_GenomeRes_2011_SI.pdf
similarity index 100%
rename from data/STRT_GenomeRes_2011_SI.pdf
rename to data/STRT-seq_family/STRT_GenomeRes_2011_SI.pdf
diff --git a/data/STRT_bc.fa b/data/STRT-seq_family/STRT_bc.fa
similarity index 100%
rename from data/STRT_bc.fa
rename to data/STRT-seq_family/STRT_bc.fa
diff --git a/data/filereport_read_run_PRJNA140307.tsv b/data/STRT-seq_family/filereport_read_run_PRJNA140307.tsv
similarity index 100%
rename from data/filereport_read_run_PRJNA140307.tsv
rename to data/STRT-seq_family/filereport_read_run_PRJNA140307.tsv
diff --git a/data/filereport_read_run_PRJNA203208.tsv b/data/STRT-seq_family/filereport_read_run_PRJNA203208.tsv
similarity index 100%
rename from data/filereport_read_run_PRJNA203208.tsv
rename to data/STRT-seq_family/filereport_read_run_PRJNA203208.tsv
diff --git a/data/tn5_strt_homodimer.svg b/data/STRT-seq_family/tn5_strt_homodimer.svg
similarity index 100%
rename from data/tn5_strt_homodimer.svg
rename to data/STRT-seq_family/tn5_strt_homodimer.svg
diff --git a/data/tn5_strt_seq_2i.svg b/data/STRT-seq_family/tn5_strt_seq_2i.svg
similarity index 100%
rename from data/tn5_strt_seq_2i.svg
rename to data/STRT-seq_family/tn5_strt_seq_2i.svg
diff --git a/docs/source/ge/STRT-seq.md b/docs/source/ge/STRT-seq.md
index f1549e6..df0b129 100644
--- a/docs/source/ge/STRT-seq.md
+++ b/docs/source/ge/STRT-seq.md
@@ -193,17 +193,17 @@ where the authors developed those methods for the first time.
 
 ### The Original Version
 
-The raw data for the original version can be found from [__the PRJNA140307 ENA page__](https://www.ebi.ac.uk/ena/browser/view/PRJNA140307?show=reads). I have prepared the read information, and you can [__download here__](https://teichlab.github.io/scg_lib_structs/data/filereport_read_run_PRJNA140307.tsv). The authors already demultiplexed for us. They were using single-end sequencing mode, so there is one file per cell. To mimic what we get directly from the machine, we could merge all of them into one file.
+The raw data for the original version can be found from [__the PRJNA140307 ENA page__](https://www.ebi.ac.uk/ena/browser/view/PRJNA140307?show=reads). I have prepared the read information, and you can [__download here__](https://teichlab.github.io/scg_lib_structs/data/STRT-seq_family/filereport_read_run_PRJNA140307.tsv). The authors already demultiplexed for us. They were using single-end sequencing mode, so there is one file per cell. To mimic what we get directly from the machine, we could merge all of them into one file.
 
 ```bash
 # get individual fastq files and merge into one file
 mkdir -p strt-seq/data
-wget -P strt-seq/data https://teichlab.github.io/scg_lib_structs/data/filereport_read_run_PRJNA140307.tsv
+wget -P strt-seq/data https://teichlab.github.io/scg_lib_structs/data/STRT-seq_family/filereport_read_run_PRJNA140307.tsv
 wget -i <(cut -f 8 strt-seq/data/filereport_read_run_PRJNA140307.tsv | tail -n +2 | awk '{print "ftp://" $0}') \
     -O /dev/stdout >> trt-seq/data/STRT-seq.fastq.gz
 ```
 
-Now we need to demultiplex the `fastq` file into individual files based on the first 6 bp. In this way, each cell has one file. Here, we use `cutadapt`. The cell barcode information can be found in this [__Supplementary Information__](https://teichlab.github.io/scg_lib_structs/data/STRT_GenomeRes_2011_SI.pdf) from the Genome Res. paper. We need the barcode in `fasta` format:
+Now we need to demultiplex the `fastq` file into individual files based on the first 6 bp. In this way, each cell has one file. Here, we use `cutadapt`. The cell barcode information can be found in this [__Supplementary Information__](https://teichlab.github.io/scg_lib_structs/data/STRT-seq_family/STRT_GenomeRes_2011_SI.pdf) from the Genome Res. paper. We need the barcode in `fasta` format:
 
 ```
 >bc01
@@ -219,10 +219,10 @@ TTGGAC
 . . .
 ```
 
-I have already prepared the `fasta` file and you can [__download from here__](https://teichlab.github.io/scg_lib_structs/data/STRT_bc.fa), and pass the `fasta` to `cutadapt`:
+I have already prepared the `fasta` file and you can [__download from here__](https://teichlab.github.io/scg_lib_structs/data/STRT-seq_family/STRT_bc.fa), and pass the `fasta` to `cutadapt`:
 
 ```console
-wget -P strt-seq/data https://teichlab.github.io/scg_lib_structs/data/STRT_bc.fa
+wget -P strt-seq/data https://teichlab.github.io/scg_lib_structs/data/STRT-seq_family/STRT_bc.fa
 cutadapt -j 4 -g ^file:strt-seq/data/STRT_bc.fa \
     --no-indels \
     -o "strt-seq/data/demul-{name}.fastq.gz" \
@@ -233,12 +233,12 @@ It should finish without any problem, and we should have 97 more files under `st
 
 ### The C1 Version
 
-The raw data for the C1 version can be found from [__the PRJNA203208 ENA page__](https://www.ebi.ac.uk/ena/browser/view/PRJNA203208?show=reads). I have prepared the read information as a TSV file including the barcode as the last column, and you can [__download here__](https://teichlab.github.io/scg_lib_structs/data/filereport_read_run_PRJNA203208.tsv). Again, the authors already demultiplexed for us. They were using single-end sequencing mode, so there is one file per cell.
+The raw data for the C1 version can be found from [__the PRJNA203208 ENA page__](https://www.ebi.ac.uk/ena/browser/view/PRJNA203208?show=reads). I have prepared the read information as a TSV file including the barcode as the last column, and you can [__download here__](https://teichlab.github.io/scg_lib_structs/data/STRT-seq_family/filereport_read_run_PRJNA203208.tsv). Again, the authors already demultiplexed for us. They were using single-end sequencing mode, so there is one file per cell.
 
 ```bash
 mkdir -p strt-seq-c1/data
 wget -P strt-seq-c1/data \
-    https://teichlab.github.io/scg_lib_structs/data/filereport_read_run_PRJNA203208.tsv
+    https://teichlab.github.io/scg_lib_structs/data/STRT-seq_family/filereport_read_run_PRJNA203208.tsv
 
 # there are two types of libraries
 # the one with the string "single" in the cell name is the regular one
@@ -283,7 +283,7 @@ done > islam2011_manifest.tsv
 
 ### The C1 Version
 
-In this version, cDNA from individual cells are tagmented by barcoded Tn5 separately. The Tn5 barcode serves as the cell barcode. You can find the full sequence from the [__Supplementary Table 2__](https://teichlab.github.io/scg_lib_structs/data/41592_2014_BFnmeth2772_MOESM268_ESM.xlsx) from the Isalm2014 paper in Nature Methods. There are 96 different 8-bp Tn5 barcodes:
+In this version, cDNA from individual cells are tagmented by barcoded Tn5 separately. The Tn5 barcode serves as the cell barcode. You can find the full sequence from the [__Supplementary Table 2__](https://teichlab.github.io/scg_lib_structs/data/STRT-seq_family/41592_2014_BFnmeth2772_MOESM268_ESM.xlsx) from the Isalm2014 paper in Nature Methods. There are 96 different 8-bp Tn5 barcodes:
 
 | Name      | Sequence | Reverse complement |
 |-----------|----------|--------------------|
@@ -386,13 +386,13 @@ In this version, cDNA from individual cells are tagmented by barcoded Tn5 separa
 
 I have prepared the full tables in `csv` format for you to download:
 
-[STRT-seq_C1_bc.csv](https://teichlab.github.io/scg_lib_structs/data/STRT-seq_C1_bc.csv)
+[STRT-seq_C1_bc.csv](https://teichlab.github.io/scg_lib_structs/data/STRT-seq_family/STRT-seq_C1_bc.csv)
 
 If we check carefully about the oligo orientation in the [__STRT-seq C1 GitHub page__](https://teichlab.github.io/scg_lib_structs/methods_html/STRT-seq_family.html#STRT-seq-C1), we can see that the Tn5 barcodes are sequenced using the bottom strand as the template. Therefore, the barcode reads are actually reverse complement to the primer sequence. We should use the reverse complement as the whitelist:
 
 ```console
 wget -P strt-seq-c1/data \
-    https://teichlab.github.io/scg_lib_structs/data/STRT-seq_C1_bc.csv
+    https://teichlab.github.io/scg_lib_structs/data/STRT-seq_family/STRT-seq_C1_bc.csv
 
 tail -n +2 strt-seq-c1/data/STRT-seq_C1_bc.csv | \
     cut -f 3 -d, > strt-seq-c1/data/whitelist.txt
diff --git a/methods_html/STRT-seq_family.html b/methods_html/STRT-seq_family.html
index fc3bbd1..5e66ae3 100644
--- a/methods_html/STRT-seq_family.html
+++ b/methods_html/STRT-seq_family.html
@@ -14,7 +14,7 @@

STRT-seq

STRT-seq

-

STRT-seq was originally pulished in Genome Res. 21: 1160-1167 (2011). One year after that, the authors published the detailed protocol in Nature Protocols 7, 813–828 (2012), which is only slightly different from the orginal paper. The workflow shown here is based on the protocol in Nature Protocols 7, 813–828 (2012).

+

STRT-seq was originally pulished in Genome Res. 21: 1160-1167 (2011). One year after that, the authors published the detailed protocol in Nature Protocols 7, 813-828 (2012), which is only slightly different from the orginal paper. The workflow shown here is based on the protocol in Nature Protocols 7, 813-828 (2012).

Adapter and primer sequences:

@@ -224,7 +224,7 @@

(4) cDNA amplification using single C1-P1-PCR-2 primer: +Tn5 strt homodimer
 
 Product 1 (5'-end of cDNA):
@@ -298,7 +298,7 @@ 

(2) Add index sequencing primer to sequence cell barcodes (bottom strand as

STRT-seq-2i

-

This method is similar to STRT-seq-C1, but with nanowell capture and different oigo design, and there probably a mistake in the sequence of DI-Read1-Seq at this time of the preprint (29-08-2017), where there should be only five Ns in DI-Read1-Seq. There are currently six Ns in it.

+

This method is similar to STRT-seq-C1, but with nanowell capture and different oigo design, and there probably a mistake in the sequence of DI-Read1-Seq at this time of the preprint (29-08-2017), where there should be only five Ns in DI-Read1-Seq. There are currently six Ns in it.


@@ -368,7 +368,7 @@

(5) Amplified double stranded and indexed cDNA looks like this:

(6) Homemade Tn5 homodimers using annealed Barcoded STRT-Tn5-Idx[1-96] top/bottom tagmentation on amplified cDNA (will create 9-bp gap):

-Tn5 strt seq 2i +Tn5 strt seq 2i
 
 Product 1 (5'-end of cDNA):
diff --git a/methods_html/sci-RNA-seq_family.html b/methods_html/sci-RNA-seq_family.html
index 0f1c25a..6fd9faa 100644
--- a/methods_html/sci-RNA-seq_family.html
+++ b/methods_html/sci-RNA-seq_family.html
@@ -13,7 +13,7 @@ 

sci-RNA-seq
-

sci-RNA-seq

+

sci-RNA-seq

The sci-RNA-seq uses the combinatorial indexing to identify single cells without single cell isolation. Two-level indexing (RT barcode + PCR barcodes (i5 + i7)) or three-level indexing (RT barcode + PCR barcodes (i5 + i7) + Tn5 barcodes) can be used. Three-level indexing is a bit more difficult since you need to assemble many indexed Tn5 transposomes. Here, two-level indexing strategy is demonstrated.

@@ -158,7 +158,7 @@

(4) Add Read 2 sequencing primer to sequence the second read (top strand as
-

sci-RNA-seq3

+

sci-RNA-seq3

The sci-RNA-seq3 is an updated version of sci-RNA-seq. The major improvements are:

(1) nuclei are extracted directly from fresh tissues without enzymatic treatment;