From b3e7d3ff940b4efd369d18224e42bc47aef5a9ca Mon Sep 17 00:00:00 2001
From: Xi Chen
Date: Thu, 7 Mar 2024 00:03:55 +0800
Subject: [PATCH] clean up split-seq

---
 data/{ => SPLiT-seq}/SPLiT-seq_Round1_bc.csv |   0
 data/{ => SPLiT-seq}/SPLiT-seq_Round2_bc.csv |   0
 data/{ => SPLiT-seq}/SPLiT-seq_Round3_bc.csv |   0
 .../Star_CB_UMI_Complex_SPLiT-seq.jpg        | Bin
 data/{ => SPLiT-seq}/aam8999_tables12.xlsx   | Bin
 data/{ => SPLiT-seq}/aba5257_table_s3.xlsx   | Bin
 docs/source/ge/SPLiT-seq.md                  |  18 +++++++++---------
 7 files changed, 9 insertions(+), 9 deletions(-)
 rename data/{ => SPLiT-seq}/SPLiT-seq_Round1_bc.csv (100%)
 rename data/{ => SPLiT-seq}/SPLiT-seq_Round2_bc.csv (100%)
 rename data/{ => SPLiT-seq}/SPLiT-seq_Round3_bc.csv (100%)
 rename data/{ => SPLiT-seq}/Star_CB_UMI_Complex_SPLiT-seq.jpg (100%)
 rename data/{ => SPLiT-seq}/aam8999_tables12.xlsx (100%)
 rename data/{ => SPLiT-seq}/aba5257_table_s3.xlsx (100%)

diff --git a/data/SPLiT-seq_Round1_bc.csv b/data/SPLiT-seq/SPLiT-seq_Round1_bc.csv
similarity index 100%
rename from data/SPLiT-seq_Round1_bc.csv
rename to data/SPLiT-seq/SPLiT-seq_Round1_bc.csv
diff --git a/data/SPLiT-seq_Round2_bc.csv b/data/SPLiT-seq/SPLiT-seq_Round2_bc.csv
similarity index 100%
rename from data/SPLiT-seq_Round2_bc.csv
rename to data/SPLiT-seq/SPLiT-seq_Round2_bc.csv
diff --git a/data/SPLiT-seq_Round3_bc.csv b/data/SPLiT-seq/SPLiT-seq_Round3_bc.csv
similarity index 100%
rename from data/SPLiT-seq_Round3_bc.csv
rename to data/SPLiT-seq/SPLiT-seq_Round3_bc.csv
diff --git a/data/Star_CB_UMI_Complex_SPLiT-seq.jpg b/data/SPLiT-seq/Star_CB_UMI_Complex_SPLiT-seq.jpg
similarity index 100%
rename from data/Star_CB_UMI_Complex_SPLiT-seq.jpg
rename to data/SPLiT-seq/Star_CB_UMI_Complex_SPLiT-seq.jpg
diff --git a/data/aam8999_tables12.xlsx b/data/SPLiT-seq/aam8999_tables12.xlsx
similarity index 100%
rename from data/aam8999_tables12.xlsx
rename to data/SPLiT-seq/aam8999_tables12.xlsx
diff --git a/data/aba5257_table_s3.xlsx b/data/SPLiT-seq/aba5257_table_s3.xlsx
similarity index 100%
rename from data/aba5257_table_s3.xlsx
rename to data/SPLiT-seq/aba5257_table_s3.xlsx
diff --git a/docs/source/ge/SPLiT-seq.md b/docs/source/ge/SPLiT-seq.md
index 7ff5029..dc4bf03 100644
--- a/docs/source/ge/SPLiT-seq.md
+++ b/docs/source/ge/SPLiT-seq.md
@@ -110,7 +110,7 @@ wget -P split-seq/data -c \
 
 ## Prepare Whitelist
 
-The full oligo sequences can be found in the [Supplementary Table S12](https://teichlab.github.io/scg_lib_structs/data/aam8999_tables12.xlsx) from the __SPLiT-seq__ paper. As you can see, there are a total of 96 different __Round2 barcodes__ and 96 different __Round3 barcodes__. For the sublibrary index, they provided 8 different ones (`BC_0076` - `BC_0083`), but you can cerntainly do more. For the __Round1 barcodes__, it is a bit more complicated. There are 96 of them (`Round1_01` - `Round1_96`). The first 48 are oligo-dT primers and the last 48 are random hexamers. They mix them into 48 different wells:
+The full oligo sequences can be found in the [Supplementary Table S12](https://teichlab.github.io/scg_lib_structs/data/SPLiT-seq/aam8999_tables12.xlsx) from the __SPLiT-seq__ paper. As you can see, there are a total of 96 different __Round2 barcodes__ and 96 different __Round3 barcodes__. For the sublibrary index, they provided 8 different ones (`BC_0076` - `BC_0083`), but you can cerntainly do more. For the __Round1 barcodes__, it is a bit more complicated. There are 96 of them (`Round1_01` - `Round1_96`). The first 48 are oligo-dT primers and the last 48 are random hexamers. They mix them into 48 different wells:
 
 `Round1_01` and `Round1_49` are mixed in the same well;
 
 `Round1_02` and `Round1_50` are mixed in the same well;
 
@@ -120,7 +120,7 @@ The full oligo sequences can be found in the [Supplementary Table S12](https://t
 .
 .
 .
 
 `Round1_48` and `Round1_96` are mixed in the same well.
 
-Therefore, we actually have 48 different __Round1_barcodes__. If you use the oligos provided in the [Supplementary Table S12](https://teichlab.github.io/scg_lib_structs/data/aam8999_tables12.xlsx) from the __SPLiT-seq__ paper, you should have a capacity of __48 * 96 * 96 * 8 = 3,538,944__ combinations. For the preprocessing, we could treat the different __Round1 barcodes__ as if there are 96 different ones. During the downstream analysis after the preprocessing, we could merge them.
+Therefore, we actually have 48 different __Round1_barcodes__. If you use the oligos provided in the [Supplementary Table S12](https://teichlab.github.io/scg_lib_structs/data/SPLiT-seq/aam8999_tables12.xlsx) from the __SPLiT-seq__ paper, you should have a capacity of __48 * 96 * 96 * 8 = 3,538,944__ combinations. For the preprocessing, we could treat the different __Round1 barcodes__ as if there are 96 different ones. During the downstream analysis after the preprocessing, we could merge them.
 
 I have collected the index table as follows, and the names of the oligos are directly taken from the paper to be consistent:
 
@@ -429,17 +429,17 @@ __Round3 Barcodes (8 bp)__
 
 I have put those three tables into `csv` files and you can download them to have a look:
 
-[SPLiT-seq_Round1_bc.csv](https://teichlab.github.io/scg_lib_structs/data/SPLiT-seq_Round1_bc.csv)
-[SPLiT-seq_Round2_bc.csv](https://teichlab.github.io/scg_lib_structs/data/SPLiT-seq_Round2_bc.csv)
-[SPLiT-seq_Round3_bc.csv](https://teichlab.github.io/scg_lib_structs/data/SPLiT-seq_Round3_bc.csv)
+[SPLiT-seq_Round1_bc.csv](https://teichlab.github.io/scg_lib_structs/data/SPLiT-seq/SPLiT-seq_Round1_bc.csv)
+[SPLiT-seq_Round2_bc.csv](https://teichlab.github.io/scg_lib_structs/data/SPLiT-seq/SPLiT-seq_Round2_bc.csv)
+[SPLiT-seq_Round3_bc.csv](https://teichlab.github.io/scg_lib_structs/data/SPLiT-seq/SPLiT-seq_Round3_bc.csv)
 
 Let's download them:
 
 ```console
 wget -P split-seq/data \
-     https://teichlab.github.io/scg_lib_structs/data/SPLiT-seq_Round1_bc.csv \
-     https://teichlab.github.io/scg_lib_structs/data/SPLiT-seq_Round2_bc.csv \
-     https://teichlab.github.io/scg_lib_structs/data/SPLiT-seq_Round3_bc.csv
+     https://teichlab.github.io/scg_lib_structs/data/SPLiT-seq/SPLiT-seq_Round1_bc.csv \
+     https://teichlab.github.io/scg_lib_structs/data/SPLiT-seq/SPLiT-seq_Round2_bc.csv \
+     https://teichlab.github.io/scg_lib_structs/data/SPLiT-seq/SPLiT-seq_Round3_bc.csv
 ```
 
 Now we need to generate the whitelist of those three rounds of barcodes. Those barcodes are sequenced in __Read 2__ using the top strand as the template. They are in the same direction of the Illumina TruSeq Read 2 sequence. Therefore, we should take their sequences as they are. In addition, if you check the [__SPLiT-seq GitHub page__](https://teichlab.github.io/scg_lib_structs/methods_html/SPLiT-seq.html), you will see that the __Round3 barcode__ is sequenced first, then __Round2 barcode__ and finally __Round1 barcode__. Therefore, we should pass the whitelist to `starsolo` in that order. See the next section for more details.
@@ -512,7 +512,7 @@ If you understand the __SPLiT-seq__ experimental procedures described in [this G
 
 >> These options specify the locations of cell barcode and UMI in the 2nd fastq files we passed to `--readFilesIn`. In this case, it is __Read 2__. Read the [STAR manual](https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf) for more details. I have drawn a picture to help myself decide the exact parameters. There are some freedom here depending on what you are using as anchors. in __SPLiT-seq__, the UMI and cell barcodes are in fixed position in the __Read 2__. It is relatively straightforward to specify the parameter. See the image:
 
-![](https://teichlab.github.io/scg_lib_structs/data/Star_CB_UMI_Complex_SPLiT-seq.jpg)
+![](https://teichlab.github.io/scg_lib_structs/data/SPLiT-seq/Star_CB_UMI_Complex_SPLiT-seq.jpg)
 
 `--soloCBwhitelist`
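For readers following the updated paths, a minimal sketch (not part of this patch) of how the three whitelists mentioned in the doc above could be produced from the downloaded CSV files. It assumes the CSVs sit in `split-seq/data` as in the `wget` step of the diff, that each file has a header row with the 8-bp barcode sequence in its second column, and the `round*_whitelist.txt` output names are placeholders:

```console
# Extract the barcode column from each round's CSV into a plain-text whitelist.
# Adjust the field number if the column layout of the CSVs differs.
for i in 1 2 3; do
    tail -n +2 split-seq/data/SPLiT-seq_Round${i}_bc.csv | \
        cut -f 2 -d, > split-seq/data/round${i}_whitelist.txt
done

# The Round3 barcode is sequenced first, so the whitelists would be passed to
# starsolo in that order, e.g.:
#   --soloCBwhitelist split-seq/data/round3_whitelist.txt \
#                     split-seq/data/round2_whitelist.txt \
#                     split-seq/data/round1_whitelist.txt
```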