Add a new scRNA workflow for standard analysis using Scanpy #556
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,17 @@ | ||||||
version: 1.2 | ||||||
workflows: | ||||||
- name: Standard-scRNA-seq-with-Scanpy | ||||||
subclass: Galaxy | ||||||
publish: true | ||||||
primaryDescriptorPath: /Standard-scRNA-seq-with-Scanpy.ga | ||||||
testParameterFiles: | ||||||
- /Standard-scRNA-seq-with-Scanpy-tests.yml | ||||||
authors: | ||||||
- name: Pavankumar Videm | ||||||
orcid: 0000-0002-5192-126X | ||||||
- name: Hans-Rudolf Hotz | ||||||
orcid: 0000-0002-2799-424X | ||||||
- name: Mehmet Tekman | ||||||
orcid: 0000-0002-4181-2676 | ||||||
- name: "B\xE9r\xE9nice Batut" | ||||||
Suggested change
|
||||||
orcid: 0000-0001-9852-1987 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
## [0.1] 2024-10-09 | ||
|
||
First release. |
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,23 @@ | ||||||
# Standard scRNA-seq Workflow using Scanpy and Anndata | ||||||
Review comment: What is even standard? Is clustering the thing you do here? |
Review comment: Preprocessing and clustering. I will rename it accordingly. |
||||||
|
||||||
## Input datasets | ||||||
|
||||||
- The workflow needs 4 files as input | ||||||
- A single-cell count matrix file in Matrix Market Exchange format | ||||||
Suggested change
|
||||||
- A cell barcodes file with a single barcode on each line. The barcodes should correspond to the cells in the matrix file | ||||||
- A genes/features tabular file with gene IDs and gene symbols | ||||||
|
||||||
Review comment: I think you forgot to describe the fourth file. |
Review comment: As well as the 3 input values. |
Review comment: Good catch. But maybe we do not need a parameters file, because @mvdbeek suggested using individual parameters instead of a file. We will have only 3 input files. |
Review comment: Yes indeed, but it would be good to describe the input values in the README. |
||||||
## Processing | ||||||
|
||||||
- The workflow creates an **Anndata** object from the given input files. | ||||||
- Quality control is performed: cells are filtered by the number of genes expressed, and cells with high mitochondrial content are removed. | ||||||
- The counts are then normalized and scaled. | ||||||
- PCA is used for dimensionality reduction and 50 PCs are computed. Various plots are generated to inspect the PCA and the PCA loadings, which helps in determining the number of PCs to keep for further analysis. | ||||||
- Clustering is performed by computing a neighbourhood graph and then applying the **louvain** algorithm. The neighbourhood graph is embedded into UMAP and plotted. | ||||||
- Marker genes are identified using the **Wilcoxon rank sum test**. Marker gene expression is visualized in various plots. | ||||||
- Optionally, louvain clusters can be annotated with cell types based on the marker genes. | ||||||
|
||||||
## Outputs | ||||||
|
||||||
- The final output is an Anndata object with annotations of the louvain clusters. | ||||||
- Several informative plots, from QC through to the end results |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,95 @@ | ||
- doc: Test outline for Standard-scRNA-seq-with-Scanpy | ||
job: | ||
Workflow Params: | ||
class: File | ||
path: test-data/workflow_params.tabular | ||
filetype: tabular | ||
Barcodes: | ||
class: File | ||
location: https://zenodo.org/record/3581213/files/barcodes.tsv | ||
filetype: txt | ||
Genes: | ||
class: File | ||
location: https://zenodo.org/record/3581213/files/genes.tsv | ||
filetype: tabular | ||
Matrix: | ||
class: File | ||
location: https://zenodo.org/record/3581213/files/matrix.mtx | ||
filetype: mtx | ||
Input is from Cell Ranger v2 or earlier versions: true | ||
Manually annotate celltypes?: true | ||
Annotate louvain clusters with these cell types: CD4+ T, CD14+, B, CD8+ T, FCGR3A+, | ||
NK, Dendritic, Megakaryocytes | ||
outputs: | ||
initial_anndata_general_info: | ||
Review comment: Can you make all of these human readable please, no underscores? They are part of the primary output, so it should look nice. |
Review comment: I am renaming all the outputs. Is that still a problem in that case? |
Review comment: Yes. I don't think you need to rename the history items (but I don't mind if you do); workflow outputs should primarily be explored from the invocation view, not the history. |
||
asserts: | ||
has_text: | ||
text: "AnnData object with n_obs × n_vars = 2700 × 32738" | ||
pl_scatter_total_counts_vs_n_genes_by_counts: | ||
path: test-data/pl_scatter_total_counts_vs_n_genes_by_counts.png | ||
compare: sim_size | ||
pl_highly_variable: | ||
path: test-data/pl_highly_variable.png | ||
compare: sim_size | ||
pl_pca_loadings: | ||
path: test-data/pl_pca_loadings.png | ||
compare: sim_size | ||
pl_pca_variance_ratio: | ||
path: test-data/pl_pca_variance_ratio.png | ||
compare: sim_size | ||
pl_umap_louvain: | ||
path: test-data/pl_umap_louvain.png | ||
compare: sim_size | ||
uns_rank_genes_groups_names_wilcoxon_test: | ||
asserts: | ||
has_line: | ||
line: "LDHB LYZ CD74 CCL5 LST1 NKG7 HLA-DPA1 PF4" | ||
pl_rank_gene_groups_t_test_wilcoxon_test: | ||
path: test-data/pl_rank_gene_groups_t_test_wilcoxon_test.png | ||
compare: sim_size | ||
final_anndata_general_info: | ||
path: test-data/final_anndata_general_info.txt | ||
cells_per_cluster: | ||
asserts: | ||
has_line: | ||
line: "0 1162" | ||
line: "3 311" | ||
line: "6 34" | ||
pl_scatter_n_genes_by_counts_vs_pct_mito: | ||
path: test-data/pl_scatter_n_genes_by_counts_vs_pct_mito.png | ||
compare: sim_size | ||
pl_violin_initial: | ||
path: test-data/pl_violin_initial.png | ||
compare: sim_size | ||
anndata_with_raw: | ||
asserts: | ||
has_h5_keys: | ||
keys: "obs/log1p_total_counts" | ||
keys: "obs/total_counts_mito" | ||
keys: "var/mito" | ||
keys: "var/norm" | ||
keys: "uns/log1p" | ||
pl_pca_overview_genes: | ||
path: test-data/pl_pca_overview_genes.png | ||
compare: sim_size | ||
pl_rank_genes_heatmap: | ||
path: test-data/pl_rank_genes_heatmap.png | ||
compare: sim_size | ||
anndata_out: | ||
asserts: | ||
has_h5_keys: | ||
keys: "obs/louvain" | ||
keys: "var/highly_variable" | ||
keys: "uns/rank_genes_groups" | ||
pl_umap_marker_genes: | ||
path: test-data/pl_umap_marker_genes.png | ||
compare: sim_size | ||
Review comment: There are nowadays better asserts available for images. Maybe you can add them in addition. Also, if you just compare sim_size, you don't need to have the files in the repo, do you? |
||
pl_stacked_violin_marker_genes: | ||
path: test-data/pl_stacked_violin_marker_genes.png | ||
compare: sim_size | ||
pl_violin_louvain: | ||
path: test-data/pl_violin_louvain.png | ||
compare: sim_size | ||
pl_dotplot_marker_genes: | ||
path: test-data/pl_dotplot_marker_genes.png | ||
compare: sim_size |
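
Two points in the test file above can be illustrated with a short fragment: several assertions repeat a key inside one mapping (three `line:` entries under `has_line`, several `keys:` entries under `has_h5_keys`), and the image outputs are only checked with `compare: sim_size`. YAML mappings are meant to have unique keys, so a parser may keep only the last repeated entry. The following is a minimal sketch, not the PR's final test file. It assumes the list form of Galaxy test assertions (`that:`), a comma-separated `keys` value for `has_h5_keys`, and the `has_size`, `has_image_width` and `has_image_height` assertions. The byte size and pixel dimensions are hypothetical placeholders, and the attribute names should be checked against the Galaxy release in use.

```yaml
# Sketch only: output names and expected lines are copied from the diff above;
# all numeric sizes and dimensions are hypothetical placeholders.
outputs:
  cells_per_cluster:
    asserts:
      # List form: one entry per assertion, so no key is repeated.
      - that: has_line
        line: "0 1162"
      - that: has_line
        line: "3 311"
      - that: has_line
        line: "6 34"
  anndata_out:
    asserts:
      has_h5_keys:
        # One comma-separated value instead of three separate `keys:` entries.
        keys: "obs/louvain,var/highly_variable,uns/rank_genes_groups"
  pl_umap_louvain:
    asserts:
      # Size check with a tolerance; no reference PNG has to live in the repo.
      has_size:
        size: 50000     # hypothetical, in bytes (some Galaxy releases spell this `value`)
        delta: 10000    # hypothetical tolerance, in bytes
      # Image-aware checks, in addition to the size check.
      has_image_width:
        width: 600      # hypothetical, in pixels
        delta: 50
      has_image_height:
        height: 500     # hypothetical, in pixels
        delta: 50
```

With assertions of this kind, the `path:` entries pointing at PNG files under `test-data/` would no longer be needed for the plots, which matches the reviewer's remark about not keeping the image files in the repository.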