Add a new scRNA workflow for standard analysis using Scanpy #556

Merged
merged 15 commits into from
Jan 25, 2025
Changes from 6 commits
17 changes: 17 additions & 0 deletions workflows/scRNAseq/standard-scanpy/.dockstore.yml
@@ -0,0 +1,17 @@
version: 1.2
workflows:
  - name: Standard-scRNA-seq-with-Scanpy
Member

Suggested change
- name: Standard-scRNA-seq-with-Scanpy
- name: main

    subclass: Galaxy
    publish: true
    primaryDescriptorPath: /Standard-scRNA-seq-with-Scanpy.ga
    testParameterFiles:
      - /Standard-scRNA-seq-with-Scanpy-tests.yml
    authors:
      - name: Pavankumar Videm
        orcid: 0000-0002-5192-126X
      - name: Hans-Rudolf Hotz
        orcid: 0000-0002-2799-424X
      - name: Mehmet Tekman
        orcid: 0000-0002-4181-2676
      - name: "B\xE9r\xE9nice Batut"
Contributor

Suggested change
- name: "B\xE9r\xE9nice Batut"
- name: "Bérénice Batut"

        orcid: 0000-0001-9852-1987
3 changes: 3 additions & 0 deletions workflows/scRNAseq/standard-scanpy/CHANGELOG.md
@@ -0,0 +1,3 @@
## [0.1] 2024-10-09

First release.
23 changes: 23 additions & 0 deletions workflows/scRNAseq/standard-scanpy/README.md
@@ -0,0 +1,23 @@
# Standard scRNA-seq Workflow using Scanpy and Anndata
Member

What is even standard? Is clustering the thing you do here?

Member Author

Preprocessing and clustering. I will rename it accordingly.


## Input datasets

- The workflow needs 4 files as input
- A singl-cell count matrix file in Matrix Market Exchange format
Contributor

Suggested change
- A singl-cell count matrix file in Matrix Market Exchange format
- A single-cell count matrix file in Matrix Market Exchange format

- A cell barcodes file with a single barcode in each line. The barcodes should correspond to the cells in the matrix file
- A genes/feature tabular file with gene ids and gene symbols

Contributor

I think you forgot to describe the fourth file.

Contributor

As well as the 3 input values

Member Author

@pavanvidem pavanvidem Oct 14, 2024

Good catch. But maybe we do not need a parameters file because @mvdbeek suggested using individual parameters instead of a file. We will have only 3 input files.

Contributor

Yes indeed, but it would be good to describe the input values in the README.
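
As a rough illustration of how the three input files described in the README relate to each other, the sketch below checks that the matrix dimensions agree with the barcode and gene files. It assumes the usual Cell Ranger layout (rows are genes, columns are cell barcodes); the function names `read_mtx_shape`, `count_records`, and `check_inputs` are hypothetical helpers and not part of the workflow.

```python
import io


def read_mtx_shape(handle):
    """Return (n_rows, n_cols, n_entries) from a Matrix Market coordinate file."""
    header = handle.readline()
    if not header.startswith("%%MatrixMarket"):
        raise ValueError("missing %%MatrixMarket header line")
    for line in handle:
        # Lines starting with '%' are comments; the first other line is the size line.
        if line.startswith("%") or not line.strip():
            continue
        n_rows, n_cols, n_entries = (int(token) for token in line.split())
        return n_rows, n_cols, n_entries
    raise ValueError("no size line found")


def count_records(handle):
    """Count non-empty lines (one barcode or one gene per line)."""
    return sum(1 for line in handle if line.strip())


def check_inputs(matrix, barcodes, genes):
    """True if the matrix shape matches the gene and barcode files."""
    n_rows, n_cols, _ = read_mtx_shape(matrix)
    # Assumption: rows = genes/features, columns = cell barcodes.
    return n_rows == count_records(genes) and n_cols == count_records(barcodes)


# Toy data: 3 genes, 2 cells, 4 non-zero counts.
toy_matrix = io.StringIO(
    "%%MatrixMarket matrix coordinate integer general\n"
    "%\n"
    "3 2 4\n"
    "1 1 5\n2 1 1\n3 2 7\n1 2 2\n"
)
toy_barcodes = io.StringIO("AAAC-1\nAAAG-1\n")
toy_genes = io.StringIO("ENSG1\tG1\nENSG2\tG2\nENSG3\tG3\n")
print(check_inputs(toy_matrix, toy_barcodes, toy_genes))
```

With real data the same functions could be called on handles opened with `open(path)`; the toy strings here only stand in for the files.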

## Processing

- The workflow creates an **Anndata** object from the given input files.
- Quality control is performed: cells are filtered by the number of genes expressed, and cells with high mitochondrial content are removed.
- Counts are then normalized and scaled.
- PCA is used for dimensionality reduction, and 50 PCs are computed. Various plots are generated to inspect the PCA and the PCA loadings, which helps in determining the number of PCs to keep for further analysis.
- Clustering is performed by computing a neighbourhood graph and then applying the **louvain** algorithm. The neighbourhood graph is embedded into UMAP and plotted.
- Marker genes are identified using the **Wilcoxon rank sum test**. Marker gene expression is visualized in various plots.
- Optionally, louvain clusters can be annotated with cell types based on the marker genes.
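
The quality-control and normalization steps listed above can be sketched in plain Python. This is only an illustration of the idea, not the workflow's implementation: the thresholds (`min_genes`, `max_pct_mito`), the `target_sum` value, and the function names are assumptions, and the later scaling step is not shown. In Scanpy, comparable operations are provided by `sc.pp.filter_cells`, `sc.pp.normalize_total`, and `sc.pp.log1p`.

```python
import math


def qc_filter(cells, mito_genes, min_genes=200, max_pct_mito=5.0):
    """Keep cells with enough expressed genes and a low mitochondrial share.

    `cells` maps barcode -> {gene: raw count}. Thresholds are illustrative.
    """
    kept = {}
    for barcode, counts in cells.items():
        total = sum(counts.values())
        n_genes = sum(1 for count in counts.values() if count > 0)
        mito = sum(count for gene, count in counts.items() if gene in mito_genes)
        pct_mito = 100.0 * mito / total if total else 0.0
        if n_genes >= min_genes and pct_mito <= max_pct_mito:
            kept[barcode] = counts
    return kept


def normalize_log1p(cells, target_sum=10_000):
    """Scale each cell to `target_sum` total counts, then apply log(1 + x)."""
    normalized = {}
    for barcode, counts in cells.items():
        total = sum(counts.values())
        scale = target_sum / total if total else 0.0
        normalized[barcode] = {
            gene: math.log1p(count * scale) for gene, count in counts.items()
        }
    return normalized


# Toy example with deliberately small thresholds.
toy = {
    "AAAC": {"CD3D": 4, "LYZ": 6, "MT-CO1": 0},
    "AAAG": {"CD3D": 1, "LYZ": 0, "MT-CO1": 9},  # mostly mitochondrial counts
    "AAAT": {"CD3D": 0, "LYZ": 5, "MT-CO1": 0},  # only one gene expressed
}
kept = qc_filter(toy, mito_genes={"MT-CO1"}, min_genes=2, max_pct_mito=20.0)
print(sorted(kept))
print(normalize_log1p(kept, target_sum=100))
```

Filtering before normalization matters here: per-cell scaling would otherwise inflate the few counts of low-quality cells to the same total as healthy ones.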

## Outputs

- The final output is an Anndata object annotated with the louvain clusters.
- Informative plots are produced at each stage, from QC through to the final results.
@@ -0,0 +1,95 @@
- doc: Test outline for Standard-scRNA-seq-with-Scanpy
  job:
    Workflow Params:
      class: File
      path: test-data/workflow_params.tabular
      filetype: tabular
    Barcodes:
      class: File
      location: https://zenodo.org/record/3581213/files/barcodes.tsv
      filetype: txt
    Genes:
      class: File
      location: https://zenodo.org/record/3581213/files/genes.tsv
      filetype: tabular
    Matrix:
      class: File
      location: https://zenodo.org/record/3581213/files/matrix.mtx
      filetype: mtx
    Input is from Cell Ranger v2 or earlier versions: true
    Manually annotate celltypes?: true
    Annotate louvain clusters with these cell types: CD4+ T, CD14+, B, CD8+ T, FCGR3A+,
      NK, Dendritic, Megakaryocytes
  outputs:
    initial_anndata_general_info:
Member

Can you make all of these human readable please, no underscores. They are part of the primary output, so it should look nice.

Member Author

I am renaming all the outputs. Is that still a problem in that case?

Member

Yes. I don't think you need to rename the history items (but I don't mind if you do); workflow outputs should primarily be explored from the invocation view, not the history.

      asserts:
        has_text:
          text: "AnnData object with n_obs × n_vars = 2700 × 32738"
    pl_scatter_total_counts_vs_n_genes_by_counts:
      path: test-data/pl_scatter_total_counts_vs_n_genes_by_counts.png
      compare: sim_size
    pl_highly_variable:
      path: test-data/pl_highly_variable.png
      compare: sim_size
    pl_pca_loadings:
      path: test-data/pl_pca_loadings.png
      compare: sim_size
    pl_pca_variance_ratio:
      path: test-data/pl_pca_variance_ratio.png
      compare: sim_size
    pl_umap_louvain:
      path: test-data/pl_umap_louvain.png
      compare: sim_size
    uns_rank_genes_groups_names_wilcoxon_test:
      asserts:
        has_line:
          line: "LDHB LYZ CD74 CCL5 LST1 NKG7 HLA-DPA1 PF4"
    pl_rank_gene_groups_t_test_wilcoxon_test:
      path: test-data/pl_rank_gene_groups_t_test_wilcoxon_test.png
      compare: sim_size
    final_anndata_general_info:
      path: test-data/final_anndata_general_info.txt
    cells_per_cluster:
      asserts:
        - that: has_line
          line: "0 1162"
        - that: has_line
          line: "3 311"
        - that: has_line
          line: "6 34"
    pl_scatter_n_genes_by_counts_vs_pct_mito:
      path: test-data/pl_scatter_n_genes_by_counts_vs_pct_mito.png
      compare: sim_size
    pl_violin_initial:
      path: test-data/pl_violin_initial.png
      compare: sim_size
    anndata_with_raw:
      asserts:
        has_h5_keys:
          keys: "obs/log1p_total_counts,obs/total_counts_mito,var/mito,var/norm,uns/log1p"
    pl_pca_overview_genes:
      path: test-data/pl_pca_overview_genes.png
      compare: sim_size
    pl_rank_genes_heatmap:
      path: test-data/pl_rank_genes_heatmap.png
      compare: sim_size
    anndata_out:
      asserts:
        has_h5_keys:
          keys: "obs/louvain,var/highly_variable,uns/rank_genes_groups"
    pl_umap_marker_genes:
      path: test-data/pl_umap_marker_genes.png
      compare: sim_size
Member

There are nowadays better asserts available for images. Maybe you can add them in addition.

Also if you just compare sim_size you don't need to have the files in the repo, do you?

    pl_stacked_violin_marker_genes:
      path: test-data/pl_stacked_violin_marker_genes.png
      compare: sim_size
    pl_violin_louvain:
      path: test-data/pl_violin_louvain.png
      compare: sim_size
    pl_dotplot_marker_genes:
      path: test-data/pl_dotplot_marker_genes.png
      compare: sim_size
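
To illustrate the review remark about image asserts, the sketch below contrasts a size-only comparison with a check on the image dimensions stored in a PNG header. It is a simplified stand-in, not Galaxy's implementation: `similar_size`, `png_dimensions`, and `fake_png_header` are hypothetical helpers, and the `delta` value is chosen only for the example. Galaxy's own image assertions (for example `has_image_width` and `has_image_height`, if available in the target Galaxy release) would be the place to express such checks in the test file.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def similar_size(expected: bytes, actual: bytes, delta: int) -> bool:
    """Size-only comparison: passes if the byte lengths differ by at most delta."""
    return abs(len(expected) - len(actual)) <= delta


def png_dimensions(data: bytes):
    """Read (width, height) from the IHDR chunk that follows the PNG signature."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    return struct.unpack(">II", data[16:24])


def fake_png_header(width: int, height: int) -> bytes:
    """Build only the first 24 bytes of a PNG; the rest of the file is omitted."""
    return (
        PNG_SIGNATURE
        + struct.pack(">I", 13)  # IHDR payload length
        + b"IHDR"
        + struct.pack(">II", width, height)
    )


landscape = fake_png_header(640, 480)
portrait = fake_png_header(480, 640)

# Same number of bytes, so a size-only comparison cannot tell them apart.
print(similar_size(landscape, portrait, delta=0))
# The header reveals that the two images have different shapes.
print(png_dimensions(landscape), png_dimensions(portrait))
```

A size-only check also never reads the expected file's content, only its length, which is the point behind the question of whether the reference images need to live in the repository.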