Add a new scRNA workflow for standard analysis using Scanpy #556

Open: wants to merge 15 commits into base: main

pavanvidem
Member

Follows mostly the 3k PBMC clustering tutorial. It uses a workflow parameters file for some important parameters. All the plots automatically use the highly ranked genes.

@bgruening
Member

File [/home/runner/work/iwc/iwc/workflows/scRNAseq/standard-scanpy/test-data/Barcodes.txt] does not exist - parent directory [/home/runner/work/iwc/iwc/workflows/scRNAseq/standard-scanpy/test-data] does exist, cwd is [/home/runner/work/iwc/iwc]

The Barcodes file is missing.

keys: "uns/rank_genes_groups"
pl_umap_marker_genes:
path: test-data/pl_umap_marker_genes.png
compare: sim_size
Member

There are nowadays better asserts available for images. Maybe you can add them in addition.

Also if you just compare sim_size you don't need to have the files in the repo, do you?
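
To illustrate the point about size-only comparison: a check of this kind needs only an expected byte count and a tolerance, not the reference image itself. The sketch below is a minimal Python illustration of that idea; the function name and the numbers are hypothetical, and it is not Galaxy's actual implementation.

```python
def sizes_are_similar(actual_size: int, expected_size: int, delta: int) -> bool:
    """Return True when two byte counts differ by at most `delta` bytes."""
    return abs(actual_size - expected_size) <= delta


# Hypothetical numbers, for illustration only.
expected_size = 250_000
print(sizes_are_similar(251_200, expected_size, delta=10_000))  # True
print(sizes_are_similar(400_000, expected_size, delta=10_000))  # False
```

A check like this says nothing about the image content, which is why content-aware image assertions are worth adding alongside it.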

@pavanvidem
Member Author

@lldelisle can you please review this?

Contributor

@lldelisle left a comment

Looks great. Thanks.

orcid: 0000-0002-2799-424X
- name: Mehmet Tekman
orcid: 0000-0002-4181-2676
- name: "B\xE9r\xE9nice Batut"
Contributor

Suggested change
- name: "B\xE9r\xE9nice Batut"
- name: "Bérénice Batut"

## Inputs dataset

- The workflow needs 4 files as input
- A singl-cell count matrix file in Matrix Market Exchange format
Contributor

Suggested change
- A singl-cell count matrix file in Matrix Market Exchange format
- A single-cell count matrix file in Matrix Market Exchange format

- A singl-cell count matrix file in Matrix Market Exchange format
- A cell barcodes file with a single barcode in each line. The barcodes should correspond to the cells in the matrix file
- A genes/feature tabular file with gene ids and gene symbols

Contributor

I think you forgot to describe the fourth file.

Contributor

As well as the 3 input values

Member Author

@pavanvidem Oct 14, 2024

Good catch. But maybe we do not need a parameters file because @mvdbeek suggested using individual parameters instead of a file. We will have only 3 input files.

Contributor

Yes indeed, but it would be good to describe the input values in the README.
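
The README requirement quoted above, that the barcodes correspond to the cells in the matrix file, can be checked before running the workflow. The following is a minimal Python sketch, assuming a Matrix Market coordinate file with genes as rows and cells as columns (an assumption, not stated in the PR); the helper names and the tiny inline data are made up for illustration.

```python
import io


def matrix_shape(mtx_lines):
    """Return (rows, cols) from the size line of a Matrix Market coordinate file."""
    for line in mtx_lines:
        line = line.strip()
        # Skip the banner and comment lines (both start with '%') and blank lines.
        if not line or line.startswith("%"):
            continue
        # The first remaining line is the size line: rows, columns, non-zero entries.
        rows, cols, _nonzeros = line.split()[:3]
        return int(rows), int(cols)
    raise ValueError("no size line found")


def count_records(lines):
    """Count non-empty lines (one barcode or one gene per line)."""
    return sum(1 for line in lines if line.strip())


# Made-up example data: 3 genes x 2 cells (assumes genes are rows, cells are columns).
mtx = io.StringIO(
    "%%MatrixMarket matrix coordinate integer general\n3 2 2\n1 1 5\n3 2 1\n"
)
barcodes = io.StringIO("AAAC-1\nAAAG-1\n")
genes = io.StringIO("ENSG01\tGeneA\nENSG02\tGeneB\nENSG03\tGeneC\n")

n_rows, n_cols = matrix_shape(mtx)
n_barcodes = count_records(barcodes)
n_genes = count_records(genes)

# Both checks should hold for a consistent set of input files.
assert n_barcodes == n_cols, "barcode count does not match matrix columns"
assert n_genes == n_rows, "gene count does not match matrix rows"
```

With real files, the `io.StringIO` objects would be replaced by open file handles.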

{
"class": "Person",
"identifier": "0000-0001-9852-1987",
"name": "B\u00e9r\u00e9nice Batut"
Contributor

Suggested change
"name": "B\u00e9r\u00e9nice Batut"
"name": "Bérénice Batut"
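
The escaped and literal spellings decode to the same string, so this suggestion changes readability only, not the data. A short Python sketch using the standard `json` module shows this for the JSON case (the YAML double-quoted `\xE9` escape likewise denotes U+00E9); the variable names are illustrative.

```python
import json

escaped_json = '{"name": "B\\u00e9r\\u00e9nice Batut"}'
literal_json = '{"name": "Bérénice Batut"}'

# Both documents parse to the same Python object.
same = json.loads(escaped_json) == json.loads(literal_json)
print(same)  # True

# ensure_ascii=False keeps non-ASCII characters readable when writing JSON.
readable = json.dumps(json.loads(escaped_json), ensure_ascii=False)
print(readable)  # {"name": "Bérénice Batut"}
```

Serializers that default to ASCII-only output are a likely source of such escapes in exported files.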

@lldelisle
Contributor

@mvdbeek Have you/we written down the naming convention for workflows?

@@ -0,0 +1,17 @@
version: 1.2
workflows:
- name: Standard-scRNA-seq-with-Scanpy
Member

Suggested change
- name: Standard-scRNA-seq-with-Scanpy
- name: main

Annotate louvain clusters with these cell types: CD4+ T, CD14+, B, CD8+ T, FCGR3A+,
NK, Dendritic, Megakaryocytes
outputs:
initial_anndata_general_info:
Member

Can you make all of these human readable please, no underscores. They are part of the primary output, so it should look nice.

Member Author

I am renaming all the outputs. Is that still a problem in that case?

Member

Yes. I don't think you need to rename the history items (but I don't mind if you do); workflow outputs should primarily be explored from the invocation view, not the history.

"name": "Workflow Params"
}
],
"label": "Workflow Params",
Member

This shouldn't be a file, but individual options.

"format-version": "0.1",
"license": "CC-BY-4.0",
"release": "0.1",
"name": "Standard scRNA-seq with Scanpy",
Member

Suggested change
"name": "Standard scRNA-seq with Scanpy",
"name": "scRNA-seq with Scanpy",

Member Author

@pavanvidem Oct 14, 2024

I will rename it to "Preprocessing and Clustering of single-cell RNA-seq data with Scanpy".

@@ -0,0 +1,3096 @@
{
"a_galaxy_workflow": "true",
"annotation": "Standard scRNA-seq workflow with Scanpy and Anndata. Based on the 3k PBMC clustering tutorial from Scanpy. Important workflow parameters can be read from a tabular file.",
Member

@mvdbeek Oct 14, 2024

Please don't make the users write a parameter file; that's quite bad for UX, validation, etc.

@mvdbeek
Member

mvdbeek commented Oct 14, 2024

I don't think we've agreed on anything yet. I would prefer to use Single Cell in the name field of the workflow itself.

@@ -0,0 +1,23 @@
# Standard scRNA-seq Workflow using Scanpy and Anndata
Member

What is even standard? Is clustering the thing you do here?

Member Author

Preprocessing and clustering. I will rename it accordingly.

@bgruening
Member

FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/iwc/iwc/workflows/scRNAseq/scanpy-clustering/test-data/General information about the final Anndata object.txt'
