Home

<--

Answers to Polls

14_FirstSteps

How many genes and cells does this dataset have?

dim(pbmc.data)

32738 genes, 2700 cells

How many genes are not expressed in any cell?

(rowSums(pbmc.data) == 0) %>% sum()

16104
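
A note on the one-liner above: in R, `%>%` binds more tightly than `==`, so the comparison needs its own parentheses before it is piped into `sum()`. A minimal sketch on a made-up matrix (`toy_counts` is a hypothetical stand-in for `pbmc.data`, and the magrittr package is assumed to be installed):

```r
library(magrittr)

# Hypothetical toy count matrix: 4 genes x 3 cells (stand-in for pbmc.data)
toy_counts <- matrix(
  c(0, 0, 0,
    1, 0, 2,
    0, 0, 0,
    3, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3))
)

# Without parentheses this parses as rowSums(toy_counts) == (0 %>% sum()),
# which returns a logical vector with one entry per gene, not a count
no_parens <- rowSums(toy_counts) == 0 %>% sum()

# With parentheses the logical vector itself is summed, giving the number
# of genes whose total count is zero
n_unexpressed <- (rowSums(toy_counts) == 0) %>% sum()
```

Printing `no_parens` and `n_unexpressed` side by side makes the difference visible: the first is a vector of TRUE/FALSE values, the second a single number.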

Which are the top 3 genes with the highest total count?

rowSums(pbmc.data) %>% sort(decreasing = TRUE) %>% head(3)

MALAT1, TMSB4X, B2M
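
The pipeline above returns a named numeric vector, so the gene names can be pulled out with `names()`. A small sketch on invented data (`toy_counts` and its gene names are hypothetical; magrittr is assumed to be installed):

```r
library(magrittr)

# Hypothetical toy count matrix: 4 genes x 3 cells
toy_counts <- matrix(
  c(5, 1, 0,
    0, 2, 1,
    9, 9, 9,
    1, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  paste0("cell", 1:3))
)

# Total count per gene, sorted from highest to lowest, keeping the top 3
top3 <- rowSums(toy_counts) %>% sort(decreasing = TRUE) %>% head(3)

# The names of the vector are the gene identifiers
top3_genes <- names(top3)
```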

In cell "AAATTCGATTCTCA-1", how many reads map to gene "ACTB"?

pbmc.data["ACTB","AAATTCGATTCTCA-1"]

10

How many cells have less than 2000 counts?

(colSums(pbmc.data) < 2000) %>% sum()

1025
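
The question asks for cells with strictly fewer than 2000 counts; `<` and `<=` give different answers only when some cell sits exactly on the threshold. A tiny sketch with invented per-cell totals (`toy_totals` is a hypothetical stand-in for `colSums(pbmc.data)`):

```r
# Hypothetical per-cell total counts, one of them exactly at the threshold
toy_totals <- c(cell1 = 1500, cell2 = 2000, cell3 = 2500, cell4 = 1999)

# Strictly below the threshold: cell2 is excluded
n_below <- sum(toy_totals < 2000)

# At or below the threshold: cell2 is included
n_at_or_below <- sum(toy_totals <= 2000)
```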

What's the current number of cells after this step?

dim(pbmc)

2638 cells (the second value returned by dim)

Which are the 3 most highly variable genes?

VariableFeatures(pbmc) %>% head(3)

LYZ, S100A9, PPBP

What's the variance of the gene PYCARD?

HVFInfo(pbmc)["PYCARD",]

5.05

How many components should we choose to include?

7

Which is the default value of the parameter K?

k.param = 20

How many clusters did we find?

pbmc$seurat_clusters %>% unique() %>% length()

9

16_UMAP

Is t-SNE affected by the seed?

Yes. A seed is always set, because the algorithm is stochastic (the S in SNE!).
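
To see why the seed matters for any stochastic method, here is a minimal sketch of a random starting layout. It is only a stand-in for a random initialisation, not t-SNE itself, and `random_init` is a hypothetical helper:

```r
# Hypothetical helper: random 2-D starting coordinates for n points
random_init <- function(n, seed) {
  set.seed(seed)
  matrix(rnorm(n * 2), ncol = 2)
}

init_a <- random_init(5, seed = 42)
init_b <- random_init(5, seed = 42)
init_c <- random_init(5, seed = 1)

# Same seed: the two starting layouts match exactly
same_seed_matches <- identical(init_a, init_b)

# Different seed: the starting layouts differ
other_seed_matches <- identical(init_a, init_c)
```

Any optimisation that starts from such a layout inherits this dependence on the seed.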

(perplexity constraint) Is nrow(X) the number of genes or cells?

nrow(X) is the number of cells: "observations" are in the rows of the data matrix, while each column is a variable (gene).
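
As a rough sketch of how such a constraint limits perplexity, assume it has the form 3 * perplexity <= nrow(X) - 1. That form is an assumption here (it matches our understanding of the check in the Rtsne package, but verify it against the version you use), and `max_perplexity` is a hypothetical helper, not part of Rtsne or Seurat:

```r
# Hypothetical helper: largest perplexity allowed under the ASSUMED
# constraint 3 * perplexity <= n_obs - 1
max_perplexity <- function(n_obs) {
  floor((n_obs - 1) / 3)
}

# Toy matrix laid out as described above: cells in rows, genes in columns
X <- matrix(0, nrow = 100, ncol = 2000)

# The limit depends on the number of rows (cells), not on the 2000 columns
limit_toy <- max_perplexity(nrow(X))
```

Under this assumption the bound grows with the number of cells, so small subsets of a dataset are the cases where the default perplexity may need lowering.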

Is UMAP affected more than t-SNE by the seed?

t-SNE is more sensitive. UMAP's emphasis on global structure helps (a lot).

Would decreasing the number of PCs fed to the UMAP algorithm change our visualization? Would you say the results are 'better'?

Yes, and the result is better with fewer PCs. That's why we look for a "sweet spot" with the Elbow plot or JackStraw. The variability in each PC is a mix (not always 1:1) of relevant biological variability and technical variability.
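
The idea of a "sweet spot" can be sketched with base R: compute the share of variance carried by each PC and look for where it levels off. The data below are simulated (two invented signals spread over ten "genes" plus noise), and `prcomp` is used as a stand-in for the PCA step, not as Seurat's implementation:

```r
set.seed(1)
n_cells <- 100

# Hypothetical toy data: 2 latent signals spread over 10 "genes", plus noise
signal <- matrix(rnorm(n_cells * 2, sd = 5), ncol = 2)
loadings <- matrix(rnorm(2 * 10), nrow = 2)
X <- signal %*% loadings + matrix(rnorm(n_cells * 10), ncol = 10)

# PCA on centred, scaled data (cells in rows, genes in columns)
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Proportion of total variance carried by each PC, and its running total
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)

# Plotting var_explained against the PC index gives an elbow-style curve
# plot(var_explained, type = "b", xlab = "PC", ylab = "Proportion of variance")
```

PCs past the point where the curve flattens mostly add noise to whatever is computed from them.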

Could you have the UMAP projection onto 3 axes instead of 2?

All answers are correct, except for "no".

21_DE

What would happen if we ran FindVariableFeatures after ScaleData?

Nothing: the resulting list of genes would be the same. Data scaling is important for ML (e.g. dimensionality reduction); the methods for finding variable features are more akin to 'classic' statistical modelling techniques (e.g. regression).

What would happen if we skip ScaleData? Would PCA be affected?

The resulting PCA would be biased by the absolute values in our count matrix. Genes with higher counts would dominate the components, even if their variability is not large.
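
A minimal sketch of this effect with base R, using simulated data and `prcomp`'s `scale.` argument as a stand-in for ScaleData (an assumption for illustration, not Seurat's own code). `high_gene` has large values but no group structure; `low_gene` has small values but separates two invented groups of cells:

```r
set.seed(1)
n_cells <- 200
group <- rep(c(0, 1), each = n_cells / 2)

# Hypothetical toy data: cells in rows, two "genes" in columns
toy <- cbind(
  high_gene = rnorm(n_cells, mean = 1000, sd = 50),          # big values, no structure
  low_gene  = rnorm(n_cells, mean = 1 + 4 * group, sd = 1)   # small values, two groups
)

# PCA without scaling: PC1 is driven almost entirely by high_gene
pca_raw <- prcomp(toy, center = TRUE, scale. = FALSE)
raw_loadings <- abs(pca_raw$rotation[, "PC1"])

# PCA with scaling: both genes contribute equally to PC1
pca_scaled <- prcomp(toy, center = TRUE, scale. = TRUE)
scaled_loadings <- abs(pca_scaled$rotation[, "PC1"])
```

Comparing `raw_loadings` with `scaled_loadings` shows how the gene with the larger absolute values takes over the first component when scaling is skipped.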

Would it make any difference if we ran FindNeighbors and FindClusters only after UMAP?

It depends! It would, if we wanted to run FindNeighbors using reduction='umap'. The first option is incorrect in the context of single-cell data and the Seurat processing pipeline.

Given that the genes used for clustering are the same genes tested for differential expression, would you interpret the (adjusted) p-values without concerns?

Yeah. There's a bit of a chicken-and-egg problem... but that's just how it is.

Which marker gene is the most expressed in cluster 1 when comparing it to cluster 2?

What would happen if we used ident.1 = 2, and ident.2 = 1 instead?

How would you extract the 'gene signature' of a given cluster? By DE testing with…

Any test, against the remaining clusters.

Day 3

paste question here

Day 4

paste question here

-->

This wiki is not empty: we have hidden content inside an HTML comment. Click the EDIT button in the top right corner! ........ (not this pencil --->)
