Volcano plot of scRNA-seq data not "centered"
1
0
Entering edit mode
tjl • 0
@4e30bfdb
Last seen 24 days ago
United States

I analyzed a single-cell RNA-seq dataset including two groups using Seurat and then DESeq2 for differential expression. No problems with the standard workflow. However, the resulting log2FoldChanges for treated vs. untreated groups within clusters are almost all negative. This is clear in the corresponding volcano plots, which are not "centered" at a log2FoldChange of ~zero, but rather at about -0.5 (although still with a classic volcano shape). I know I can use "estimateSizeFactors" with "controlGenes" in DESeq2, but what is the most likely upstream explanation for this? It seems like a problem with normalization or scaling(?).

DESeq2 • 397 views
ADD COMMENT
0
Entering edit mode
ATpoint ★ 4.2k
@atpoint-13662
Last seen 12 minutes ago
Germany

Sounds like a normalization problem, but I am not sure how to help with that. Indeed you can use control genes, for example use the top 20% of genes with highest baseMean, or only genes that are expressed by at least 25% of cells. This could help enrich for more robust genes to properly center the bulk of genes along y=0 in the MA-plot (MA-plots are better fir diagnostics than Volcanos). Without more details on setup and code or plots I don't know what more to say.

ADD COMMENT
0
Entering edit mode

Agree with ATpoint

Depending on the "groups" in single cell, the question of DE can be not well defined without inserting additional information (controlGenes). E.g. what defines LFC = 0 across disparate cell types with very different distribution across the transcriptome.

ADD REPLY
0
Entering edit mode

Thank you both. The code is below, simplified a bit for simplicity. ATpoint, your suggestion re: the top 20% of genes by baseMean makes me wonder if the shift in LFC=0 is an artifact of the experiment, which should include an enrichment for certain RNA types in the treated group (grey points in the volcano plot). If these are robustly increased in most cells whereas most other genes in the single-cell data are non-uniformly expressed, perhaps they are driving the normalization?

volcano plot

# Standard Seurat workflow
merged_obj <- merge(x = obj_1, y = c(obj_2, obj_3, obj_4, obj_5, obj_6, obj_7, obj_8, obj_9, obj_10), project = "projectID")
merged_obj$percent_mt <- PercentageFeatureSet(merged_obj, pattern="^mt-")
filtered_obj <- subset(merged_ob, subset = nFeature_RNA >= 200 & nFeature_RNA <= 2500 & percent_mt <= 5)
filtered_obj <- NormalizeData(filtered_obj)
filtered_obj <- FindVariableFeatures(filtered_obj)
filtered_obj <- ScaleData(filtered_obj)
filtered_obj <- RunPCA(filtered_obj)
int_obj <- IntegrateLayers(object = filtered_obj, method = CCAIntegration, orig.reduction = "pca", new.reduction = "integrated.cca", verbose = FALSE)
int_obj[["RNA"]] <- JoinLayers(int_obj[["RNA"]])
int_obj <- FindNeighbors(int_obj, reduction = "integrated.cca", dims = 1:30)
int_obj <- FindClusters(int_obj, resolution = 0.8, cluster.name = "integrated_clusters")
int_obj <- RunUMAP(int_obj, dims = 1:30, reduction = "integrated.cca")

# Annotated cells manually with RenameIdents()
ann_obj$cell_type <- Idents(ann_obj)
ann_obj$cell_type[Cells(ann_obj)] <- paste(Idents(ann_obj))

# Pseudobulk DEG analysis
pseudo_obj <- AggregateExpression(ann_obj, assays = "RNA", return.seurat = TRUE, group.by = c("celltype", "group", "sample"))
pseudo_obj$pseudo_celltype.group <- paste(pseudo_obj$cell_type, pseudo_obj$group, sep = "_")
Idents(pseudo_obj) <- "pseudo_celltype.group"
pseudobulk_DEG <- FindMarkers(object = pseudo_obj, ident.1 = "celltype_group2", ident.2 = "celltype_group1", test.use = "DESeq2")
ADD REPLY
0
Entering edit mode

My comment was generic. If your experiment causes drastic shifts then the situation might be different. The advise is the same: Try to use genes you think are non-DE as controlGenes. For more guidance I would need to actually see the data and do inspection myself.

ADD REPLY

Login before adding your answer.

Traffic: 745 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6