Question: edgeR: Strange volcano plot
5 months ago by
cronanz0
cronanz0 wrote:

Dear all,

I have plate-based scRNA-seq data and have used edgeR to identify differentially expressed genes between clusters. The input data were expected counts from RSEM, and the workflow was as follows:

all_edger <- DGEList(counts=all_expc,group=groups)
all_edger <- calcNormFactors(all_edger,method="TMMwzp")
all_design <- model.matrix(~0+groups)
all_edger <- estimateDisp(all_edger,design=all_design)
all_fit <- glmFit(all_edger,all_design)
all_lrt <- glmLRT(all_fit,contrast=c(-1,0,0,0,0,1,0,0))


The volcano plot from the above comparison has a pattern that I'm not familiar with. For certain genes there appears to be a tight relationship between logFC and -log10(FDR), which produces a line of genes on each side of the plot. I don't understand this well enough to interpret the pattern. Is this to be expected? Am I doing something out of the ordinary that causes it? Thank you very much.

Volcano plot: https://ibb.co/JcMnK7r


Generally one plots the negative log10 of the nominal p-value on the y-axis, rather than of the FDR.
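A minimal base-R sketch of that suggestion, assuming a results table with `logFC`, `PValue` and `FDR` columns such as the one returned by `topTags(all_lrt, n = Inf)$table`. The helper name `volcano_coords` and the toy values are made up for illustration; the FDR is used only to colour the points, not for the y-axis.

```r
# Hypothetical helper: compute volcano-plot coordinates from a results table.
volcano_coords <- function(tab, fdr_cutoff = 0.05) {
  stopifnot(all(c("logFC", "PValue", "FDR") %in% colnames(tab)))
  data.frame(
    x   = tab$logFC,
    y   = -log10(tab$PValue),       # nominal p-value on the y-axis
    sig = tab$FDR <= fdr_cutoff,    # FDR only decides the colour
    row.names = rownames(tab)
  )
}

# Toy table standing in for topTags(all_lrt, n = Inf)$table (made-up values).
toy <- data.frame(
  logFC  = c(-3, -1, 0.2, 2, 4),
  PValue = c(1e-6, 0.02, 0.7, 1e-3, 1e-8),
  row.names = paste0("gene", 1:5)
)
toy$FDR <- p.adjust(toy$PValue, method = "BH")

vc <- volcano_coords(toy)
plot(vc$x, vc$y, pch = 16, col = ifelse(vc$sig, "red", "grey40"),
     xlab = "logFC", ylab = "-log10(p-value)")
```

With the objects from the question, `toy` would be replaced by the `topTags` table; the cutoff of 0.05 is only a default chosen for this sketch.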


Here are a couple of posts explaining why -log10(p) is better than -log10(FDR) for the volcano plot (as noted by Kevin Blighe):

5 months ago by
Aaron Lun25k
Cambridge, United Kingdom
Aaron Lun25k wrote:

It's hard to say for sure, but I would guess that you have a few genes that are all-zero in one group and with some non-zero counts in the other group. If you hold the dispersion constant (e.g., if all of the genes have very similar abundances), the p-value will be a monotonic function of the log-fold change, resulting in the lines that you've observed. It may even be that the non-zero counts in each group come from the same cells - or even just a single cell - which contributes to the clear definition of the pattern on the volcano plot.
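One way to check this guess is to count, per gene, how many cells have non-zero counts in each of the two groups being compared. The sketch below uses only base R; the helper name `zero_pattern`, the group labels and the toy counts are all made up for illustration.

```r
# Hypothetical helper: for two groups, report how many cells have non-zero
# counts in each group and flag genes that are all-zero in exactly one group.
zero_pattern <- function(counts, groups, g1, g2) {
  counts <- as.matrix(counts)
  stopifnot(ncol(counts) == length(groups))
  n1 <- rowSums(counts[, groups == g1, drop = FALSE] > 0)
  n2 <- rowSums(counts[, groups == g2, drop = FALSE] > 0)
  data.frame(
    nonzero_g1 = n1,
    nonzero_g2 = n2,
    one_sided  = xor(n1 == 0, n2 == 0),
    row.names  = rownames(counts)
  )
}

# Toy counts (made up): 4 genes x 6 cells, three cells per group.
toy_counts <- matrix(
  c(0, 0, 0, 5, 0, 0,    # geneA: zero in group "a", one expressing cell in "b"
    2, 3, 1, 0, 0, 0,    # geneB: zero in group "b"
    1, 0, 2, 3, 1, 0,    # geneC: expressed in both groups
    0, 0, 0, 0, 0, 0),   # geneD: zero everywhere
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", LETTERS[1:4]), paste0("cell", 1:6))
)
toy_groups <- rep(c("a", "b"), each = 3)

res <- zero_pattern(toy_counts, toy_groups, "a", "b")
print(res)
```

For the data in the question, `counts` would be `all_expc`, `groups` the grouping vector, and `g1`/`g2` the two levels picked out by the non-zero entries of the contrast. Genes with `one_sided` set and only one or two expressing cells are the candidates for the lines on the plot.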

I would suggest having a closer look at a few of those genes (in terms of their expression profiles across groups, e.g., with scater::plotExpression) for further diagnostics. Such patterns are not necessarily a problem - the counts are low, after all - though you are correct in that they do warrant some level of concern and investigation.
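A rough sketch of such a diagnostic plot, assuming the Bioconductor packages SingleCellExperiment, scuttle and scater are installed. The data are simulated, and the gene names and the group labels `cluster1` and `cluster6` are placeholders rather than anything from the original dataset.

```r
# Sketch only: all data below are simulated.
library(SingleCellExperiment)
library(scuttle)
library(scater)

set.seed(1)
n_cells <- 40
counts <- matrix(
  rpois(20 * n_cells, lambda = 5), nrow = 20,
  dimnames = list(paste0("gene", 1:20), paste0("cell", 1:n_cells))
)
group <- rep(c("cluster1", "cluster6"), each = n_cells / 2)

# Make gene1 all-zero in cluster1 and driven by a single cell in cluster6.
counts["gene1", ] <- 0
counts["gene1", n_cells] <- 30

sce <- SingleCellExperiment(
  assays  = list(counts = counts),
  colData = DataFrame(group = group)
)
sce <- logNormCounts(sce)   # adds the "logcounts" assay that plotExpression uses

# One panel per gene, cells split by group on the x-axis.
p <- plotExpression(sce, features = c("gene1", "gene2"), x = "group")
print(p)
```

In the simulated example, `gene1` shows a single expressing cell in one group and none in the other, which is the kind of profile described above; `gene2` is an ordinary gene for comparison.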