Question: How to generate a volcano plot when p-values from DESeq2 are zero?
ricardo3889 (University of Pennsylvania) wrote 8 months ago:

When using DESeq2, I noticed that some of my top genes have a pvalue or padj of zero. I suppose the p-value from the Wald test is really small and got rounded at some point when I ran DESeq2, although it is a bit surprising that other packages, including limma/voom and edgeR, assigned more reasonable p-values (e.g. E-15, E-20, etc.) to the same genes using the same dataset. In contrast, DESeq2 gives only zeros for those genes. The second-lowest p-value I get (after the zeros) is E-82, for a gene with a logFC similar to that of the genes with a p-value of zero. Given this situation, how can I use these values to generate a decent volcano plot to represent the data? When my top genes have a value of zero, the scale of the volcano plot is just awful because the top genes are assigned -log10(P) values that are through the roof. Also, I would appreciate it if someone could help me understand why DESeq2 doesn't provide a more reasonable p-value (I mean, E-82 is a bit extreme...). I am attaching some values from my data. Thanks!
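A possible illustration of the rounding suspected above, in base R only. It assumes the Wald p-value is the two-sided normal tail probability of the test statistic (the `stat` column of a DESeq2 results table); the statistics below are made up. Once the tail probability drops below what a double can represent it becomes exactly 0, but it can still be computed on the log scale with the `log.p` argument of `pnorm()`.

```r
# Hypothetical Wald statistics (stand-ins for the `stat` column of a results table).
wald_stat <- c(-45, -19.2, 3.1)

# Direct two-sided p-value: underflows to exactly 0 for very large |stat|.
p_direct <- 2 * pnorm(abs(wald_stat), lower.tail = FALSE)
print(p_direct)

# Same quantity on the natural-log scale, which does not underflow.
log_p <- log(2) + pnorm(abs(wald_stat), lower.tail = FALSE, log.p = TRUE)

# Convert to -log10(p), the usual y-axis of a volcano plot.
neg_log10_p <- -log_p / log(10)
print(neg_log10_p)
```

The values in `neg_log10_p` are finite even where `p_direct` is 0, so they could be plotted directly, although the y-axis range would still be very wide.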

Volcanoplot.jpeg sample.tsv

modified 8 months ago by Michael Love • written 8 months ago by ricardo3889

I'll just add a comment on the volcano part because you have tagged my package, EnhancedVolcano. EnhancedVolcano will automatically convert p-values of 0 to the lowest possible machine value, which I imagine differs based on whatever OS you are using. It issues a warning message to console when doing this. I introduced this behaviour for the purposes of being able to plot these genes, which otherwise would not be plotted (negative log10 of 0 is infinity).
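To make the point about infinity concrete, here is a minimal base R sketch. It is not EnhancedVolcano's own code; the p-values are made up, and using `.Machine$double.xmin` as the floor is an assumption about what "lowest possible machine value" means.

```r
# Hypothetical p-values; the first has underflowed to exactly zero.
pvals <- c(0, 1e-82, 1e-20, 0.03)

# -log10(0) is Inf, so the first gene cannot be placed on a finite axis.
neg_log10 <- -log10(pvals)
print(neg_log10)

# Smallest positive normalized double on this machine (assumed floor value).
floor_p <- .Machine$double.xmin

# Replace exact zeros with the floor, then transform again.
pvals_floored <- ifelse(pvals == 0, floor_p, pvals)
neg_log10_floored <- -log10(pvals_floored)
print(neg_log10_floored)
```

With a floor this small, the replaced genes sit far above everything else on the -log10 scale, which would explain a volcano plot that looks heavily zoomed out.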

Just looking at your plot, there nevertheless looks to be something wrong with your p-value distribution, even when not considering the bunch of genes at the top-left. I'll let Michael answer on the DESeq2 part, though.

modified 8 months ago • written 8 months ago by Kevin Blighe

Hi Kevin, thank you for your response. Yes, I noticed this behavior. The problem is that the volcano plot is zoomed out so far that it doesn't even look like a volcano anymore (see attached jpeg). Is there a way to set the coordinate limits without dropping data observations (similar to coord_cartesian() in ggplot2)?

written 8 months ago by ricardo3889

The package is built on the ggplot2 engine and returns a ggplot2 object, so, you could likely just do, for example:

EnhancedVolcano(...) + coord_cartesian(ylim = c(-log10(10E-50), -log10(10E-60)))

Behaviour may be unexpected though.

written 8 months ago by Kevin Blighe

Thanks Kevin. I tried

EnhancedVolcano() + coord_cartesian(ylim=c(0,20))

And it works at setting the axis limits, but the data points outside of the visible plot are hidden. Is there a way to represent those values falling outside the visible plot with perhaps arrows, similar to the MA plots from DESeq2?
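One way this could be approximated in plain ggplot2 (this is not an EnhancedVolcano feature) is to clamp out-of-range values to the top of the axis and draw them with a different shape, loosely like the triangles in DESeq2's MA plots. A rough sketch, with made-up data and hypothetical column names `clipped` and `y_plot`:

```r
# Hypothetical results; in practice this would come from the DESeq2 results table.
res <- data.frame(
  log2FoldChange = c(-6.1, -5.8, 2.3, 0.4, -1.2),
  pvalue = c(0, 1e-82, 1e-12, 0.2, 1e-5)
)

ymax <- 20  # upper limit of the visible y-axis (assumed)

res$neg_log10_p <- -log10(res$pvalue)       # Inf where pvalue is exactly 0
res$clipped <- res$neg_log10_p > ymax       # TRUE for points above the visible range
res$y_plot <- pmin(res$neg_log10_p, ymax)   # clamp those points to the axis limit

# Plot only if ggplot2 is installed; clamped points are drawn as triangles.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(res, aes(x = log2FoldChange, y = y_plot, shape = clipped)) +
    geom_point() +
    scale_shape_manual(values = c(`FALSE` = 16, `TRUE` = 17)) +
    coord_cartesian(ylim = c(0, ymax)) +
    labs(y = "-log10(p-value)")
  print(p)
}
```

Clamped points lose their true height, so the legend or caption should say that triangles mark values beyond the axis limit.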

written 8 months ago by ricardo3889

That functionality is not available, unfortunately. Regarding the fundamental issue: I don't know what a p-value of 0 actually means. You could likely identify these transcripts prior to running EnhancedVolcano and set them to, for example, 10^-1 times the lowest non-zero p-value (i.e. one tenth of it). Then, the plot visualisation may improve.
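A minimal sketch of that replacement in base R, assuming a results data frame with a `pvalue` column; the helper name `replace_zero_p` and the example data are hypothetical.

```r
# Hypothetical results table standing in for as.data.frame(results(dds)).
res <- data.frame(
  gene = c("geneA", "geneB", "geneC", "geneD"),
  log2FoldChange = c(-6.1, -5.8, 2.3, 0.4),
  pvalue = c(0, 0, 1e-82, 0.2)
)

# Replace exact zeros with `factor` times the smallest non-zero p-value.
# NA p-values are left untouched.
replace_zero_p <- function(p, factor = 0.1) {
  min_nonzero <- min(p[p > 0], na.rm = TRUE)
  p[!is.na(p) & p == 0] <- min_nonzero * factor
  p
}

# Keep the original column and store the plotting values separately.
res$pvalue_plot <- replace_zero_p(res$pvalue)
print(res)
```

Keeping the adjusted values in a separate column makes it clear that they are only for display and not the reported statistics.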

modified 8 months ago • written 8 months ago by Kevin Blighe
Answer: How to generate a volcano plot when p-values from DESeq2 are zero?
Michael Love (United States) wrote 8 months ago:

I’ve posted here before about small p-values. I’m not really concerned about very small p-values for trivially DE genes. A p-value is not a useful statistic for much once it’s very small. I think the useful outputs are FDR or FSR sets and posterior estimates of log fold change.

written 8 months ago by Michael Love

Is it possible to get these outputs you are referring to (FDR, FSR, posterior estimates of logFC) using DESeq2? I don't think the vignette explains this, unless I am missing something. I thought that FDR and the padj were used interchangeably.

written 8 months ago by ricardo3889

An FDR set is the set of genes with an adjusted p-value less than a certain nominal bound.

The posterior estimates of LFC are provided by lfcShrink in DESeq2. We have a section of the vignette on these. You can now get FSR sets from lfcShrink by specifying svalue=TRUE. See the references for more details on FSR and s-values.
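As a rough sketch of how these pieces might fit together, the example below uses DESeq2's simulated example dataset rather than real data. The thresholds (0.1 for padj, 0.005 for the s-value) and the choice of `type = "apeglm"` are assumptions for illustration only, and the code runs only if DESeq2 and apeglm are installed.

```r
if (requireNamespace("DESeq2", quietly = TRUE) &&
    requireNamespace("apeglm", quietly = TRUE)) {
  library(DESeq2)

  # Simulated dataset with a two-level factor `condition` (levels A and B).
  set.seed(1)
  dds <- makeExampleDESeqDataSet(n = 1000, m = 6, betaSD = 1)
  dds <- DESeq(dds)

  # FDR set: genes with adjusted p-value below a nominal bound (0.1 assumed here).
  res <- results(dds)
  fdr_set <- rownames(res)[which(res$padj < 0.1)]

  # Posterior (shrunken) LFC estimates, reporting s-values instead of p-values.
  res_shrunk <- lfcShrink(dds, coef = "condition_B_vs_A",
                          type = "apeglm", svalue = TRUE)

  # FSR set: genes with s-value below a bound (0.005 assumed here).
  fsr_set <- rownames(res_shrunk)[which(res_shrunk$svalue < 0.005)]

  print(length(fdr_set))
  print(length(fsr_set))
}
```

The coefficient name passed to `coef` should match one of the names returned by `resultsNames(dds)` for the dataset at hand.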

written 8 months ago by Michael Love