Question

How to generate a volcano plot when p-values from DESeq2 are zero?

ricardo3889 • 0

@ricardo3889-19646

Last seen 4.8 years ago

University of Pennsylvania

When using DESeq2, I noticed that some of my top genes have a pvalue or padj of zero. I suppose the p-value from the Wald test is really small and got rounded at some point when I ran DESeq2, although it is a bit surprising that other packages, including limma/voom and edgeR, assigned more reasonable p-values (e.g. E-15, E-20, etc.) to the same genes using the same dataset. In contrast, DESeq2 gives only zeros for those genes. The second-lowest p-value I get (after the zeros) is E-82, for genes with a logFC similar to that of the genes with a p-value of zero.

Given this situation, how can I use these values to generate a decent volcano plot to represent the data? When my top genes have a value of zero, the scale of the volcano plot is just awful because the top genes are assigned -log10P values that are through the roof. Also, I would appreciate it if someone could help me understand why DESeq2 doesn't provide a more reasonable p-value (I mean, E-82 is a bit extreme...). I am attaching some values from my data. Thanks!

Volcanoplot.jpeg sample.tsv

deseq2 pvalue visualization enhancedvolcano rnaseq • 12k views

ADD COMMENT • link updated 5.2 years ago by Michael Love 41k • written 5.2 years ago by ricardo3889 • 0

I'll just add a comment on the volcano part because you have tagged my package, EnhancedVolcano. EnhancedVolcano will automatically convert p-values of 0 to the lowest possible machine value, which I imagine differs based on whatever OS you are using. It issues a warning message to console when doing this. I introduced this behaviour for the purposes of being able to plot these genes, which otherwise would not be plotted (negative log10 of 0 is infinity).
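As a rough illustration of why an exact zero cannot be plotted, and of what substituting a machine-level floor does to the y-axis, here is a base R sketch. It is not EnhancedVolcano's internal code: the vector `p` is made-up data, and using `.Machine$double.xmin` as the floor is an assumption about what "lowest possible machine value" means.

```r
# Made-up p-values; the first one stands for a value stored as exactly zero.
p <- c(0, 1e-82, 1e-20, 0.03)

# -log10(0) is Inf, so a zero p-value has no finite y-coordinate.
neg_log10_raw <- -log10(p)

# Assumed floor: the smallest positive normalized double on this machine.
floor_p <- .Machine$double.xmin

# Replace exact zeros with the floor, leaving all other values untouched.
p_floored <- ifelse(p == 0, floor_p, p)
neg_log10_floored <- -log10(p_floored)

print(neg_log10_raw)
print(neg_log10_floored)
```

On platforms with IEEE 754 doubles the floor is about 2.2e-308, so the floored genes end up near 308 on the -log10 scale, far above a gene at 82. That gap is what stretches the y-axis.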

Just looking at your plot, there nevertheless looks to be something wrong with your p-value distribution, even when not considering the bunch of genes at the top-left. I'll let Michael answer on the DESeq2 part, though.

ADD REPLY • link 5.2 years ago Kevin Blighe ★ 3.9k

Hi Kevin, thank you for your response. Yes, I noticed this behavior; the problem is that the volcano plot is zoomed out so far that it doesn't even look like a volcano anymore (see attached jpeg). Is there a way to set the coordinate limits without dropping data observations (similar to coord_cartesian() in ggplot2)?

ADD REPLY • link 5.2 years ago ricardo3889 • 0

The package is built on the ggplot2 engine and returns a ggplot2 object, so, you could likely just do, for example:

EnhancedVolcano(...) + coord_cartesian(ylim = c(-log10(10E-50), -log10(10E-60)))

Behaviour may be unexpected though.

ADD REPLY • link 5.2 years ago Kevin Blighe ★ 3.9k

Thanks Kevin. I tried

EnhancedVolcano() + coord_cartesian(ylim=c(0,20))

And it works at setting the axis limits, but the data points outside of the visible plot are hidden. Is there a way to represent those values falling outside the visible plot with perhaps arrows, similar to the MA plots from DESeq2?

ADD REPLY • link 5.2 years ago ricardo3889 • 0

That functionality is not available, unfortunately. Regarding the fundamental issue: I don't know what a p-value of 0 actually means. You could likely identify these transcripts prior to running EnhancedVolcano and set them to, for example, 1E-1 * the lowest non-zero p-value (i.e. one tenth of it). Then, the plot visualisation may improve.
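The substitution suggested here could be sketched in base R roughly as follows. The data frame `res` and its values are made up for illustration; with real DESeq2 output the same steps would apply to the `pvalue` (or `padj`) column before plotting. The sketch uses one tenth of the smallest non-zero p-value as the replacement, which is an arbitrary choice, and the columns `pvalue_plot` and `was_zero` are hypothetical names.

```r
# Hypothetical results table; in practice this would come from DESeq2.
res <- data.frame(
  gene = c("g1", "g2", "g3", "g4"),
  log2FoldChange = c(-6.1, -5.8, 2.3, 0.4),
  pvalue = c(0, 0, 1e-82, 0.2)
)

# Smallest p-value that is strictly greater than zero.
min_nonzero <- min(res$pvalue[res$pvalue > 0], na.rm = TRUE)

# Replace exact zeros with one tenth of that value; NAs are left as NA.
is_zero <- !is.na(res$pvalue) & res$pvalue == 0
res$pvalue_plot <- res$pvalue
res$pvalue_plot[is_zero] <- min_nonzero / 10

# Keep a flag so the replaced genes can be marked on the plot.
res$was_zero <- is_zero

print(res)
```

The `pvalue_plot` column can then be handed to the plotting function in place of `pvalue`. Keeping the `was_zero` flag makes it possible to draw those genes with a different shape or colour, so readers can see that their height on the plot is a placeholder and not an estimate.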

ADD REPLY • link 5.2 years ago Kevin Blighe ★ 3.9k

score 0 · Answer 1 · 2019-02-07

Michael Love 41k

@mikelove

Last seen 7 hours ago

United States

I’ve posted here before about small p-values. I’m not really concerned about very small p-values for trivially DE genes. A p-value is not a useful statistic for much once it’s very small. I think the useful outputs are FDR or FSR sets and posterior estimates of log fold change.

ADD COMMENT • link 5.2 years ago Michael Love 41k

Is it possible to get these outputs you are referring to (FDR, FSR, posterior estimates of logFC) using DESeq2? I don't think the vignette explains this, unless I am missing something. I thought that FDR and the padj were used interchangeably.

ADD REPLY • link 5.2 years ago ricardo3889 • 0

An FDR set is the set of genes with an adjusted p-value less than a certain nominal bound.
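To make that definition concrete, here is a small base R sketch that builds an FDR set from raw p-values. The p-values, gene names and the 0.1 bound are made up, and Benjamini-Hochberg adjustment via `p.adjust` is used as an assumption about how the adjusted p-values were obtained.

```r
# Made-up raw p-values, including one that is exactly zero.
p <- c(geneA = 0, geneB = 1e-82, geneC = 0.004, geneD = 0.03, geneE = 0.6)

# Benjamini-Hochberg adjusted p-values (names are preserved).
padj <- p.adjust(p, method = "BH")

# Nominal FDR bound (illustrative choice).
alpha <- 0.1

# The FDR set: genes whose adjusted p-value is below the bound.
fdr_set <- names(padj)[!is.na(padj) & padj < alpha]

print(padj)
print(fdr_set)
```

Membership in the set depends only on whether the adjusted p-value falls below the bound, so a gene with a p-value of zero and one with 1e-82 are treated the same way.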

The posterior estimates of LFC are provided by lfcShrink in DESeq2. We have a section of the vignette on these. You can now get FSR sets from lfcShrink by specifying svalue=TRUE. See the references for more details on FSR and s-values.

ADD REPLY • link 5.2 years ago Michael Love 41k