How to generate a volcano plot when p-values from DESeq2 are zero?
@ricardo3889-19646
University of Pennsylvania

When using DESeq2, I noticed that some of my top genes have a pvalue or padj of exactly zero. I suppose the Wald-test p-value is extremely small and gets rounded to zero at some point when I run DESeq2. This is a bit surprising, because other packages, including limma/voom and edgeR, assign more reasonable p-values (on the order of 1E-15 or 1E-20) to the same genes from the same dataset, whereas DESeq2 gives only zeros for them. After the zeros, the next-lowest p-value I get is on the order of 1E-82, for a gene with a logFC similar to that of the zero-p-value genes.

Given this, how can I use these values to generate a decent volcano plot? When my top genes have a p-value of zero, the scale of the volcano plot is awful, because those genes are assigned -log10(P) values that are through the roof. I would also appreciate help understanding why DESeq2 doesn't provide a more reasonable p-value (1E-82 already seems a bit extreme). I am attaching some values from my data. Thanks!
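To illustrate the rounding suspected above: a two-sided Wald p-value is a normal tail probability, 2 * pnorm(-|z|). The smallest positive double R can store is about 4.9E-324, so a large enough statistic gives a p-value of exactly 0. The sketch below assumes DESeq2 computes its Wald p-values this way (I have not checked its internals); the helper names are hypothetical. It also shows that asking pnorm() for the log probability (log.p = TRUE) keeps -log10(p) finite.

```r
# Two-sided p-value for a Wald statistic z, assuming a standard-normal null.
wald_p <- function(z) 2 * pnorm(-abs(z))

# The same quantity on the -log10 scale, computed from the log-probability so
# it does not underflow: log(p) = log(2) + log(Phi(-|z|)).
wald_neglog10p <- function(z) {
  -(log(2) + pnorm(-abs(z), log.p = TRUE)) / log(10)
}

z <- c(5, 19, 40)          # toy statistics, not from the poster's data
print(wald_p(z))           # the last entry underflows to exactly 0
print(wald_neglog10p(z))   # all three are finite
```

The p-value column only has room for the rounded number, which is why genes with very large statistics all collapse onto the same value of 0.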

Attachments: Volcanoplot.jpeg, sample.tsv

Tags: deseq2, pvalue, visualization, enhancedvolcano, rnaseq

I'll just add a comment on the volcano part because you have tagged my package, EnhancedVolcano. EnhancedVolcano will automatically convert p-values of 0 to the lowest possible machine value, which I imagine differs based on whatever OS you are using. It issues a warning message to console when doing this. I introduced this behaviour for the purposes of being able to plot these genes, which otherwise would not be plotted (negative log10 of 0 is infinity).
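To see why this stretches the y-axis so much, here is the arithmetic, assuming the replacement is R's `.Machine$double.xmin` (the smallest positive normalised double, about 2.2E-308 on IEEE-754 platforms); the exact value a given EnhancedVolcano version substitutes may differ. A gene with p = 0 then lands near 308 on the -log10 scale, far above the next-best gene in the question at about 82.

```r
# Smallest positive normalised double (about 2.225e-308 on IEEE-754 systems).
tiny <- .Machine$double.xmin

# Toy p-values: two exact zeros plus values like those described in the question.
p <- c(0, 0, 1e-82, 1e-20, 0.01)

# Replace exact zeros so that -log10() is finite.
p_plot <- ifelse(p == 0, tiny, p)

round(-log10(p_plot), 1)   # roughly 307.7 307.7 82.0 20.0 2.0
```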

Just looking at your plot, there nevertheless looks to be something wrong with your p-value distribution, even when not considering the bunch of genes at the top-left. I'll let Michael answer on the DESeq2 part, though.


Hi Kevin, thank you for your response. Yes, I noticed this behaviour; the problem is that the volcano plot is zoomed out so far that it no longer looks like a volcano (see the attached JPEG). Is there a way to set the coordinate limits without dropping data points (similar to coord_cartesian() in ggplot2)?


The package is built on the ggplot2 engine and returns a ggplot2 object, so you could likely just do, for example:

EnhancedVolcano(...) + coord_cartesian(ylim = c(0, -log10(1e-50)))

Behaviour may be unexpected though.


Thanks Kevin. I tried

EnhancedVolcano(...) + coord_cartesian(ylim = c(0, 20))

It works for setting the axis limits, but data points outside the visible region are hidden. Is there a way to represent values falling outside the visible plot, perhaps with arrows, similar to the MA plots from DESeq2?
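One workaround is to cap the -log10(p) values yourself before plotting, in the spirit of DESeq2's plotMA, which, as I understand it, draws out-of-window points as triangles at the edge of the plot. A minimal base-R sketch, using a hypothetical helper `cap_neglog10p()` and toy inputs; `pch = 2` is base graphics' open up-pointing triangle.

```r
# Hypothetical helper: cap -log10(p) at y_max and flag which points were capped,
# so they can be drawn at the top edge with a different symbol instead of hidden.
cap_neglog10p <- function(pvalue, y_max) {
  y <- -log10(pvalue)                 # p == 0 gives Inf
  capped <- !is.na(y) & y > y_max
  y[capped] <- y_max                  # pin out-of-range points to the edge
  data.frame(y = y, capped = capped)
}

# Toy inputs, not real results.
lfc <- c(-6, -5.5, 1.2, -0.3, 4)
pvalue <- c(0, 1e-82, 1e-5, 0.5, 1e-12)

d <- cap_neglog10p(pvalue, y_max = 20)

# Open triangles (pch = 2) for capped genes, filled dots (pch = 20) otherwise.
plot(lfc, d$y, ylim = c(0, 20),
     pch = ifelse(d$capped, 2, 20),
     xlab = "log2 fold change", ylab = "-log10(p-value)")
```

With ggplot2, the same `capped` column can be mapped to the `shape` aesthetic instead.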


That functionality is not available, unfortunately. Regarding the fundamental issue: I don't know what a p-value of 0 actually means. You could likely identify these transcripts prior to running EnhancedVolcano and set their p-values to, for example, 10^-1 (that is, 0.1) times the lowest non-zero p-value. The plot visualisation may then improve.
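A minimal base-R sketch of that suggestion, using a hypothetical helper `replace_zero_p()`. With a factor of 0.1, the former zeros sit one unit above the best non-zero gene on the -log10 scale, so they stay at the top of the plot without stretching the axis out to about 308.

```r
# Hypothetical helper: replace exact-zero p-values with `factor` times the
# smallest non-zero p-value, leaving NA and non-zero values untouched.
replace_zero_p <- function(p, factor = 0.1) {
  ok <- !is.na(p)
  nonzero_min <- min(p[ok & p > 0])
  p[ok & p == 0] <- nonzero_min * factor
  p
}

p <- c(0, 1e-82, 0.01, NA)   # toy values
replace_zero_p(p)            # the zero becomes 1e-83; NA is kept
```

With a DESeq2 results table `res`, something like `res$pvalue <- replace_zero_p(res$pvalue)` before calling EnhancedVolcano would apply it.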

@mikelove
United States

I've posted here before about small p-values. I'm not really concerned about very small p-values for trivially DE genes: once a p-value is that small, it's not a useful statistic for much. I think the useful outputs are FDR or FSR sets and posterior estimates of log fold change.


Is it possible to get these outputs you are referring to (FDR, FSR, posterior estimates of logFC) using DESeq2? I don't think the vignette explains this, unless I am missing something. I thought that FDR and the padj were used interchangeably.


An FDR set is the set of genes with an adjusted p-value less than a certain nominal bound.
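As a sketch of that definition, assuming Benjamini-Hochberg adjustment (DESeq2's default, though DESeq2 applies it after independent filtering, so its padj values can differ from a plain p.adjust() call): the FDR set at level alpha is just the genes whose adjusted p-value falls below alpha. The helper `fdr_set()` is hypothetical.

```r
# FDR set at nominal level alpha: genes whose BH-adjusted p-value is below alpha.
fdr_set <- function(pvalue, alpha = 0.1) {
  padj <- p.adjust(pvalue, method = "BH")
  names(pvalue)[which(padj < alpha)]   # which() drops NA adjusted p-values
}

p <- c(g1 = 0.001, g2 = 0.01, g3 = 0.03, g4 = 0.5)  # toy values
fdr_set(p, alpha = 0.05)   # "g1" "g2" "g3"
```

With DESeq2 itself, the equivalent is `rownames(res)[which(res$padj < 0.1)]` on a table from `results(dds, alpha = 0.1)`.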

The posterior estimates of LFC are provided by lfcShrink in DESeq2; we have a section of the vignette on these. You can now get FSR sets from lfcShrink by specifying svalue=TRUE. See the references for more details on FSR and s-values.

