Volcano plot with limma: raw p-values or adjusted p-values?
tcalvo
Brazil

I have noticed that limma's volcanoplot() function uses the uncorrected p-values from the MArrayLM object. My question is: why?

I've seen an old post where G. Smyth mentioned that FDR-adjusted p-values lose some information compared with the raw ones. Could someone elucidate this, please? Another reason the author gave was that the same adjusted p-value may correspond to several different raw p-values.

Thanks!

Thyago

Gordon Smyth
WEHI, Melbourne, Australia

I'm not sure what I can tell you that I didn't already say in my earlier answer to a similar question, "Volcano plot labeling troubles".

You've already stated in your question the reason why it is preferable to use the p-value as the y-axis rather than the FDR. (Actually I like the B-statistic even better, but that's another story.) The p-values are the basic values from which the FDR is computed, and it is typically better to plot basic data rather than derived quantities.

Why does that not convince you?  Why would you want to force points with different p-values together on the y-axis? Or are you asking for more explanation of why different p-values can lead to the same FDR? I think that has been answered separately.
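To see concretely how different p-values can end up with the same FDR, here is a minimal Python sketch of the Benjamini-Hochberg adjustment. It is written to follow the same step-up logic as R's p.adjust(method = "BH"); `bh_adjust` is a hypothetical helper for illustration, not a limma function, and the p-values are made up.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (hypothetical helper; plain BH, no weights)."""
    m = len(pvalues)
    # Indices of the p-values in ascending order of p.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # starting at 1 caps every adjusted value at 1
    # Walk from the largest p-value down, keeping a running minimum of m * p / rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.03, 0.001, 0.5, 0.035]
adj = bh_adjust(pvals)
for p, q in zip(pvals, adj):
    print(f"p = {p:<6} BH-adjusted = {q:.4f}")
```

The raw p-values 0.03 and 0.035 both receive the adjusted value 4 × 0.035 / 3 ≈ 0.0467: the running minimum carries the smaller ratio of the larger p-value down to the gene ranked just above it, so on an FDR axis these two genes would be forced onto the same height.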

Note that there is always a p-value cutoff that corresponds to any FDR cutoff, so you can easily indicate an FDR cutoff on the plot even if the y-axis is p-value. So using FDR as the y-axis has no advantage that I can think of.
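A minimal Python sketch of that correspondence, assuming plain BH: because BH-adjusted values never decrease as the raw p-value increases, the p-value cutoff matching an FDR cutoff alpha is the largest raw p-value whose adjusted value is at most alpha, and the horizontal line on the volcano plot goes at -log10 of that cutoff. The helper names and p-values are hypothetical; with limma one would normally take the raw and adjusted values from the P.Value and adj.P.Val columns of topTable() rather than recompute them.

```python
import math


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (hypothetical helper; plain BH)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # caps adjusted values at 1
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def pvalue_cutoff_for_fdr(pvalues, alpha):
    """Largest raw p-value whose BH-adjusted value is <= alpha (None if nothing passes)."""
    adjusted = bh_adjust(pvalues)
    passing = [p for p, q in zip(pvalues, adjusted) if q <= alpha]
    return max(passing) if passing else None


pvals = [0.0001, 0.0004, 0.002, 0.03, 0.2, 0.6]
cutoff = pvalue_cutoff_for_fdr(pvals, alpha=0.05)
# Height of the horizontal FDR line on a volcano plot whose y-axis is -log10(p).
y_line = -math.log10(cutoff) if cutoff is not None else None
print(cutoff, y_line)
```

Here the cutoff is p = 0.03, so a line at about 1.52 on the -log10(p) axis marks FDR = 0.05: every point above it has an adjusted p-value of at most 0.05 and every point below it does not.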


Thanks for your answer. I'm not questioning your decision to do it the way you did, though. I only asked because I wanted to know exactly why, since a lot of people ask me about this. Anyway, thank you again.

Regards,

Wolfgang Huber
EMBL European Molecular Biology Laborat…

There's another reason to support Gordon's view. There is a fundamental difference between p-values and FDR: p-values are per-hypothesis (i.e., per-gene) properties, whereas FDR is an average across all rejected hypotheses. I.e., if you have a set of hypotheses (genes) rejected at a certain FDR $\alpha$, then the local fdr for some of these is less than $\alpha$, and for some, more than $\alpha$. The only thing you know is that the FDR overall is $\alpha$.
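One way to make the "average" statement precise is Efron's two-groups model, which is an assumption brought in here and not part of the post: a fraction $\pi_0$ of genes is null with uniform p-values, the rest have p-value density $f_1$, and $f$, $F$ are the resulting mixture density and distribution function. Then the tail-area FDR at threshold $t$ is exactly the average local fdr over the rejected region:

$$
\begin{aligned}
f(p) &= \pi_0 + (1 - \pi_0)\, f_1(p), \qquad F(t) = \int_0^t f(p)\,dp, \\
\operatorname{fdr}(p) &= \frac{\pi_0}{f(p)}, \qquad
\operatorname{Fdr}(t) = \frac{\pi_0\, t}{F(t)} = \frac{1}{F(t)} \int_0^t \operatorname{fdr}(p)\, f(p)\, dp = \mathbb{E}\big[\operatorname{fdr}(P) \mid P \le t\big].
\end{aligned}
$$

When $f$ is decreasing (most of the signal at small p-values), $\operatorname{fdr}(p)$ increases with $p$, so the genes just inside the cutoff have $\operatorname{fdr}(p) > \operatorname{Fdr}(t)$ while the most significant genes have $\operatorname{fdr}(p) < \operatorname{Fdr}(t)$.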

In general, there is no 1:1 relation between p-value and FDR. In the special case of the Benjamini-Hochberg method, such a 1:1 relation can be constructed (what's called the 'adjusted p-value'), but this assumes that the Benjamini-Hochberg method is used, with no modifications such as filtering, weighting, etc.
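For reference, the usual construction of the BH adjusted p-value (what R's p.adjust(method = "BH") returns, and what limma's adj.P.Val column holds with its default BH adjustment) sorts the $m$ raw p-values as $p_{(1)} \le \dots \le p_{(m)}$ and sets

$$
\tilde p_{(i)} = \min\left(1,\; \min_{j \ge i} \frac{m\, p_{(j)}}{j}\right), \qquad i = 1, \dots, m.
$$

Rejecting every hypothesis with $\tilde p_i \le \alpha$ gives the same rejections as the BH step-up procedure at level $\alpha$. The inner minimum over $j \ge i$ is what lets distinct p-values share one adjusted value, and the formula depends on plain BH over all $m$ tests, which is why it stops being a 1:1 map once filtering or weighting changes the procedure.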

This assumption has seemed so natural that often it has not even been questioned (hence the popularity of the 'adjusted p-value' terminology), but in fact it is not natural if there is heterogeneity between the tests, e.g., if we know that some tests have more power than others, or some have a higher prior probability of being null than others.
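As an illustration of how heterogeneity breaks the 1:1 relation, here is a minimal Python sketch of one such modification, p-value-weighted BH (Genovese, Roeder and Wasserman): BH is applied to $p_i / w_i$ with non-negative weights that average to 1. This particular weighting scheme is chosen here for illustration and is not taken from the post; the helpers, p-values and weights are all hypothetical.

```python
import math


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (hypothetical helper; plain BH)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # caps adjusted values at 1
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def weighted_bh_adjust(pvalues, weights):
    """BH applied to p_i / w_i; weights must be non-negative and average to 1 (hypothetical helper)."""
    m = len(pvalues)
    if any(w < 0 for w in weights) or not math.isclose(sum(weights) / m, 1.0):
        raise ValueError("weights must be non-negative and average to 1")
    # A zero weight means that test can never be rejected.
    scaled = [p / w if w > 0 else math.inf for p, w in zip(pvalues, weights)]
    return bh_adjust(scaled)


pvals = [0.01, 0.01, 0.02, 0.6]
# Made-up weights: gene 0 is treated as better powered (or less likely null) than gene 1.
weights = [1.6, 0.4, 1.0, 1.0]
adj = weighted_bh_adjust(pvals, weights)
print(adj)
```

Genes 0 and 1 share the raw p-value 0.01 but end up with adjusted values 0.025 and about 0.033, while gene 2, with a larger raw p-value of 0.02, ties with gene 1. Once weights enter, no single function maps raw p-values to adjusted ones, whereas the raw p-value remains a well-defined per-gene quantity to plot.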

For these reasons, the p-value and not the adjusted p-value is the preferable quantity to use in a volcano plot.