p-values smaller than machine epsilon in edgeR
oscarf • 0
@oscarf-18146

Hi,

Our group has been using the functions kegga and goana of edgeR to perform overrepresentation analysis of GO ontologies and KEGG pathways. We notice that some ontologies and pathways get a p-value much smaller than the machine epsilon. There are two points that we find difficult to understand.

  1.  How are these functions generating the p-values, especially those that are smaller than the epsilon of the machine?

  2.  How meaningful is a comparison of p-values that are negligibly small, say 10^(-250) and 10^(-251)?

We would deeply appreciate any help in understanding these points.

O.F.


Here’s a link to a thread with the same question about DESeq2:

p-values smaller than machine epsilon in DESeq2

@gordon-smyth
WEHI, Melbourne, Australia

What does "machine epsilon" mean?

I think you may have misunderstood what machine precision means in floating-point computer arithmetic. The machine epsilon value determines the relative accuracy of computed numbers, not their absolute size or absolute error. A machine epsilon of about 10^(-16) means that computed numbers are accurate to about 16 significant figures. It does not mean that we can't compute any quantity less than 10^(-16). If you aren't sure what I mean by "significant figures", here's a tutorial on scientific notation and significant figures.
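The distinction between relative precision and smallest representable size can be checked directly in R. This is a minimal sketch using base R's `.Machine` constants; the variable names `eps`, `xmin` and `tiny` are only for illustration.

```r
# Relative precision versus smallest representable magnitude in double precision
eps  <- .Machine$double.eps    # about 2.2e-16: relative spacing of numbers near 1
xmin <- .Machine$double.xmin   # about 2.2e-308: smallest positive normalized double

tiny <- 1e-250                 # far smaller than eps, far larger than xmin
is.positive  <- tiny > 0                     # TRUE: the value is stored, not flushed to 0
below.eps    <- tiny < eps                   # TRUE
lost.near.1  <- (1 + eps / 4) == 1           # TRUE: eps limits precision relative to 1
kept.near.0  <- tiny * (1 + 1e-10) > tiny    # TRUE: relative changes to tiny values are resolved

print(c(is.positive, below.eps, lost.near.1, kept.near.0))
```

The last two comparisons show the point: a perturbation that is invisible next to 1 is perfectly visible next to a very small number, because precision is relative.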

R has no trouble computing very small p-values. For example, suppose we had a one-sided Z-statistic equal to 200. The log10(P-value) can be computed like this:

> Z <- 200
> loge.pvalue <- pnorm(Z, lower.tail=FALSE, log.p=TRUE)
> log10.pvalue <- loge.pvalue / log(10)
> log10.pvalue
[1] -8688.59

So the p-value is about 10^(-8689). In fact the log10 value itself is computed to about 16 significant figures, although only 6 figures are shown in the printout above!
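For a value this extreme, the log scale is essential: 10^(-8689) is below the smallest positive double, so the same call on the natural scale underflows to zero. The following sketch shows that, and one possible way to turn the log10 value into a scientific-notation string for reporting. The names `p.natural`, `exponent`, `mantissa` and `p.string` are hypothetical helpers, not part of any package.

```r
Z <- 200

# On the natural scale the result underflows to exactly zero
p.natural <- pnorm(Z, lower.tail = FALSE)
print(p.natural == 0)

# On the log scale the value is computed without underflow
log10.p <- pnorm(Z, lower.tail = FALSE, log.p = TRUE) / log(10)

# Split log10(p) into an integer exponent and a mantissa in [1, 10)
exponent <- floor(log10.p)
mantissa <- 10^(log10.p - exponent)

# Assemble a printable p-value in scientific notation
p.string <- sprintf("%.2fe%d", mantissa, as.integer(exponent))
print(p.string)
```

Splitting the log10 value this way keeps the leading digits and the exponent separately, so neither needs to be representable as a single double.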

How meaningful is it to compare very small p-values?

Very small p-values like 10^(-250) and 10^(-251) have the same inferential meaning, i.e., there is no practical difference between them from an interpretation point of view. We nevertheless consider it important to keep them distinct when reporting edgeR results so that the GO or KEGG terms can be ordered properly by size of enrichment. Some high-impact journals, such as Nature, require exact p-values to be reported in published articles rather than rounded to < 10^(-16) or similar.
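The ordering argument can be illustrated with a small sketch. The term names and log10 p-values below are made up for illustration and do not come from any real analysis.

```r
# Hypothetical log10 p-values for three gene set terms (illustrative only)
log10.p <- c(termA = -251, termB = -250, termC = -17)
p <- 10^log10.p

# The exact values remain distinct, so the terms can be ranked
distinct <- p[["termA"]] < p[["termB"]]
ranking  <- names(sort(p))

# Truncating everything below 1e-16 turns the three values into ties
capped   <- pmax(p, 1e-16)
n.unique <- length(unique(unname(capped)))

print(distinct)
print(ranking)
print(n.unique)
```

With the exact values the ranking is termA, termB, termC. After truncation all three terms share one value and the ranking information is lost.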

How believable are very small p-values?

kegga() and goana() use one-sided Fisher's exact tests (also known as hypergeometric tests). This is the most popular way to conduct GO analyses in the literature, but it assumes that all genes are statistically independent of one another, which they are not. Violating this assumption can make the p-values smaller than they should be.
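To make the equivalence between the one-sided Fisher's exact test and the hypergeometric tail concrete, here is a minimal base R sketch. It is not the kegga() or goana() source code, and the counts (`n.universe`, `n.de`, `set.size`, `overlap`) are hypothetical numbers chosen only for illustration.

```r
# Hypothetical counts (illustrative only)
n.universe <- 20000   # genes in the universe
n.de       <- 1000    # differentially expressed genes
set.size   <- 200     # genes annotated to the gene set
overlap    <- 60      # DE genes that are also in the set

# Upper tail of the hypergeometric distribution: P(X >= overlap)
p.hyper <- phyper(overlap - 1, m = set.size, n = n.universe - set.size,
                  k = n.de, lower.tail = FALSE)

# The same test written as a one-sided Fisher's exact test on the 2x2 table
tab <- matrix(c(overlap,        set.size - overlap,
                n.de - overlap, n.universe - set.size - n.de + overlap),
              nrow = 2)
p.fisher <- fisher.test(tab, alternative = "greater")$p.value

# Log-scale version, useful when the tail probability is extremely small
log10.p <- phyper(overlap - 1, m = set.size, n = n.universe - set.size,
                  k = n.de, lower.tail = FALSE, log.p = TRUE) / log(10)

print(all.equal(p.hyper, p.fisher))
print(log10.p)
```

Both calls treat each gene as an independent draw, which is exactly the assumption discussed above.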

If you want more careful p-values that allow for inter-gene correlation, then use ROAST, FRY or CAMERA gene set tests. Just type help("roast") or help("roast.DGEList").
