Question

p-values smaller than machine epsilon in edgeR



Hi,

Our group has been using the edgeR functions kegga and goana to perform overrepresentation analysis of GO terms and KEGG pathways. We noticed that some GO terms and pathways get p-values much smaller than machine epsilon. There are two points that we find difficult to understand.

1. How do these functions compute the p-values, especially those that are smaller than machine epsilon?

2. How meaningful is a comparison of p-values that are negligibly small, say 10^(-250) and 10^(-251)?

We would greatly appreciate any help in understanding these questions.

O.F.

written 7.3 years ago by oscarf


Here’s a link to a thread with the same question about DESeq2:

p-values smaller than machine epsilon in DESeq2

Michael Love, 7.3 years ago

score 2 · Answer 1 · 2018-11-01

What does "machine epsilon" mean?

I think you may have misunderstood what machine precision means in floating-point computer arithmetic. The machine epsilon value determines the relative accuracy of computed numbers, not their absolute size or absolute error. A machine epsilon of about 10^(-16) (about 2.2 × 10^(-16) for the double-precision numbers R uses) means that computed numbers are accurate to about 16 significant figures. It does not mean that we can't compute any quantity less than 10^(-16). If you aren't sure what I mean by "significant figures", here's a tutorial on scientific notation and significant figures.
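A minimal base-R sketch of the difference between relative and absolute accuracy, using R's built-in `.Machine` constants. The values 1e-250 and 1e-251 are simply the examples from the question; nothing here is specific to edgeR.

```r
# Machine epsilon: the relative spacing of doubles near 1 (about 2.2e-16)
eps <- .Machine$double.eps
print(eps)

# Adding something much smaller than eps to 1 is lost to rounding ...
print(1 + eps / 4 == 1)

# ... but numbers far smaller than eps are still stored perfectly well,
# each with its own ~16 significant figures
p1 <- 1e-250
p2 <- 1e-251
print(p1 > p2)
print(p1 / p2)

# The smallest positive normalised double is about 2.2e-308
print(.Machine$double.xmin)
```

So epsilon limits how finely two nearby numbers can be distinguished relative to their size, not how small a number can be.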

R has no trouble computing very small p-values. For example, suppose we had a one-sided Z-statistic equal to 200. The log10(P-value) can be computed like this:

> Z <- 200
> loge.pvalue <- pnorm(Z, lower.tail=FALSE, log.p=TRUE)
> log10.pvalue <- loge.pvalue / log(10)
> log10.pvalue
[1] -8688.59

So the p-value is about 10^(-8689). In fact, the log10 value itself is computed to about 16 significant figures, although only 6 figures are shown in the printout above!
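To see why the log scale matters here, the sketch below (base R only) evaluates the same tail probability both directly and on the log scale. It then prints the log10 value to 15 significant figures and splits it into a mantissa and an exponent. The names `p.direct`, `mantissa` and `exponent` are just illustrative.

```r
Z <- 200

# On the natural scale the upper tail probability underflows to exactly 0,
# because 10^(-8689) is far below the smallest representable double
p.direct <- pnorm(Z, lower.tail = FALSE)
print(p.direct)

# On the log scale there is no underflow
log10.pvalue <- pnorm(Z, lower.tail = FALSE, log.p = TRUE) / log(10)

# Show more significant figures than the default 7
print(log10.pvalue, digits = 15)

# Split into mantissa and exponent so that p = mantissa * 10^exponent
exponent <- floor(log10.pvalue)
mantissa <- 10^(log10.pvalue - exponent)
cat(sprintf("p-value is about %.3f x 10^(%d)\n", mantissa, as.integer(exponent)))
```

The mantissa is recovered from the fractional part of the log10 value, so no step ever has to store the tiny number itself.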

How meaningful is it to compare very small p-values?

Very small p-values like 10^(-250) and 10^(-251) have the same inferential meaning, i.e., there is no practical difference in how they are interpreted. Nevertheless, we consider it important to keep them distinct when reporting edgeR results, so that the GO or KEGG terms can be ordered properly by size of enrichment. Indeed, some high-impact journals, such as Nature, require exact p-values to be reported in published articles rather than rounded to "< 10^(-16)" or some similar threshold.
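A small sketch of the ranking point, using hypothetical term names and p-values (not real edgeR output). Exact p-values keep the terms in a definite order. Flooring everything at 10^(-16), which is what reporting "< 10^(-16)" amounts to, turns them into ties.

```r
# Hypothetical p-values for three hypothetical GO terms
pvals <- c(termA = 1e-250, termB = 1e-251, termC = 1e-20)

# Ranking by exact p-values keeps the terms distinct
ranked <- names(sort(pvals))
print(ranked)

# Flooring every p-value at 1e-16 makes all three terms tie
floored <- pmax(pvals, 1e-16)
print(floored)
print(length(unique(floored)))
```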

How believable are very small p-values?

kegga() and goana() use one-sided Fisher's exact tests (aka hypergeometric tests). This is the most popular way to conduct GO analyses in the literature, but it assumes that all genes are statistically independent of one another, which in reality they are not. Violations of this assumption can make the p-values smaller than they should be.
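The one-sided hypergeometric test can be sketched in base R as shown below. The gene counts are hypothetical, and this is the textbook form of the test rather than the internal code of kegga() or goana(). The sketch checks that phyper() agrees with a one-sided fisher.test() on the equivalent 2x2 table. It also shows the log.p option, which keeps very strong enrichment from underflowing.

```r
# Hypothetical numbers: 20000 genes in the universe, 1000 of them DE,
# 200 genes annotated to the term, 25 of which are DE
N    <- 20000  # universe size
n.de <- 1000   # number of DE genes
m    <- 200    # genes annotated to the term
k    <- 25     # DE genes in the term (expected under independence: m * n.de / N = 10)

# One-sided hypergeometric test: P(X >= k)
p.hyper <- phyper(k - 1, m, N - m, n.de, lower.tail = FALSE)

# The same test as a one-sided Fisher's exact test on the 2x2 table
tab <- matrix(c(k, n.de - k, m - k, N - m - n.de + k), nrow = 2,
              dimnames = list(c("in.term", "not.in.term"), c("DE", "not.DE")))
p.fisher <- fisher.test(tab, alternative = "greater")$p.value
print(c(hypergeometric = p.hyper, fisher = p.fisher))

# For much stronger enrichment, work on the log scale as with pnorm()
k.strong <- 150
log10.p.strong <- phyper(k.strong - 1, m, N - m, n.de,
                         lower.tail = FALSE, log.p = TRUE) / log(10)
print(log10.p.strong)
```

Every gene is treated as an independent draw in this calculation. That independence is exactly the assumption that inter-gene correlation violates.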

If you want more careful p-values that allow for inter-gene correlation, then use ROAST, FRY or CAMERA gene set tests. Just type help("roast") or help("roast.DGEList").
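The sketch below shows the correlation-aware alternatives in use. It assumes the Bioconductor package limma is installed; limma provides camera(), fry() and roast(), and edgeR adds methods such as roast.DGEList for count data. The data are simulated and the gene-set names are hypothetical. For real RNA-seq counts you would normally start from a DGEList, as described in help("roast.DGEList").

```r
library(limma)

set.seed(2018)
# Hypothetical log-expression matrix: 1000 genes x 6 samples, two groups of 3
y <- matrix(rnorm(1000 * 6), nrow = 1000,
            dimnames = list(paste0("gene", 1:1000), NULL))
group <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ group)

# Shift the first 50 genes up in group B so that one set is truly enriched
y[1:50, group == "B"] <- y[1:50, group == "B"] + 1

# Two hypothetical gene sets, given as row indices
index <- list(set1 = 1:50, set2 = 51:100)

# camera() allows for a preset inter-gene correlation (0.01 here)
res.camera <- camera(y, index, design, contrast = 2, inter.gene.cor = 0.01)
print(res.camera)

# fry() is a fast rotation-based test closely related to roast()
res.fry <- fry(y, index, design, contrast = 2)
print(res.fry)
```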