5 months ago by

Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia

**What does "machine epsilon" mean?**

I think you may have misunderstood what machine precision means in floating-point computer arithmetic. The machine epsilon value determines the relative accuracy of computed numbers, not the absolute size or absolute error. A machine epsilon of 10^(-16) means that computed numbers have 16 significant figures. It does not mean that we can't compute any quantity less than 10^(-16). If you aren't sure what I mean by "significant figures", here's a tutorial on scientific notation and significant figures.

R has no trouble computing very small p-values. For example, suppose we had a one-sided Z-statistic equal to 200. The log10(P-value) can be computed like this:

> Z <- 200
> loge.pvalue <- pnorm(Z, lower.tail=FALSE, log=TRUE)
> log10.pvalue <- loge.pvalue / log(10)
> log10.pvalue
[1] -8688.59

So the p-value is about 10^(-8689). In fact the log10 value itself is computed to 16 significant figures although only 6 figures are show in the print out above!

**How meaningful it is to compare very small p-values?**

Very small p-values like 10^(-250) and 10^(-251) have the same inferential meaning, i.e., have no practical difference from an interpretation point of view, but we consider it important to keep them distinct in reporting edgeR results so that the GO or KEGG terms can be ordered properly by size of enrichment. Some high impact journals like *Nature* indeed have the requirement that you report exact p-values in published articles rather than rounding them to < 10^(-16) or whatever.

**How believable are very small p-values?**

kegga() and goana() use one-sided Fisher's exact tests (aka hypergeometric tests). This is the most popular way to conduct GO analyses in the literature, but it assumes all the genes to be statistically independent of one another, which actually they are not. This assumption can cause p-values to be smaller than they should be.

If you want more careful p-values that allow for inter-gene correlation, then use ROAST, FRY or CAMERA gene set tests. Just type help("roast") or help("roast.DGEList").

Here’s a link to a thread with same question about DESeq2

p-values smaller than machine epsilon in DESeq2

23k