Question: p-values smaller than machine epsilon in edgeR
7 months ago, oscarf wrote:

Hi,

our group has been using the functions kegga and goana of edgeR to perform overrepresentation analysis of GO ontologies and KEGG pathways. We notice that some ontologies and pathways get a p-value much smaller than the machine epsilon. There are two points that we find difficult to understand.

  1.  How are these functions generating the p-values, especially those that are smaller than the epsilon of the machine?

  2.  How meaningful is a comparison of p-values that are negligibly small, say 10^(-250) and 10^(-251)?

We would deeply appreciate any help in understanding these points.

O.F.


Here’s a link to a thread with the same question about DESeq2:

p-values smaller than machine epsilon in DESeq2

(comment written 7 months ago by Michael Love)
Answer: p-values smaller than machine epsilon in edgeR
7 months ago, Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia) wrote:

What does "machine epsilon" mean?

I think you may have misunderstood what machine precision means in floating-point computer arithmetic. The machine epsilon value determines the relative accuracy of computed numbers, not the absolute size or absolute error. A machine epsilon of 10^(-16) means that computed numbers have 16 significant figures. It does not mean that we can't compute any quantity less than 10^(-16). If you aren't sure what I mean by "significant figures", here's a tutorial on scientific notation and significant figures.
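To make the distinction concrete, here is a minimal base-R sketch (an illustration only, not anything edgeR does internally). It uses the built-in `.Machine` constants; the specific test values such as `1e-250` are arbitrary choices for the example.

```r
eps  <- .Machine$double.eps    # relative precision, about 2.2e-16
xmin <- .Machine$double.xmin   # smallest normalized positive double, about 2.2e-308

# Relative precision: an increment far below eps is lost when added to 1.
lost.increment <- (1 + eps / 4) == 1

# Absolute size: numbers far below eps are still nonzero and distinct.
tiny.is.positive  <- 1e-250 > 0
tiny.are.distinct <- 1e-250 > 1e-251

print(c(eps = eps, xmin = xmin))
print(c(lost.increment, tiny.is.positive, tiny.are.distinct))
```

All three logical checks evaluate to TRUE: machine epsilon limits how finely numbers near 1 can be resolved, while the smallest representable positive number is governed by `xmin`, which is hundreds of orders of magnitude smaller.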

R has no trouble computing very small p-values. For example, suppose we had a one-sided Z-statistic equal to 200. The log10(P-value) can be computed like this:

> Z <- 200
> loge.pvalue <- pnorm(Z, lower.tail=FALSE, log.p=TRUE)
> log10.pvalue <- loge.pvalue / log(10)
> log10.pvalue
[1] -8688.59

So the p-value is about 10^(-8689). In fact the log10 value itself is computed to 16 significant figures, although only 6 figures are shown in the printout above!
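A value this small lies below the smallest positive double, which is why the calculation above works on the log scale. The following sketch (base R only; the variable names and the mantissa/exponent formatting are my own choices for illustration) shows the natural-scale result underflowing to zero and one way to turn the log10 value into a readable string.

```r
Z <- 200

# On the natural scale the upper-tail probability underflows to exactly 0.
p.natural <- pnorm(Z, lower.tail = FALSE)

# On the log scale the same quantity is an ordinary finite number.
log10.p <- pnorm(Z, lower.tail = FALSE, log.p = TRUE) / log(10)

# Split log10(p) into an integer exponent and a mantissa in [1, 10).
exponent <- floor(log10.p)
mantissa <- 10^(log10.p - exponent)

# Scientific-notation string assembled by hand, since the value itself
# cannot be stored as a double.
p.string <- sprintf("%.2fe%d", mantissa, exponent)
print(p.natural)
print(p.string)
```

The same idea applies to any distribution function in R that accepts `log.p = TRUE`.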

How meaningful is it to compare very small p-values?

Very small p-values like 10^(-250) and 10^(-251) have the same inferential meaning, i.e., there is no practical difference between them from an interpretation point of view, but we consider it important to keep them distinct when reporting edgeR results so that the GO or KEGG terms can be ordered properly by size of enrichment. Some high-impact journals, such as Nature, in fact require that you report exact p-values in published articles rather than rounding them to < 10^(-16) or whatever.
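The ordering point can be illustrated with a toy example in base R. The term names and p-values below are made up for the illustration, and flooring at machine epsilon stands in for any "< 10^(-16)" style of rounding.

```r
# Hypothetical p-values for three terms; names and values are placeholders.
p <- c(termA = 1e-251, termB = 1e-20, termC = 1e-250)

# Exact values give an unambiguous ranking, smallest first.
ranked.terms <- names(p)[order(p)]

# Flooring at machine epsilon collapses all three values into one tie,
# so the ranking information is lost.
p.floored <- pmax(p, .Machine$double.eps)
n.distinct.after.flooring <- length(unique(p.floored))

print(ranked.terms)
print(n.distinct.after.flooring)
```

With the exact values the order is termA, termC, termB; after flooring, only one distinct value remains.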

How believable are very small p-values?

kegga() and goana() use one-sided Fisher's exact tests (aka hypergeometric tests). This is the most popular way to conduct GO analyses in the literature, but it assumes that all genes are statistically independent of one another, which they are not. This assumption can cause p-values to be smaller than they should be.
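For readers who want to see how such a test produces tiny p-values, here is a rough base-R sketch of a one-sided hypergeometric test. It is not the kegga() or goana() source code, and all counts are hypothetical numbers chosen only to give a strong enrichment.

```r
# Hypothetical counts, for illustration only.
n.universe <- 20000   # genes tested
n.de       <- 1000    # differentially expressed (DE) genes
n.set      <- 200     # genes annotated to the term
n.overlap  <- 150     # DE genes annotated to the term

# Upper-tail probability P(X >= n.overlap) under the hypergeometric null.
p.hyper <- phyper(n.overlap - 1, m = n.set, n = n.universe - n.set,
                  k = n.de, lower.tail = FALSE)

# Log-scale version, which stays finite even if p.hyper were to underflow.
log10.p <- phyper(n.overlap - 1, m = n.set, n = n.universe - n.set,
                  k = n.de, lower.tail = FALSE, log.p = TRUE) / log(10)

# The same test via fisher.test() on the 2x2 table
# (rows: DE / not DE; columns: in set / not in set).
tab <- matrix(c(n.overlap,        n.set - n.overlap,
                n.de - n.overlap, n.universe - n.set - n.de + n.overlap),
              nrow = 2)
p.fisher <- fisher.test(tab, alternative = "greater")$p.value

print(c(phyper = p.hyper, fisher = p.fisher, log10 = log10.p))
```

With about 10 overlapping genes expected by chance and 150 observed, the p-value is far below machine epsilon yet still a perfectly ordinary nonzero double, and the two routes agree.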

If you want more careful p-values that allow for inter-gene correlation, then use ROAST, FRY or CAMERA gene set tests. Just type help("roast") or help("roast.DGEList").
