Question

p-values smaller than machine epsilon in DESeq2


oscarf • 0

@oscarf-18146

Last seen 5.5 years ago

Hi,

Our group has been using the function DESeq from the DESeq2 package to perform differential expression analysis. We notice that some features get a p-value much smaller than machine epsilon. There are two points that we find difficult to understand.

1. How does DESeq generate the p-values, especially those that are smaller than machine epsilon?

2. How meaningful is a comparison of p-values that are negligibly small, say 10^(-250) and 10^(-251)?

We would deeply appreciate any help in understanding these points.

O.F.

deseq2 pvalue • 848 views

updated 5.5 years ago by Michael Love 41k • written 5.5 years ago by oscarf • 0


Here's a link to a similar thread about edgeR:

p-values smaller than machine epsilon in edgeR

5.5 years ago • Michael Love 41k

score 0 · Answer 1 · 2018-11-01


Michael Love 41k

@mikelove

Last seen 19 hours ago

United States

I wouldn’t draw much of a distinction between very small p-values. Essentially, the data are not consistent with the null model; e.g., many samples with 0 counts vs. many samples with very high counts will give a very small p-value. Just consider these genes a rejectable set at an FDR that you specify. Our work (DESeq2 and now apeglm) focuses a lot on effect size estimation and on LFC thresholds greater than 0. As we argue, rejecting a point null is sometimes a trivial hurdle for many genes in well-powered gene expression studies.

As for the machine epsilon point, I don’t really see what you’re getting at. We can calculate the tail probabilities of a distribution directly.
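
To illustrate the point about tail probabilities, here is a minimal Python sketch using only the standard library. It is not DESeq2's code; it assumes a Wald-type statistic compared against a standard normal, and the value `z = 34.0` and the function names are made up for illustration. Machine epsilon limits how close to 1 a double can get, not how close to 0, so a tail computed directly (as R's `pnorm(..., lower.tail = FALSE)` does) can be far smaller than epsilon, whereas computing 1 minus the CDF collapses to zero.

```python
import math
import sys


def two_sided_normal_p(z):
    """Two-sided standard normal tail probability, computed from the tail itself."""
    # erfc evaluates the complementary error function directly,
    # so no subtraction from 1 is involved.
    return math.erfc(abs(z) / math.sqrt(2.0))


def naive_two_sided_p(z):
    """Same quantity via 1 - CDF; breaks down once the CDF rounds to exactly 1."""
    cdf = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    return 2.0 * (1.0 - cdf)


eps = sys.float_info.epsilon          # spacing of doubles just above 1.0
smallest_normal = sys.float_info.min  # smallest positive normalized double

z = 34.0  # hypothetical, very large test statistic
p_direct = two_sided_normal_p(z)
p_naive = naive_two_sided_p(z)

print("machine epsilon:       ", eps)
print("smallest normal double:", smallest_normal)
print("direct tail p-value:   ", p_direct)
print("1 - CDF p-value:       ", p_naive)
```

The direct value is positive and many orders of magnitude below epsilon, while the naive one is exactly zero. That the number is representable says nothing about whether differences at that scale are worth interpreting.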

5.5 years ago • Michael Love 41k


Or, to answer the second question very directly: Not at all.

A very low p value means that either a model assumption is wrong or the null hypothesis is false with near certainty. There is little point in distinguishing between different degrees of being almost certain.

This is why I usually recommend using p values only as a cut point: once you have decided on your significance threshold, cut your result list there, then forget about the p values and sort by (shrunken!) fold change to find the most interesting hits.
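
The cut-then-sort workflow described above can be sketched in a few lines of Python. The gene names, adjusted p-values and shrunken log2 fold changes below are invented for illustration, and `rank_hits` is a hypothetical helper, not part of DESeq2; in practice the shrunken estimates would come from something like DESeq2's `lfcShrink`.

```python
# Hypothetical results: (gene, adjusted p-value, shrunken log2 fold change).
results = [
    ("geneA", 1e-250, 0.4),
    ("geneB", 1e-12, -3.1),
    ("geneC", 0.03, 1.8),
    ("geneD", 0.20, 5.0),
    ("geneE", 1e-251, 0.6),
]

alpha = 0.05  # significance threshold chosen in advance


def rank_hits(rows, alpha):
    """Keep rows passing the threshold, then order by absolute shrunken LFC."""
    significant = [row for row in rows if row[1] < alpha]
    # From here on the p-values play no further role in the ordering.
    return sorted(significant, key=lambda row: abs(row[2]), reverse=True)


hits = rank_hits(results, alpha)
for gene, padj, lfc in hits:
    print(gene, padj, lfc)
```

In this toy table the two genes with the smallest p-values end up at the bottom of the ranked list because their effect sizes are small, and the gene with the largest fold change is dropped because it does not pass the threshold.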

5.5 years ago • Simon Anders ★ 3.7k