Question

limma p-value and adjusted p-value meaning


arfranco ▴ 130

@arfranco-8341

Last seen 10 months ago

European Union

I've performed a limma differential expression analysis of microarray data, and ended up with many genes that have a p-value lower than 0.05 but whose adjusted p-value is always higher than 0.05.

I must admit that my statistical knowledge is quite limited, and I am hoping that somebody can explain what this means.

limma • 18k views

8.6 years ago arfranco ▴ 130


However, there is something confusing to me.

I treated bacteria in media with and without scavenging of the iron content, and it turns out that many genes, whose adjusted p-values are less than 0.05, are iron transporters that are induced differentially in the media lacking iron. There are other genes for which it makes sense to be DE in the absence of iron.

8.6 years ago arfranco ▴ 130


"However, there is something confusing to me"

Well ... this is where life gets interesting now, right? :-)

8.6 years ago Steve Lianoglou ★ 13k

score 5 · Accepted Answer · 2015-09-04

Short answer: You have no differentially expressed (DE) genes at a false discovery rate (FDR) threshold of 5%.

Long answer: Let's say you have a large number of genes, none of which are DE. We would expect 5% of them to have p-values below 0.05 simply due to chance (recall that the p-value is not a fixed quantity; under the assumption that the null hypothesis is true, i.e., there is no DE for that gene, it varies randomly and uniformly between 0 and 1). If we were to define significantly DE genes by selecting those with p-values below 0.05, we would end up with a non-empty "DE list" full of non-DE genes. This would make us look rather silly.
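To make the "5% by chance" point concrete, here is a small simulation sketch in plain Python (not limma; the gene count and seed are arbitrary choices of mine). It assumes every gene is truly non-DE, so each p-value is just a uniform random draw between 0 and 1.

```python
import random

random.seed(1)  # arbitrary fixed seed so the run is repeatable

n_genes = 20000  # hypothetical number of genes, none of them truly DE

# Under the null hypothesis each p-value is a uniform draw on [0, 1].
null_pvalues = [random.random() for _ in range(n_genes)]

# Count how many of these non-DE genes look "significant" at p < 0.05.
n_below = sum(p < 0.05 for p in null_pvalues)
fraction_below = n_below / n_genes

print(f"{n_below} of {n_genes} non-DE genes have p < 0.05 "
      f"({fraction_below:.1%})")
```

The printed fraction should land close to 5%, which is exactly the "DE list full of non-DE genes" problem.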

To avoid this, we need to correct for the number of tests that we're performing, i.e., the number of genes, given that we're testing for DE in each gene. The most widely used correction for genomic studies is the Benjamini-Hochberg (BH) correction, which aims to control the FDR across the significant genes. Applying the BH method yields the adjusted p-values that you see after running topTable (assuming you haven't changed the adjust.method argument). A set of significantly DE genes can be defined by selecting those genes with adjusted p-values below a desired threshold. For example, if we set a threshold of 0.05, the resulting DE set would be such that no more than 5% of the genes in that set are expected to be non-DE, i.e., the FDR is controlled at 5%.
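For anyone curious what the BH adjustment does mechanically, below is a minimal sketch in plain Python. The function name bh_adjust and the four p-values are made up for illustration; in R you would simply call p.adjust(p, method = "BH") or let topTable do it for you. Each p-value is multiplied by (number of tests / its rank), and a running minimum from the largest p-value downwards keeps the adjusted values in order.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted p-values are capped at 1
    # Walk from the largest p-value down to the smallest, scaling each
    # by m / rank and keeping a running minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * (m / rank))
        adjusted[i] = running_min
    return adjusted


raw = [0.005, 0.04, 0.03, 0.5]  # made-up p-values for four genes
adj = bh_adjust(raw)
for p, q in zip(raw, adj):
    print(f"raw p = {p:.3f}   BH-adjusted p = {q:.4f}")
```

In this toy example the genes with raw p-values of 0.03 and 0.04 both end up with an adjusted p-value of about 0.053, so they pass a raw 0.05 cutoff but not an FDR cutoff of 5%.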

The multiplicity correction never decreases a p-value and usually increases it, as it needs to account for the greater chance of false positives as the number of tests grows. Thus, even if your p-values are below 0.05, your BH-adjusted p-values may not be. Indeed, in the above example with non-DE genes, many of those will have p-values below 0.05, but none should have adjusted p-values below 0.05 (with some caveats that I won't go into). You should be using the adjusted values if you're doing genome-wide analyses; if you're not getting any genes with adjusted p-values below 0.05, this means you don't have any DE genes at an FDR threshold of 5%.
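The all-null scenario can be checked with a short simulation, again as a plain Python sketch rather than limma output. The helper bh_adjust is my own illustrative implementation of the BH adjustment, and the gene count and seed are arbitrary. It compares how many non-DE genes pass a raw cutoff with how many pass after adjustment, and confirms that no adjusted value is smaller than its raw p-value.

```python
import random


def bh_adjust(pvalues):
    """Illustrative Benjamini-Hochberg adjustment (input order kept)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):  # largest p-value first
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * (m / rank))
        adjusted[i] = running_min
    return adjusted


random.seed(1)  # arbitrary seed
raw = [random.random() for _ in range(20000)]  # all genes non-DE
adj = bh_adjust(raw)

raw_hits = sum(p < 0.05 for p in raw)
adj_hits = sum(q < 0.05 for q in adj)
never_smaller = all(q >= p for p, q in zip(raw, adj))

print(f"raw p < 0.05:      {raw_hits} genes")
print(f"adjusted p < 0.05: {adj_hits} genes")
print(f"adjusted >= raw for every gene: {never_smaller}")
```

The first count should be around a thousand, while the second is expected to be zero in most runs; an unlucky draw can still let a gene or two through, which is one of the caveats mentioned above.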