Question

Effect of lfcThreshold on p-value in DESeq2

3

Entering edit mode

Russ Fraser ▴ 30

@russ-fraser-13646

Last seen 5.7 years ago

Ontario Veterinary College, University …

I'm having a bit of trouble understanding how lfcThreshold parameter of the results function of DESeq2 affects the p-value. Imposing a stricter lfcThreshold reduces the number of significant results, which makes sense, however, it does so above and beyond what I would have expected.

From my data:

lfc <- 0
thrHuh <- results(dds_huh7, contrast = c("group", "infected_3", "uninfected_3"), alpha = alpha, lfcThreshold = lfc)
summary(thrHuh)

out of 72860 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)     : 1545, 2.1%
LFC < 0 (down)   : 1278, 1.8%
outliers [1]     : 39, 0.054%
low counts [2]   : 42376, 58%
(mean count < 6)

sig <- subset(thrHuh, thrHuh$padj < 0.05)
nrow(subset(sig, sig$log2FoldChange >= 2 | sig$log2FoldChange <= -2))
[1] 495

My interpretation of this is that I have 495 (significant) genes that have a greater than 4-fold change. My (obviously incorrect) understanding is that if I set lfcThreshold <- 2 I would get the same results, but I don't:

lfc <- 2
thrHuh <- results(dds_huh7, contrast = c("group", "infected_3", "uninfected_3"), alpha = alpha, lfcThreshold = lfc)
summary(thrHuh)

out of 72860 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)     : 21, 0.029%
LFC < 0 (down)   : 3, 0.0041%
outliers [1]     : 39, 0.054%
low counts [2]   : 8476, 12%
(mean count < 0)

I've looked at the DESEq2 paper, the vignette, and the workflow, and cannot figure why one method gives 495 and the other 24. Is it simply because there are fewer tests being run in the latter case, and this is reflected in the adjusted p value? If I only want to consider genes that are a) significant and b) have at least a 4-fold change or more, which is the better one to use?

Thanks,

Russ

edits: formatting mistakes...

deseq2 rnaseq statistics • 10k views

ADD COMMENT • link updated 6 months ago by camillab. ▴ 10 • written 6.5 years ago by Russ Fraser ▴ 30

score 16 · Accepted Answer · 2017-10-11

16

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 1 hour ago

United States

The lfcThreshold isn't doing what you think it is doing. You could hypothetically do what you have done to get the 495 genes, which is to do a post hoc filtering on the fold changes. The problem with that is you have now destroyed any meaning for your p-values, which were testing for evidence that the fold change is different from zero.

In other words, a p-value is a test where you compare the range of Wald statistics you would expect to get if there were no differences between your groups. There will still be variation due to sampling, so you will by happenstance get some large statistics, and the p-value tells you the long run probability of seeing your observed result under the null. In the first case, the null distribution was one where the means are identical. Since your p-value was based on the null of no differences, adding in an extra criterion invalidates the original meaning of the p-value (and any multiplicity adjustment you might then make).

If you specify the lfcThreshold, you incorporate the fold change criterion into the Wald statistic, and are now testing your observed result against the values you would expect under the null distribution where the difference between the two groups is no larger than the lfc you have specified. This is a much more conservative threshold, and you should expect far fewer genes.

An alternative way to think about the difference is to note that a post hoc fold change criterion tells you that the fold change between your samples is greater than a certain value, but using lfcThreshold tests the probability that the underlying population fold change is greater than that value, which is entirely different (and what you really want to be testing).

Anyway, you should in general use something smaller like 1, or 0.585 (representing a 2-fold or 1.5-fold difference), rather than something massive like an lfcThreshold of 2.

ADD COMMENT • link 6.5 years ago James W. MacDonald 65k

0

Entering edit mode

Much clearer. Thanks for the detailed explanation! I will lower the lfcThreshold - was just using 2 as it provided an extreme illustration of the effect.

ADD REPLY • link 6.5 years ago Russ Fraser ▴ 30

4

Entering edit mode

Thanks James.

And to connect to the DESeq2 paper, this is the relevant paragraph, under the section 'Hypothesis tests with thresholds on effect size':

A common procedure is to disregard genes whose estimated LFC β ir is below some threshold, |β ir |≤θ. However, this approach loses the benefit of an easily interpretable FDR, as the reported P value and adjusted P value still correspond to the test of zero LFC. It is therefore desirable to include the threshold in the statistical testing procedure directly, i.e., not to filter post hoc on a reported fold-change estimate, but rather to evaluate statistically directly whether there is sufficient evidence that the LFC is above the chosen threshold.

ADD REPLY • link 6.5 years ago Michael Love 41k

0

Entering edit mode

great so the best approach would be using the lfcThreshold parameter to make sure that both p.value and log2fc are taken into account in the right way?

ADD REPLY • link 6 months ago camillab. ▴ 10

0

Entering edit mode

Very nice explanation, I have just one question. If I have understood you, to apply somekind of LFC filter it must be done by the addition of lfcThreshold option. In this way we conserve the real meaning of the padj result of our FDR test, but if we use:

sig_Wild_LA <- Wild_LA[which(Wild_LA$log2FoldChange > 1.99999), ] (example)

My subset of DGE now has no sense padj values.

Is this correct?

I have this doubt because I found a lot of tutorial including this filtering step and not even one using the lfcThreshold option.

ADD REPLY • link 6.4 years ago pablo_garcia05 ▴ 20

2

Entering edit mode

Yes, you've understood correctly. There is a lot of bad advice out there, and one of the most common bits of bad advice is filtering based on fold change separately from significance testing. If you filter fold changes in this way, the FDR estimation has no knowledge of your fold change filtering criterion, so there's no way that the reported FDR could possibly have taken your filtering criterion into account. Depending on the data and the specific way in which you implement this flawed double filtering strategy, you could end up with FDR values that either greatly overestimate or underestimate the true significance.

ADD REPLY • link 6.4 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

Thank you Ryan. I'm not surprised, I was working with RTqPCR and data analysis is a blackhole even for a technique with more than 15 years-old...

RNAseq papers usually has a M&M quite schematic and I have seen a lot of people using software in a wrong way... without any problem to cross "referee-barrier".

Thank you again!

ADD REPLY • link 6.4 years ago pablo_garcia05 ▴ 20