Question: Effect of lfcThreshold on p-value in DESeq2
0
gravatar for Russ Fraser
20 months ago by
Ontario Veterinary College, University of Guelph, Guelph, Ontario, Canada
Russ Fraser0 wrote:

I'm having a bit of trouble understanding how lfcThreshold parameter of the results function of DESeq2 affects the p-value. Imposing a stricter lfcThreshold reduces the number of significant results, which makes sense, however, it does so above and beyond what I would have expected.

From my data:

lfc <- 0
thrHuh <- results(dds_huh7, contrast = c("group", "infected_3", "uninfected_3"), alpha = alpha, lfcThreshold = lfc)
summary(thrHuh)

out of 72860 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)     : 1545, 2.1%
LFC < 0 (down)   : 1278, 1.8%
outliers [1]     : 39, 0.054%
low counts [2]   : 42376, 58%
(mean count < 6)

sig <- subset(thrHuh, thrHuh$padj < 0.05)
nrow(subset(sig, sig$log2FoldChange >= 2 | sig$log2FoldChange <= -2))
[1] 495

My interpretation of this is that I have 495 (significant) genes that have a greater than 4-fold change. My (obviously incorrect) understanding is that if I set lfcThreshold <- 2 I would get the same results, but I don't:

lfc <- 2
thrHuh <- results(dds_huh7, contrast = c("group", "infected_3", "uninfected_3"), alpha = alpha, lfcThreshold = lfc)
summary(thrHuh)

out of 72860 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)     : 21, 0.029%
LFC < 0 (down)   : 3, 0.0041%
outliers [1]     : 39, 0.054%
low counts [2]   : 8476, 12%
(mean count < 0)

I've looked at the DESEq2 paper, the vignette, and the workflow, and cannot figure why one method gives 495 and the other 24. Is it simply because there are fewer tests being run in the latter case, and this is reflected in the adjusted p value? If I only want to consider genes that are a) significant and b) have at least a 4-fold change or more, which is the better one to use?

Thanks,

Russ

edits: formatting mistakes...

rnaseq deseq2 statistics • 1.4k views
ADD COMMENTlink modified 20 months ago by James W. MacDonald50k • written 20 months ago by Russ Fraser0
Answer: Effect of lfcThreshold on p-value in DESeq2
7
gravatar for James W. MacDonald
20 months ago by
United States
James W. MacDonald50k wrote:

The lfcThreshold isn't doing what you think it is doing. You could hypothetically do what you have done to get the 495 genes, which is to do a post hoc filtering on the fold changes. The problem with that is you have now destroyed any meaning for your p-values, which were testing for evidence that the fold change is different from zero.

In other words, a p-value is a test where you compare the range of Wald statistics you would expect to get if there were no differences between your groups. There will still be variation due to sampling, so you will by happenstance get some large statistics, and the p-value tells you the long run probability of seeing your observed result under the null. In the first case, the null distribution was one where the means are identical. Since your p-value was based on the null of no differences, adding in an extra criterion invalidates the original meaning of the p-value (and any multiplicity adjustment you might then make).

If you specify the lfcThreshold, you incorporate the fold change criterion into the Wald statistic, and are now testing your observed result against the values you would expect under the null distribution where the difference between the two groups is no larger than the lfc you have specified. This is a much more conservative threshold, and you should expect far fewer genes.

An alternative way to think about the difference is to note that a post hoc fold change criterion tells you that the fold change between your samples is greater than a certain value, but using lfcThreshold tests the probability that the underlying population fold change is greater than that value, which is entirely different (and what you really want to be testing).

Anyway, you should in general use something smaller like 1, or 0.585 (representing a 2-fold or 1.5-fold difference), rather than something massive like an lfcThreshold of 2.

ADD COMMENTlink written 20 months ago by James W. MacDonald50k

Much clearer. Thanks for the detailed explanation! I will lower the lfcThreshold - was just using 2 as it provided an extreme illustration of the effect.
 

ADD REPLYlink written 20 months ago by Russ Fraser0
1

Thanks James.

And to connect to the DESeq2 paper, this is the relevant paragraph, under the section 'Hypothesis tests with thresholds on effect size':

A common procedure is to disregard genes whose estimated LFC β ir is below some threshold, |β ir |≤θ. However, this approach loses the benefit of an easily interpretable FDR, as the reported P value and adjusted P value still correspond to the test of zero LFC. It is therefore desirable to include the threshold in the statistical testing procedure directly, i.e., not to filter post hoc on a reported fold-change estimate, but rather to evaluate statistically directly whether there is sufficient evidence that the LFC is above the chosen threshold.

ADD REPLYlink written 20 months ago by Michael Love24k

Very nice explanation, I have just one question. If I have understood you, to apply somekind of LFC filter it must be done by the addition of lfcThreshold option. In this way we conserve the real meaning of the padj result of our FDR test, but if we use:

sig_Wild_LA <- Wild_LA[which(Wild_LA$log2FoldChange > 1.99999), ] (example)

My subset of DGE now has no sense padj values.

Is this correct?

I have this doubt because I found a lot of tutorial including this filtering step and not even one using the lfcThreshold option.

ADD REPLYlink written 18 months ago by pablo_garcia0520

Yes, you've understood correctly. There is a lot of bad advice out there, and one of the most common bits of bad advice is filtering based on fold change separately from significance testing. If you filter fold changes in this way, the FDR estimation has no knowledge of your fold change filtering criterion, so there's no way that the reported FDR could possibly have taken your filtering criterion into account. Depending on the data and the specific way in which you implement this flawed double filtering strategy, you could end up with FDR values that either greatly overestimate or underestimate the true significance.

ADD REPLYlink written 18 months ago by Ryan C. Thompson7.3k

Thank you Ryan. I'm not surprised, I was working with RTqPCR and data analysis is a blackhole even for a technique with more than 15 years-old...

RNAseq papers usually has a M&M quite schematic and I have seen a lot of people using software in a wrong way... without any problem to cross "referee-barrier".

Thank you again!

ADD REPLYlink written 18 months ago by pablo_garcia0520
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 311 users visited in the last hour