DESeq2 lfcThreshold question: different numbers when filtering in results() vs. filtering afterwards?
1
0
Entering edit mode
psutton • 0
@psutton-20110
Last seen 5.1 years ago

I have a question about lfcThreshold in DESeq2. In the sample code below, I use the airway demo data, have a variable lfcThreshold = log2(1.5).

The variable res1 contains results() without an lfcThreshold (i.e. default 0). Then, res2 is the result of filtering res1 for abs(res1$log2FoldChange) > lfcThreshold. Finally, res3 contains results() with the lfcThreshold explicitly set.

Naively, I expect res2 and res3 to be the same, but they are not (see output below). Can someone help me understand why they are not the same?

library("airway")
data("airway")
se <- airway

library("DESeq2")
dds <- DESeqDataSet(se, design = ~ cell + dex)
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]

dds <- DESeq(dds)
alpha <- 0.1
lfcThreshold <- log2(1.5)
contrast_dex <- c("dex", "trt", "untrt")

res1 <- results(dds, contrast = contrast_dex, alpha = alpha)
summary(res1)

res2 <- res1[ abs(res1$log2FoldChange) > lfcThreshold, ]
summary(res2)
min(abs(res2$log2FoldChange))

res3 <- results(dds, contrast = contrast_dex, alpha = alpha, lfcThreshold = lfcThreshold)
summary(res3)

which gives the summary outputs:

> res1 <- results(dds, contrast = contrast_dex, alpha = alpha)
> summary(res1)

out of 22369 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 2610, 12%
LFC < 0 (down)     : 2224, 9.9%
outliers [1]       : 0, 0%
low counts [2]     : 4337, 19%
(mean count < 5)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

> 
> res2 <- res1[ abs(res1$log2FoldChange) > lfcThreshold, ]
> summary(res2)

out of 5990 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 1121, 19%
LFC < 0 (down)     : 1139, 19%
outliers [1]       : 0, 0%
low counts [2]     : 2394, 40%
(mean count < 5)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

> min(abs(res2$log2FoldChange))
[1] 0.5849997

> 
> res3 <- results(dds, contrast = contrast_dex, alpha = alpha, lfcThreshold = lfcThreshold)
> summary(res3)

out of 22369 with nonzero total read count
adjusted p-value < 0.1
LFC > 0.58 (up)    : 401, 1.8%
LFC < -0.58 (down) : 283, 1.3%
outliers [1]       : 0, 0%
low counts [2]     : 3470, 16%
(mean count < 3)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

deseq2 • 701 views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 11 minutes ago
United States

Take a look at the DESeq2 paper or vignette on what LFC threshold tests are doing.

ADD COMMENT
0
Entering edit mode

Thanks Michael, I think this section from the DESeq2 paper answers my question:

"A common procedure is to disregard genes whose estimated LFC βir is below some threshold, | βir | ≤ θ. However, this approach loses the benefit of an easily interpretable FDR, as the reported P value and adjusted P value still correspond to the test of zero LFC. It is therefore desirable to include the threshold in the statistical testing procedure directly, i.e., not to filter post hoc on a reported fold-change estimate, but rather to evaluate statistically directly whether there is sufficient evidence that the LFC is above the chosen threshold."

ADD REPLY

Login before adding your answer.

Traffic: 891 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6