Question: DESeq2 lfcThreshold question: different numbers when filtering in results() vs. filtering afterwards?
0
gravatar for psutton
6 months ago by
psutton0
psutton0 wrote:

I have a question about lfcThreshold in DESeq2. In the sample code below, I use the airway demo data, have a variable lfcThreshold = log2(1.5).

The variable res1 contains results() without an lfcThreshold (i.e. default 0). Then, res2 is the result of filtering res1 for abs(res1$log2FoldChange) > lfcThreshold. Finally, res3 contains results() with the lfcThreshold explicitly set.

Naively, I expect res2 and res3 to be the same, but they are not (see output below). Can someone help me understand why they are not the same?

library("airway")
data("airway")
se <- airway

library("DESeq2")
dds <- DESeqDataSet(se, design = ~ cell + dex)
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]

dds <- DESeq(dds)
alpha <- 0.1
lfcThreshold <- log2(1.5)
contrast_dex <- c("dex", "trt", "untrt")

res1 <- results(dds, contrast = contrast_dex, alpha = alpha)
summary(res1)

res2 <- res1[ abs(res1$log2FoldChange) > lfcThreshold, ]
summary(res2)
min(abs(res2$log2FoldChange))

res3 <- results(dds, contrast = contrast_dex, alpha = alpha, lfcThreshold = lfcThreshold)
summary(res3)

which gives the summary outputs:

> res1 <- results(dds, contrast = contrast_dex, alpha = alpha)
> summary(res1)

out of 22369 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 2610, 12%
LFC < 0 (down)     : 2224, 9.9%
outliers [1]       : 0, 0%
low counts [2]     : 4337, 19%
(mean count < 5)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

> 
> res2 <- res1[ abs(res1$log2FoldChange) > lfcThreshold, ]
> summary(res2)

out of 5990 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 1121, 19%
LFC < 0 (down)     : 1139, 19%
outliers [1]       : 0, 0%
low counts [2]     : 2394, 40%
(mean count < 5)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

> min(abs(res2$log2FoldChange))
[1] 0.5849997

> 
> res3 <- results(dds, contrast = contrast_dex, alpha = alpha, lfcThreshold = lfcThreshold)
> summary(res3)

out of 22369 with nonzero total read count
adjusted p-value < 0.1
LFC > 0.58 (up)    : 401, 1.8%
LFC < -0.58 (down) : 283, 1.3%
outliers [1]       : 0, 0%
low counts [2]     : 3470, 16%
(mean count < 3)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

deseq2 • 104 views
ADD COMMENTlink modified 6 months ago by Michael Love25k • written 6 months ago by psutton0
Answer: DESeq2 lfcThreshold question: different numbers when filtering in results() vs.
0
gravatar for Michael Love
6 months ago by
Michael Love25k
United States
Michael Love25k wrote:

Take a look at the DESeq2 paper or vignette on what LFC threshold tests are doing.

ADD COMMENTlink written 6 months ago by Michael Love25k

Thanks Michael, I think this section from the DESeq2 paper answers my question:

"A common procedure is to disregard genes whose estimated LFC βir is below some threshold, | βir | ≤ θ. However, this approach loses the benefit of an easily interpretable FDR, as the reported P value and adjusted P value still correspond to the test of zero LFC. It is therefore desirable to include the threshold in the statistical testing procedure directly, i.e., not to filter post hoc on a reported fold-change estimate, but rather to evaluate statistically directly whether there is sufficient evidence that the LFC is above the chosen threshold."

ADD REPLYlink written 6 months ago by psutton0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 301 users visited in the last hour