Question

DESeq2 LRT test behaving strangely?

0

Entering edit mode

casikecola • 0

@casikecola-16424

Last seen 5.4 years ago

I am using DESeq2 with single-cell data. Previously, I used the "Wald" test with default settings but I wanted to check how my previous analysis would look using the recommend single-cell settings from the vignette (basically using the "LRT" test).

I have two variables and to simplify the question I will use the same notation as the vignette. So, my design matrix would have two columns: genotype (1,2,3,4) and condition (D, H).

For this comparison I am only interest in asking what genes are D.E. between the conditions for genotype 1.

First, I ran DESeq2 using the Ward test in 3 ways:

Using an interaction term (~ genotype + condition + genotype:condition)
Combining the genotype and condition columns (~ combined)
Slicing the counts matrix to only contain columns with genotype 1 (~ condition)

The number of genes that I got at a padj of 0.05 were (4,4 and 6) and they overlap (not entirely).

Second, I ran DESeq2 using the LRT test in the same 3 ways described above. I get more genes as expected but the numbers look a bit odd (same padj==0.05):

108
3831
10

They don't overlap so well with the genes I got using the Wald test (specially top genes) and the amount of "highly confident" D.E. genes with the "combined" method seems strangely high.

Could I trust these results with the LRT method? I feel more confident sticking with the Wald test even though I get less genes.

deseq2 • 854 views

ADD COMMENT • link 5.3 years ago casikecola • 0

0

Entering edit mode

Yes, you are right. I figured for 2) and 3) it would be obvious.

The reduced formulas for the LRT runs were:

(~ genotype + condition) and (~ genotype) choosing eventually the second one
(~ 1)
(~ 1)

I am using size factors computed with Scran (default settings) and the following parameters for the DESeq() method:

useT=TRUE, minmu=1e-6, minReplicatesForReplace=Inf

I use results() to obtain the list of genes in the following ways:

results(dds, contrast=c("condition", "D", "H"))
results(dds, contrast=c("combined", "ID", "IH"))
results(dds, contrast=c("condition", "D", "H"))

I generated the combined column like this:

colData(dds)$combined = factor(paste(meta$genotype, meta$condition, sep="_"))

Would it be better to include the whole code?

ADD REPLY • link 5.3 years ago casikecola • 0

score 0 · Answer 1 · 2019-01-18

0

Entering edit mode

Michael Love 41k

@mikelove

Last seen 19 hours ago

United States

Please post your code. The LRT has a reduced model that is very important in the interpretation of results. You can’t just say “I ran an LRT” but you have to describe the two models you compared.

ADD COMMENT • link 5.3 years ago Michael Love 41k

0

Entering edit mode

If you have ~genotype + condition + genotype:condition and a reduced design of ~genotype, this isn't going to be equivalent to any Wald test. You are testing whether there is any effect of condition, genotype-specific or not. Likewise, when you compared ~combined to ~1, it's not equivalent to any Wald test, because you are testing whether there is any effect of condition or genotype. The only roughly comparable one is the last one, whether you just compare a Wald test of condition with ~condition and ~1, within a single genotype.

ADD REPLY • link 5.3 years ago Michael Love 41k

0

Entering edit mode

That makes sense, thanks for the clarification. I was though more concerned about the differences in-between the LRT runs (specially between 2 and 3) but I guess I am testing different things there too. My question is the effect of condition given a genotype and I would be happy to use the LRT test (run 2) but I want to make sure that my assumptions are correct.

ADD REPLY • link 5.3 years ago casikecola • 0