I am using DESeq2 with single-cell data. Previously, I used the "Wald" test with default settings but I wanted to check how my previous analysis would look using the recommend single-cell settings from the vignette (basically using the "LRT" test).
I have two variables and to simplify the question I will use the same notation as the vignette. So, my design matrix would have two columns: genotype (1,2,3,4) and condition (D, H).
For this comparison I am only interest in asking what genes are D.E. between the conditions for genotype 1.
First, I ran DESeq2 using the Ward test in 3 ways:
- Using an interaction term (~ genotype + condition + genotype:condition)
- Combining the genotype and condition columns (~ combined)
- Slicing the counts matrix to only contain columns with genotype 1 (~ condition)
The number of genes that I got at a padj of 0.05 were (4,4 and 6) and they overlap (not entirely).
Second, I ran DESeq2 using the LRT test in the same 3 ways described above. I get more genes as expected but the numbers look a bit odd (same padj==0.05):
- 108
- 3831
- 10
They don't overlap so well with the genes I got using the Wald test (specially top genes) and the amount of "highly confident" D.E. genes with the "combined" method seems strangely high.
Could I trust these results with the LRT method? I feel more confident sticking with the Wald test even though I get less genes.
Yes, you are right. I figured for 2) and 3) it would be obvious.
The reduced formulas for the LRT runs were:
I am using size factors computed with Scran (default settings) and the following parameters for the DESeq() method:
useT=TRUE, minmu=1e-6, minReplicatesForReplace=Inf
I use results() to obtain the list of genes in the following ways:
I generated the combined column like this:
colData(dds)$combined = factor(paste(meta$genotype, meta$condition, sep="_"))
Would it be better to include the whole code?