Hi,
I am trying to understand why I get unexpected signs (+/-) in my log2FC when I do LRT using DESEq2.
If I use Wald test, I can state the reference condition (A.STATIC) using the contrast function like this:
dds <- DESeqDataSetFromMatrix(countData = counts_data, colData = colData, design = ~ group)
res1 <- results(dds, contrast=c("group", "A.LOW", "A.STATIC"), alpha = 0.1)
In this case, the resulting log2FC have the expected signs (expected, because I plotted the raw counts per each condition). Notice that for these contrasts, I am already grouping my factors (factor1: LOW & factor2: A, vs. factor1: STATIC & factor2: A).
However, when I do LRT, since I have multiple levels in the factors, I don't know anymore what comparisons is the test doing (I know it is comparing the full model vs the reduced model, but then, what happens with the log2FC?). In this case I am keeping factor1 and factor2 split, because I want to test their interaction.
dds <- DESeqDataSetFromMatrix(countData = counts_data, colData = colData, design = ~ factor1 + factor2)
dds$factor1 <- factor(dds$factor1, levels = c("STATIC", "LOW", "MOD", "HIGH1", "HIGH2"))
dds$factor2 <- factor(dds$factor2, levels = c("A", "B", "C"))
#Here I think it's just the same as using "relevel"; it will pick the first level as reference in each case, right?
mm1 <- model.matrix(~ dds$factor1 + dds$factor2 + dds$factor1:dds$factor2, colData) #did this to remove 0-columns, to be able to include the interaction.
idx <- which(colSums(mm1 == 0) == nrow(mm1))
mm1 <- mm1[,-idx]
design(dds) <- ~ factor2 + factor2 + factor1:factor2 # it is necessary to apply the design again
keep <- rowSums(counts(dds) >= 10) >= 5
dds <- dds[keep,]
dds
# Generate the reduced matrix:
mm0 <- model.matrix(~ dds$factor1 + dds$factor2, colData) # INTERACTION will be tested
# Test the reduced matrix using LRT:
dds1 <- DESeq(dds, test="LRT", full = mm1, reduced = mm0) # Testing INTERACTION
res1 <- results(dds1)
To check if the LRT was just giving a random comparison log2FC result, I compared the log2FC from the LRT vs the log2FCs of that gene obtained by Wald tests, and none matched.
Questions:
- What conditions are being compared to obtain the log2FC in the LRT dataframe? Is the reference being properly selected using the above code? (ref= factor1:STATIC, factor2:A).
- If the log2FC in the LRT results are not "useful", does it mean that I can only acknowledge UP or DOWN-regulation of a gene using Walt test?
Thank you.
Laia