DESeq design with and without intercept is giving different results
1
0
Entering edit mode
ktz • 0
@04e46025
Last seen 23 minutes ago
United Kingdom

Hello, I'm trying to identify differentially expressed genes between some conditions. I have read that including and not including the intercept in the design should lead to identical results for the equivalent contrast arguments. However, I am getting different results in my analysis and was wondering if anyone know why this is happening? I have looked for other posts similar to my question but have not been able to find much help.

My coldata looks something like this:

enter image description here

And here is my code to test with and without the intercept in the design:

# with intercept (I get about 1266 genes with padj < 0.05)

test1 <- DESeqDataSetFromMatrix(countData = mat,
                                   colData = meta,
                                   design = ~ treatment,
                                   rowData = genes.list) 

test1 <- DESeq(test1)

resultsNames(test1)

a <- as.data.frame(results(test1, contrast = list("treatment_A_vs_B"), tidy = TRUE))

# without intercept (I get about 1149 genes with padj < 0.05)

test2 <- DESeqDataSetFromMatrix(countData = mat,
                                   colData = meta,
                                   design = ~ 0 + treatment,
                                   rowData = genes.list) 

test2 <- DESeq(test2)

resultsNames(test2)

b <- as.data.frame(results(test2, contrast = list("treatmentA", "treatmentB"), tidy = TRUE))

Similarly, I also get different results when I compare results with the designs ~ donor + treatment and ~ 0 + treatment + donor. This is what I ultimately want (differences in treatment while controlling for donor); I have used only treatment above just as an example.

Would appreciate any insight!

DESeq2 • 475 views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 12 hours ago
United States

This can happen due to some instability when one of the groups used for the reference has all zeros. If you plot the Wald stat from the two comparisons, I would expect correlation near 1.

ADD COMMENT
0
Entering edit mode

Hi Michael, thank you for your reply. Like you said the correlation between the two is 0.94. The genes were more or less the same when I did more stringent pre-filtering (removing rows with lots of zeroes), but I do have a few questions:

Is there anything else I could do to retain the number of genes? (maybe pre-filtering by treatment groups).

If i did not pre-filter, would the with or without intercept design be more "accurate"? I am wondering as they are quite different.

Would LRT be a better option?

The vignette says that pre-filtering is not required, but how does independent filtering deal with this if it takes the mean of normalised counts? As in shouldn't I be getting the same results for with and without the intercept in the design even if I don't pre-filter due to independent filtering?

ADD REPLY
0
Entering edit mode

Pre-filtering makes sense to me, for example maybe you want to focus on genes with minimal expression in particular groups being compared, or all groups.

There may be a lot of heterogeneity across all of your groups, and you may have better modeling with just subsetting to the two groups and running DESeq() just on the two groups.

ADD REPLY
0
Entering edit mode

Thank you for your advice! Looking at my PCA, I do have a lot of within group variability so subsetting makes sense.

I was asked to compare each group to an average of other groups (for example group V to the average of D, DL and VL). But I'm not sure about this as subsetting and doing pairwise comparisons makes most sense for this data. Out of curiosity, in cases like these with some within group variability, would this type of comparison (one to an average of others) make sense? If yes, would design = ~ 0 + treatment be acceptable? Or should this avoided?

enter image description here

ADD REPLY
0
Entering edit mode

A few things, first, yes ~0 + treatment is the right design to compare one group to the average of others.

However, I don't recommend doing this type of comparison very often, and you're right on track with why it may be an issue.

Even if there wasn't a lot of heterogeneity across groups in terms of variance, another reason is that the average doesn't necessarily represent anything that exists.

E.g. three people: A is 150 cm tall, B is 160 cm, C is 170 cm. The LFC of B compared to the average of A and C is 0... but is this meaningful?

ADD REPLY
0
Entering edit mode

That all makes sense and is clear now. Thank you very much for your help!

ADD REPLY

Login before adding your answer.

Traffic: 753 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6