Dear Bioconductor/DESeq2 community,
I am trying to analyze RNAseq data with DESseq2 and have a question about contrasts and interactions. I have RNAseq data on untreated and treated cells collected at 4 different time points. I have quantified my transcripts using salmon. Initially, I was performing DESeq2 analysis at each time point separately. For example, here is the sample file for testing for differential expression at 2hr:
sample,condition,replicate
1,mock.2hr,1
2,mock.2hr,2
3,mock.2hr,3
4,infected.2hr,1
5,infected.2hr,2
6,infected.2hr,3
Here is my code snippet
# set the factor levels
dds$condition <- factor(dds$condition, levels = c("mock.2hr","infected.2hr"))
#get rid of low count data
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]
#perform the dfe analysis based on negative binomial model and get the results
dds <- DESeq(dds)
res <- results(dds)
But then after reading the documentation on contrasts I re-ran my analysis with the following sample list:
sample,condition,replicate
1,mock.2hr,1
2,mock.2hr,2
3,mock.2hr,3
4,infected.2hr,1
5,infected.2hr,2
6,infected.2hr,3
7,mock.4hr,1
8,mock.4hr,2
9,mock.4hr,3
10,infected.4hr,1
11,infected.4hr,2
12,infected.4hr,3
13,mock.6hr,1
14,mock.6hr,2
15,mock.6hr,3
16,infected.6hr,1
17,infected.6hr,2
18,infected.6hr,3
19,mock.8hr,1
20,mock.8hr,2
21,mock.8hr,3
22,infected.8hr,1
23,infected.8hr,2
24,infected.8hr,3
Similar code snippet
dds <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ condition )
dds$condition <- factor(dds$condition, levels = c("mock.2hr", "infected.2hr", "mock.4hr", "infected.4hr", "mock.6hr", "infected.6hr", "mock.8hr", "infected.8hr"))
#get rid of low count data
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]
#perform the dfe analysis based on negative binomial model and get the results
dds <- DESeq(dds)
res <- results(dds)
resultsNames(dds)
results(dds, contrast=c("condition","mock.2hr","infected.2hr"))
However, when I compare results from the 1st set of results two the second set of results, I was expecting the same results, but they are slightly different
"transcript","baseMean","log2FoldChange","lfcSE","stat","pvalue","padj"
"tx1",175.15081857585,-0.0714999820256329,0.128181252816721,-0.557803738491045,0.576978409578714,0.807919847826457
"tx1",183.390698,0.13396380,0.1621473,0.82618589,0.4086986,0.6802349
I feel like I'm misunderstanding how contrasts work. Can someone help me clear up my confusion?
Thanks for your reply, Michael. I forgot to check the FAQ. I read If I have multiple groups, should I run all together or split into pairs of groups? Is it because the dispersion parameter is different, then?
Exactly.