Question

Question on DESeq2 on concatenated factors model vs. full interaction model

0

Entering edit mode

Dr Wei Chen ▴ 10

@dr-wei-chen-6318

Last seen 8.5 years ago

United States

I'm running DESeq2 for my expression analysis and have a question on the design formula. I have 2 cell lines, 2 time points, and 4 treatments in 3 batches. I tried the following 2 designs to compare the same 2 treatments and the results are slightly different in terms of the number of significant genes and their ranks. Why is that? Are these two models not the same?

1. dds$group <- factor(paste0(dds$cell, ".", dds$time, ".", dds$treat))
design(dds)<- ~ group + batch
dds<-DESeq(dds)

2. design(dds)<- ~ batch + cell*time*treat
dds<-DESeq(dds)

Thanks for the help!

Best,

Wei

deseq2 • 1.4k views

ADD COMMENT • link updated 8.5 years ago by Simon Anders ★ 3.7k • written 8.5 years ago by Dr Wei Chen ▴ 10

score 0 · Answer 1 · 2015-11-13

0

Entering edit mode

Simon Anders ★ 3.7k

@simon-anders-3855

Last seen 3.7 years ago

Zentrum für Molekularbiologie, Universi…

First: How have you called "results", and how was the comparison described in the first line of the output from "results"? This might already give you a hint what is going on.

Second: No, the two models are not the same.

About the first model: By default, DESeq2 performs a Wald test, checking for significance of the last model coefficient. This would be a comparison between two specific levels of the concatenated factor; something like: differences between cell line 3, treatment 2, time point 3 and cell line 1, treatment 1, time point 1. You can pass extra arguments to "results" to select which levels you want to compare.

In the second model, the comparison is for the double interaction between the three factors, which might be rather hard to interpret.

It might be easier to advice you if you told us more about what comparison you actually want to do.

ADD COMMENT • link 8.5 years ago Simon Anders ★ 3.7k

0

Entering edit mode

Thanks Simon. Take time comparison as an example (I have 2 time points: 3hr and 24hr), this is how I call the results function:

Using the first model:

> results(dds, contrast=c("group","19154.3hr.Typhi","19154.24hr.Typhi"), addMLE=T)
log2 fold change (MAP): group 19154.3hr.Typhi vs 19154.24hr.Typhi
Wald test p-value: group 19154.3hr.Typhi vs 19154.24hr.Typhi

Using the second model:

> results(dds, name="time_3hr_vs_24hr", addMLE=F)
log2 fold change (MLE): time 3hr vs 24hr
Wald test p-value: time 3hr vs 24hr

As I mentioned, the results are slightly different. My dilemma is that: I have to use the first model to make pairwise comparisons, and use the second model to get the 2-way and 3-way interaction terms. Would be nice if I can get all of them using the same model.

Best,

Wei

ADD REPLY • link 8.5 years ago Dr Wei Chen ▴ 10

0

Entering edit mode

The results would be the same (to numerical convergence diffs) if you set betaPrior=FALSE in DESeq(). While the LFC shrinkage has nice properties, it is dependent on how you specify the design. The pvalues are statistically valid with or without beta prior.

ADD REPLY • link 8.5 years ago Michael Love 41k