Question: DESeq2 multifactorialdesign contrasts
gravatar for reubenmcgregor88
6 days ago by
reubenmcgregor880 wrote:


Hi all, 

I have what may seem like as really simple questions, but being new to using R in general and very new to DEseq2 I just can not figure it out reading the manual (which is great and very informative for a biologist, thanks!)

I have a simple design of 3 donors (A, B and C) and 3 conditions (unit,eth and vitd). 

My basic question is how can I do a contrast with a multifactorial design? (IU may be getting these terms mixed up and therein may lie the problem.

If I simply import the "eth" and "vitd" data and follow the manual for MF design I can run through and get great results (with 85 DE genes):

> dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ condition)

> dds <- dds[ rowSums(counts(dds)) > 1, ]

> ddsMF <- dds

> design(ddsMF) <- formula(~ donor + condition)

> ddsMF <- DESeq(ddsMF)


However when I try and do the same thing but also importing the "uns" sample (so now I have "uns", and the two previous "eth" + "vitd") and try and follow the contrast I get 75 DE genes. So same procedure as before but this time 

> res.vitd.eth <- vitdresults(ddsMF, contrast=c("condition","vitd","eth"))

with the ddsMF run as before.

I'm sure I am doing something so basic wrongs here.

Thank you in advance,



ADD COMMENTlink modified 6 days ago by Gavin Kelly250 • written 6 days ago by reubenmcgregor880
gravatar for Gavin Kelly
6 days ago by
Gavin Kelly250
United Kingdom / London / Francis Crick Institute
Gavin Kelly250 wrote:

There are a couple of reasons I can think of that may explain this, but I assume your 75-gene list is fairly similar in content to the 85-gene list (if not, then the most likely explanation is that the data-importing has got sample labels misaligned).

 The most likely cause is due to the way replicate variability is estimated.  When you include an extra experimental group in the dataset, regardless of whether it is involved in your contrast, the residual noise is calculated within each group and then pooled - so if you're 'uns' data is more or less noisy than your existing conditions (for genes of interest), then you'll see a different amount of noise, and that will feed into the hypothesis testing.

The other reason could be (depending on whether you're using the latest version of DESeq2), a 'shrinkage' is applied to your log-fold-changes, so that each gene's lfc is moderated towards a cross-gene average.  Again, the amount of shrinkage is calculated across all pairwise comparisons (I'm not being particularly precise here, but hopefully this gives a picture) - so if the 'uns vs X' comparisons are substantially different in magnitude than the 'eth vs vitd' comparison, then you may see differences if you include a 'phantom' experimental group in your dataset. You can avoid this by having 'results(...., betaPrior=FALSE)' in your calculation, though if you're using the latest version of DESeq2, this is the default, in which case this reason is moot.

As a side issue, you should think carefully (or get a local statistician to) about your design: you may want to analyse it differently depending on whether you have every condition for each donor, or whether each donor only presents one condition, etc.


ADD COMMENTlink written 6 days ago by Gavin Kelly250

Thank you Gavin,

I see your points, both of them. I am using the most recent DESeq2 version, sorry I should have specified version etc. 

However, as far as I can gather, there is something wrong with my design, please bare with me.

when I run the original:

> dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ condition)

So there is no multifactorial design, so no controlling for donor variability, I also get 75DE genes.

The same 75 when I run the contrast:

> res.vitd.eth <- vitdresults(ddsMF, contrast=c("condition","vitd","eth")).

However when I run the original multifactorial design:

> design(ddsMF) <- formula(~ donor + condition)

This seems to work well and control for donor variability, and hence bring uptake few extra DE genes (all the 75DE genes in the non MF design are also in the 85DE of the MF design plus an extra 10, as you mentioned at the start of your answer, i.e. the content of the gene lists are very similar). 

So it seems the contrast (res.vitd.eth <- vitdresults(ddsMF, contrast=c("condition","vitd","eth") I am running is not controlling for donor variability, even though I am using ddsMF, which was run with the formula donor + condition: 

> design(ddsMF) <- formula(~ donor + condition)

> ddsMF <- DESeq(ddsMF)

Thank you again for your answer, 

p.s. I will speak to our statistician (they are pretty hard to get an appointment with), but I have evert condition for each donor.


ADD REPLYlink written 4 days ago by reubenmcgregor880
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 228 users visited in the last hour