Question: DESeq2 experimental design
11 months ago by
Mike10 wrote:

I have 12 RNA-seq samples: 3 replicates each of male control, male mutant, female control, and female mutant. I want a list of genes that are significantly differentially expressed in males (mutant vs control) and in females (mutant vs control). I'm not interested in comparing, for example, male control vs female control, though I may make something like a Venn diagram of genes that are differentially expressed in both males and females. Should I do two independent analyses for male and female, or combine everything (i.e. all samples in the same SummarizedExperiment and DESeqDataSet) and then use contrasts to specify the two comparisons, i.e. contrast=c("Group","male.knockout","male.control") and contrast=c("Group","female.knockout","female.control")?

11 months ago by
Gavin Kelly560 (United Kingdom / London / Francis Crick Institute) wrote:

Generally you get better statistical power if you have all the samples in the same dataset, as you're estimating the variance with many more degrees of freedom. The caveat is that if one half of your experiment has, for biological or technical reasons, a different degree of variability, or a greater propensity for samples to be outliers, then the combined approach will over- or under-estimate the variability, depending on which half of the experiment you're looking at. My intuition is that this doesn't look like one of those situations. You can get some feel for it by looking at PCA plots or clusterings: if the clusters in one branch of the experiment are much tighter than in the other, then you might want to try both approaches and see whether positive control genes come out better in one case than the other.
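As a rough sketch of the combined approach (not the poster's actual data or code), the block below fits all 12 samples in one DESeqDataSet, pulls out the two within-sex contrasts, and draws a PCA plot to eyeball whether one branch is noisier. The counts are simulated with makeExampleDESeqDataSet and the Group labels are assumed to match the names used in the question.

```r
library(DESeq2)

set.seed(1)
# Simulated counts stand in for the real data: 1000 genes x 12 samples.
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Assumed sample annotation: 3 replicates per sex/genotype combination.
dds$Group <- factor(rep(c("female.control", "female.knockout",
                          "male.control", "male.knockout"), each = 3))
design(dds) <- ~ Group

# One model fit, so dispersions are estimated from all 12 samples.
dds <- DESeq(dds)

# Two contrasts from the same fit: knockout vs control within each sex.
res_male   <- results(dds, contrast = c("Group", "male.knockout", "male.control"))
res_female <- results(dds, contrast = c("Group", "female.knockout", "female.control"))

# PCA on variance-stabilised counts; look for one sex clustering
# much more tightly than the other.
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)
plotPCA(vsd, intgroup = "Group")
```

With real data you would build the object from your own count matrix and sample table instead of the simulated one; the contrasts and the PCA call stay the same.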

Another reason for doing the combined approach is that it lets you fit a 2x2 design with an interaction term, so you can test directly for a different response to the KO between the sexes without having to resort to a Venn-diagram-like approach (which often suffers because it compounds the statistical error of two separate tests).
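To make the interaction idea concrete, here is a small base R sketch with made-up, noise-free log2 values for a single gene. It only illustrates what the interaction coefficient measures (the difference between the two knockout effects); it is not a DESeq2 fit, and the numbers are invented.

```r
# Toy log2 expression for one gene: one noise-free value per group.
# Values are made up so knockout adds +1 in females and +3 in males.
samples <- expand.grid(Genotype = c("control", "knockout"),
                       Sex = c("female", "male"))
samples$log2expr <- c(5, 6, 4, 7)  # f.control, f.knockout, m.control, m.knockout

# 2x2 design with an interaction term.
fit <- lm(log2expr ~ Sex + Genotype + Sex:Genotype, data = samples)
interaction_effect <- coef(fit)[["Sexmale:Genotypeknockout"]]

# The interaction coefficient is the difference of the two knockout effects.
female_effect <- 6 - 5
male_effect   <- 7 - 4
stopifnot(isTRUE(all.equal(interaction_effect, male_effect - female_effect)))
```

In DESeq2 the analogous design would be ~ Sex + Genotype + Sex:Genotype, with the interaction coefficient retrieved via results(dds, name = ...) using whatever name resultsNames(dds) reports for it.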



Thank you for your reply. I have some additional questions about the design formula. These are my 12 samples; let's assume I'm analyzing them all together:


Sex     Genotype  BioRep  Group
female  control   1       female.control
female  control   2       female.control
female  control   3       female.control
female  knockout  1       female.knockout
female  knockout  2       female.knockout
female  knockout  3       female.knockout
male    control   1       male.control
male    control   2       male.control
male    control   3       male.control
male    knockout  1       male.knockout
male    knockout  2       male.knockout
male    knockout  3       male.knockout

Also the samples are paired, in that male control 1 is paired with male knockout 1, female control 2 is paired with female knockout 2, etc. I want to answer three questions:

1. What genes are differentially expressed in the males (control vs knockout)?

2. What genes are differentially expressed in the females (control vs knockout)?

3. What are the different responses to the knockout in male vs female?

What design formula should I use? I think ~ BioRep + Group

And then to answer questions 1 and 2 above would I use these contrasts?



I'm not sure how I should modify the design and contrasts to answer question 3, any help is appreciated, thank you.

written 11 months ago by Mike10

The contrast you'd need would be something along the lines of contrast=list(c("f.ko", "m.ctrl"), c("f.ctrl", "m.ko")), as this would divide the female ko_vs_ctrl by the male ko_vs_ctrl (you'll need to change the entries to correspond to resultsNames values...)
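A quick arithmetic check of that kind of list contrast, in base R with invented log2 group coefficients: the first vector is the numerator and the second the denominator, so for the result to be the female ko_vs_ctrl over the male ko_vs_ctrl, the numerator has to hold f.ko and m.ctrl and the denominator f.ctrl and m.ko. The short names are placeholders, not real resultsNames values.

```r
# Hypothetical fitted log2 coefficients for one gene, one per group.
beta <- c(f.ctrl = 5, f.ko = 6, m.ctrl = 4, m.ko = 7)

numerator   <- c("f.ko", "m.ctrl")
denominator <- c("f.ctrl", "m.ko")

# A list contrast adds the numerator terms and subtracts the denominator terms.
list_contrast <- sum(beta[numerator]) - sum(beta[denominator])

# Same quantity written as a difference of knockout effects (log2 scale).
female_effect <- beta[["f.ko"]] - beta[["f.ctrl"]]
male_effect   <- beta[["m.ko"]] - beta[["m.ctrl"]]
stopifnot(isTRUE(all.equal(list_contrast, female_effect - male_effect)))
```

Note that putting the same name in both vectors makes it cancel, so each coefficient should appear on one side only.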

One question that immediately springs to mind, though, is how you've got male and female versions of the same biological replicate.  This may be correct, but seems unlikely - by putting BioRep in as an effect, you're suggesting that there's something connected about samples with the same label (also, double-check that you've got BioRep as a factor, rather than a numeric).
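If the pairing really is only within each sex (an assumption; male control 1 goes with male knockout 1 and has nothing to do with female sample 1), one way to see the issue is to compare candidate designs with model.matrix. The sketch below is base R only. The Pair column and the is_full_rank helper are hypothetical names introduced for illustration, and the nested formula follows the "individuals nested within groups" pattern described in the DESeq2 vignette.

```r
# Sample table as posted, with BioRep stored as a factor.
samples <- data.frame(
  Sex      = factor(rep(c("female", "male"), each = 6)),
  Genotype = factor(rep(rep(c("control", "knockout"), each = 3), times = 2)),
  BioRep   = factor(rep(1:3, times = 4))
)
samples$Group <- factor(paste(samples$Sex, samples$Genotype, sep = "."))
# Hypothetical pair ID that is unique within each sex (6 pairs in total).
samples$Pair <- factor(paste(samples$Sex, samples$BioRep, sep = "."))

# Helper (illustrative): does this design give a full-rank model matrix?
is_full_rank <- function(design_formula) {
  mm <- model.matrix(design_formula, data = samples)
  qr(mm)$rank == ncol(mm)
}

# Fits, but treats female 1 and male 1 as the same replicate.
shared_rep <- is_full_rank(~ BioRep + Group)
# Sex-specific pairs plus Group: Pair already determines Sex, so not full rank.
pair_plus_group <- is_full_rank(~ Pair + Group)
# Nested form: pair effects and knockout effects estimated within each sex.
nested <- is_full_rank(~ Sex + Sex:BioRep + Sex:Genotype)

print(c(shared_rep = shared_rep, pair_plus_group = pair_plus_group, nested = nested))
```

With the nested design, the two sex-specific knockout effects appear as separate coefficients, and the male vs female difference can be requested as a list contrast of those two; check resultsNames(dds) for the exact names.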


written 11 months ago by Gavin Kelly560