Question

DESeq2 experimental design

Mike ▴ 10

@mike-12142

Canada

I have 12 RNA-seq samples: 3 replicates each of male control, male mutant, female control, and female mutant. I want a list of genes that are significantly differentially expressed in males (mutant vs control) and in females (mutant vs control). I'm not interested in comparing, for example, male control vs female control, though I may make something like a Venn diagram of the genes that are differentially expressed in both males and females. Should I run two independent analyses for male and female, or combine everything (i.e. all samples in the same SummarizedExperiment and DESeqDataSet) and then use contrasts to specify the two comparisons, i.e. contrast=c("Group","male.knockout","male.control") and contrast=c("Group","female.knockout","female.control")?

deseq2 multiple factor design • 2.0k views

updated 6.6 years ago by Gavin Kelly ▴ 680 • written 6.6 years ago by Mike ▴ 10

score 1 · Answer 1 · 2017-10-05

Gavin Kelly ▴ 680

@gavin-kelly-6944

United Kingdom / London / Francis Crick…

Generally you get better statistical power if you have all the samples in the same dataset, as you're estimating the variance across many more degrees of freedom. The caveat is that if one half of your experiment has, for biological or technical reasons, a different degree of variability, or a greater propensity for samples to be outliers, then the combined approach will overestimate the variability in one half of the experiment and underestimate it in the other. But my intuition would be that this doesn't look like one of those situations. You can get some feel by looking at PCA plots or clusterings: if the clusters in one branch of the experiment are much tighter than in the other branch, then you might want to try both approaches and see whether positive control genes come out better in one case than the other.
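A minimal sketch of the combined approach, assuming DESeq2 is installed. The counts are simulated with makeExampleDESeqDataSet purely for illustration, and the Group labels are attached by hand; with real data the object would be built from your own count matrix and sample table.

```r
library(DESeq2)

set.seed(1)
# Simulated counts for illustration only: 1000 genes x 12 samples.
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Assumed sample layout: 3 replicates per sex/genotype combination.
dds$Group <- factor(rep(c("female.control", "female.knockout",
                          "male.control", "male.knockout"), each = 3))
design(dds) <- ~ Group

# Quick look at sample-to-sample variability before testing.
# returnData = TRUE returns the PCA coordinates instead of drawing the plot.
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)
pca <- plotPCA(vsd, intgroup = "Group", returnData = TRUE)

# One fit on all 12 samples, then the two within-sex comparisons.
dds <- DESeq(dds)
res_male   <- results(dds, contrast = c("Group", "male.knockout", "male.control"))
res_female <- results(dds, contrast = c("Group", "female.knockout", "female.control"))
```

If the points for one sex are visibly more scattered than for the other in the PCA coordinates, that is the situation where comparing the combined and the separate analyses is worth the effort.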

Another reason for doing the combined approach is that it will let you do a 2x2 design with interactions, to look at the different response to KO between the sexes without having to resort to a Venn-diagram-like approach (which often suffers from two rounds of statistical error).
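To illustrate what the interaction term in a 2x2 design measures, here is a small base R sketch. It does not use DESeq2; the expression values are made up for a single hypothetical gene on the log2 scale, and the point is only to show which coefficient carries the sex-specific response.

```r
# Assumed sample layout: 2 sexes x 2 genotypes x 3 replicates.
samples <- data.frame(
  Sex = factor(rep(c("female", "male"), each = 6),
               levels = c("female", "male")),
  Genotype = factor(rep(rep(c("control", "knockout"), each = 3), times = 2),
                    levels = c("control", "knockout"))
)

# Design matrix for main effects plus interaction.
X <- model.matrix(~ Sex + Genotype + Sex:Genotype, data = samples)

# Made-up log2 expression: knockout raises the gene by 1 in females, by 3 in males.
cell_mean <- c(female.control = 5, female.knockout = 6,
               male.control = 5, male.knockout = 8)
y <- cell_mean[paste(samples$Sex, samples$Genotype, sep = ".")]

fit <- lm.fit(X, y)
coefs <- fit$coefficients

# Genotypeknockout is the knockout effect in females (the reference sex);
# the interaction coefficient is the male effect minus the female effect,
# here (8 - 5) - (6 - 5) = 2.
print(round(coefs, 6))
```

A gene with a significant interaction coefficient responds differently to the knockout in the two sexes, which is tested directly rather than inferred from two separate gene lists.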

Gavin Kelly ▴ 680 • 6.6 years ago

Thank you for your reply. I have some additional questions about the design formula. These are my 12 samples; let's assume I'm analyzing them all together:

Sex	Genotype	BioRep	Group
female	control	1	female.control
female	control	2	female.control
female	control	3	female.control
female	knockout	1	female.knockout
female	knockout	2	female.knockout
female	knockout	3	female.knockout
male	control	1	male.control
male	control	2	male.control
male	control	3	male.control
male	knockout	1	male.knockout
male	knockout	2	male.knockout
male	knockout	3	male.knockout

Also the samples are paired, in that male control 1 is paired with male knockout 1, female control 2 is paired with female knockout 2, etc. I want to answer three questions:

1. What genes are differentially expressed in the males (control vs knockout)?

2. What genes are differentially expressed in the females (control vs knockout)?

3. What are the different responses to the knockout in male vs female?

What design formula should I use? I think ~ BioRep + Group

And then to answer questions 1 and 2 above would I use these contrasts?

contrast=c("Group","male.knockout","male.control")

contrast=c("Group","female.knockout","female.control")

I'm not sure how I should modify the design and contrasts to answer question 3. Any help is appreciated, thank you.

Mike ▴ 10 • 6.5 years ago

The contrast you'd need would be something along the lines of contrast=list(c("f.ko", "m.ctrl"), c("f.ctrl", "m.ko")), as this would divide the female ko_vs_ctrl by the male ko_vs_ctrl, i.e. subtract the male log fold change from the female one (you'll need to change the entries to correspond to resultsNames values...)
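A quick way to sanity-check a list-style contrast is to work through the arithmetic by hand. In DESeq2 the first vector of the list names the numerator terms and the second names the denominator terms. The sketch below uses base R only, with made-up per-group log2 values for one hypothetical gene and placeholder names (f.ctrl, f.ko, m.ctrl, m.ko) standing in for whatever resultsNames reports.

```r
# Made-up per-group log2 means for a single gene (illustration only).
beta <- c(f.ctrl = 5, f.ko = 6, m.ctrl = 5, m.ko = 8)

# Female ko-vs-ctrl minus male ko-vs-ctrl:
# (f.ko - f.ctrl) - (m.ko - m.ctrl) = (f.ko + m.ctrl) - (f.ctrl + m.ko)
numerator   <- c("f.ko", "m.ctrl")
denominator <- c("f.ctrl", "m.ko")

diff_of_diffs <- sum(beta[numerator]) - sum(beta[denominator])

female_lfc <- beta[["f.ko"]] - beta[["f.ctrl"]]
male_lfc   <- beta[["m.ko"]] - beta[["m.ctrl"]]

# Both routes give the same number.
stopifnot(diff_of_diffs == female_lfc - male_lfc)
print(diff_of_diffs)
```

If the same name appears on both sides of the list, it cancels out, so a contrast whose numerator and denominator contain identical entries tests nothing.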

One question that immediately springs to mind, though, is how you've got male and female versions of the same biological replicate. This may be correct, but seems unlikely - by putting BioRep in as an effect, you're suggesting that there's something connected about samples with the same label (also, double-check that you've got BioRep as a factor, rather than a numeric).
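If the pairs really are nested within sex (female pair 1 has nothing to do with male pair 1), one option is the nested-individuals pattern described in the DESeq2 vignette, where the pairing term is fitted separately within each sex. The base R sketch below only builds the design matrix to show that such a design is full rank for this layout; Pair is a hypothetical column name standing in for BioRep, stored as a factor.

```r
# Assumed sample layout: pairs numbered 1-3 within each sex.
samples <- data.frame(
  Sex = factor(rep(c("female", "male"), each = 6)),
  Genotype = factor(rep(rep(c("control", "knockout"), each = 3), times = 2),
                    levels = c("control", "knockout")),
  Pair = factor(rep(1:3, times = 4))  # a factor, not a numeric
)

# Pair effects and knockout effects are both estimated within each sex.
X <- model.matrix(~ Sex + Sex:Pair + Sex:Genotype, data = samples)

print(colnames(X))

# The design is usable only if the matrix has full column rank.
full_rank <- qr(X)$rank == ncol(X)
print(full_rank)
```

With this design the two sex-specific knockout effects are separate coefficients, so each can be tested on its own and their difference can be tested with a list-style contrast; the exact coefficient names should be read from resultsNames after fitting.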

Gavin Kelly ▴ 680 • 6.5 years ago