We are struggling in understanding the correct design formula for my rnaseq analysis. We have 16 samples, 4 biological replicates with 2 different variables: if they are treated with a drug or not, and if they are stressed or not.
We want to know: a) Which is the effect of the treatment under basal conditions. b) Which is the effect of the treatment under stressed conditions.
During the exploratory analysis, we realized that the individual differences between biological replicates were bigger than than the ones due to treatment, or infection (i.e. the samples cluster together by biological replicate and not by treatment). We are not interested in quantifying these differences, but on normalizing for them (kinda like a paired analysis?). My question is the following:
Given that the 'genotype' (or 'family') effect will be inherent in all the samples (see colData below), should I take it into account when doing the design matrix?
i.e., my colData would look like this:
genotype treatment stress 1 A untreated stressed 2 A treated stressed 3 A untreated No 4 A treated No 5 B untreated stressed 6 B treated stressed 7 B untreated No 8 B treated No 9 C untreated stressed 10 C treated stressed 11 C untreated No 12 C treated No 13 D untreated stressed 14 D treated stressed 15 D untreated No 16 D treated No
If I'm understanding previous answers right, I should normalize by 'genotype', and the technical process would be similar of dealing with having a 'batch' effect, being every genotype a 'batch'. In this case, should our design look like this?
design = ~ treatment + treatment:genotype + treatment:stress
(This would be similar to this case: https://support.bioconductor.org/p/92108/)
Or should I collapse 'treatment' and 'stress' into a grouped 'condition' variable as suggested here (https://support.bioconductor.org/p/67600/#67612), and input the design:
design = ~ genotype + condition
I would then take out the results that we are interested with
## Which is the effect of the treatment under basal conditions. results(dds, contrast=c("condition","treatedNo", "untreatedNo")) ## Which is the effect of the treatment under stressed conditions. results(dds, contrast=c("condition","treatedstressed", "untreatedstressed"))
But I am not able to see how I would see the "extra" effect of the treatment under stressed conditions 'relative' to being stressed in this case.
Thank you so much for your help