First of all I would like to say that I am new to this kind of analysis. I have been using the DeSeq2 vignette extensively and this is the only part which has tripped me up and confused me a little.
I have been following the “Group specific condition effects, individuals nested within groups” in the DeSeq2 vignette but I am a little confused with its application.
My data can be substituted in the example vignette (DeSeq2 vignette page 78) as follows,
- Grp = year, 2016 or 2017.
- Ind = Genotype, P1 - P12,
- cnd = treatment, baseline or exposed (pg 73 on the vignette).
- Each genotype has 3 baseline and 3 exposed biological replicates.
- Genotypes P1 - P6 are in 2016, and genotypes P7 - P12 are in 2017 (I understand this is not a great dataset, I was given it to work on and have to try and get as much out of it as possible).
- Original PCA plot shows PC1 explaining 50% variance. All 2016 grouped on the left and all 2017 grouped on the right.
2 additional things to note
1. I had additional funds to sequence 3 baseline and 3 exposed from genotypes P1, P3 and P6 which were also run in 2017. These grouped with the other 2017 genotypes (P7-P12) so I am leaning towards this large PC variance being explained by a year affect (which I would expect it to be).
2. I know the experimental design is far from desirable, but it is what I have been given so I am trying to make the most of it while also trying to use best practices to analyse it.
With that in mind, I have three questions.
1. I want to look at the signatures of disease resistance of the genotypes. With the model made through this grouping method (~Year + Year+ind.n + Year:Treatment) does this now mean I have removed the Year affect and can use (results(ddsgroup, contrast=list(“2016.exposed”, “2017.exposed”) to look at the differences between the differentially expressed genes in the 2016 and 2017 exposed samples?
2. If not, and due to the variance being a year affect, would it be an acceptable analyses to make a new 4 level factor of year and treatment, and then run the model ~ genotype + YearandTreatment with the goal of extracting the 2016 baseline and exposed, and 2017 baseline and exposed and running pairwise comparisons from there? This would mean only one model is used instead of 2 and analysing the datasets individually.
3. Is there a way to 'model out' this variance. The PCA plot gives a large variance on PC1 (As mentioned earlier), so if I wanted to remove this could I do it, and would it be by utilising packages such as SVA and combat? Again, if this is extremly bad practice then I will not do this.
I apologise if I am barking up completely the wrong tree but I would appreciate any help in rectifying my confusion. I have followed everything else in the vignette with ease and this is the first case where I have been tripped up.