Group Specific Condition Query
Entering edit mode
bdy8 ▴ 10
Last seen 5.3 years ago

First of all I would like to say that I am new to this kind of analysis. I have been using the DeSeq2 vignette extensively and this is the only part which has tripped me up and confused me a little. 

I have been following the “Group specific condition effects, individuals nested within groups” in the DeSeq2 vignette but I am a little confused with its application. 

My data can be substituted in the example vignette (DeSeq2 vignette page 78) as follows,

  • Grp = year, 2016 or 2017. 
  • Ind = Genotype, P1 - P12, 
  • cnd = treatment, baseline or exposed (pg 73 on the vignette). 
  • Each genotype has 3 baseline and 3 exposed biological replicates. 
  • Genotypes P1 - P6 are in 2016, and genotypes P7 - P12 are in 2017 (I understand this is not a great dataset, I was given it to work on and have to try and get as much out of it as possible). 
  • Original PCA plot shows PC1 explaining 50% variance. All 2016 grouped on the left and all 2017 grouped on the right. 

2 additional things to note

1. I had additional funds to sequence 3 baseline and 3 exposed from genotypes P1, P3 and P6 which were also run in 2017. These grouped with the other 2017 genotypes (P7-P12) so I am leaning towards this large PC variance being explained by a year affect (which I would expect it to be). 

2. I know the experimental design is far from desirable, but  it is what I have been given so I am trying to make the most of it while also trying to use best practices to analyse it. 


With that in mind, I have three questions. 

1. I want to look at the signatures of disease resistance of the genotypes. With the model made through this grouping method (~Year + Year+ind.n + Year:Treatment) does this now mean I have removed the Year affect and can use (results(ddsgroup, contrast=list(“”, “”) to look at the differences between the differentially expressed genes in the 2016 and 2017 exposed samples?

2. If not, and due to the variance being a year affect, would it be an acceptable analyses to make a new 4 level factor of year and treatment, and then run the model ~ genotype + YearandTreatment with the goal of extracting the 2016 baseline and exposed, and 2017 baseline and exposed and running pairwise comparisons from there? This would mean only one model is used instead of 2 and analysing the datasets individually. 

3. Is there a way to 'model out' this variance. The PCA plot gives a large variance on PC1 (As mentioned earlier), so if I wanted to remove this could I do it, and would it be by utilising packages such as SVA and combat? Again, if this is extremly bad practice then I will not do this. 

I apologise if I am barking up completely the wrong tree but I would appreciate any help in rectifying my confusion. I have followed everything else in the vignette with ease and this is the first case where I have been tripped up. 

deseq2 deseq batcheffect • 1.1k views
Entering edit mode
Last seen 35 minutes ago
United States

Is year a nuisance variable in this experiment? In the "Group-specific condition effects..." section, it is of interest to see if the individuals in the different groups have different treatment effects -- where in that section the grouping is a biological quantity of interest.

However, if in your case 'year' is just a nuisance/batch variable, I'd recommend a different and simpler design, but first let me check with you on this.

Can you elaborate on "I want to look at the signatures of disease resistance of the genotypes". Do you want to assume an overall treatment effect for all genotypes? If not, for which groups of genotypes do you want to compare their treatment effects?

Entering edit mode

Hi Michael

Yes, year is a nuisance variable at this point. I have been doing some extra reading and think that a model with ~ Year + Genotype + Treatment (year being the nuisance/batch here) would be a much simpler model to use. This would, if I am correct in my thinking, allow analysis of the difference of expression between the baseline and exposed within and between genotypes. The model is therefore removing the year batch.

Elaborating on "signatures of disease resistance".

I want to be able to show potential differential gene expression which explains why some genotypes are more disease resistant than other genotypes. I forgot to mention (I apologise for this) that we have % transmission data for each genotype, and I want to try and correlate this with the expression seen.

I would also want to look at only the genotypes done in 2016, with the idea to then see how the identified genes (in 2016) change/remain the same for the genotypes in 2017. This could then be used to show potential difference in pathogens/virulence for future studies.

For example, say we had genotypes P1 and P3 and gene x. P1 had a low % disease transmission after disease exposure and P3 had a high % disease transmission after disease exposure. This would mean that P1 is expected to be a more resistant individual. Therefore, if gene x had a large change in expression after disease exposure (high levels of up-regulation)  whereas in P3 this gene was turned of/remained the same level of expression (compared to the baseline) I would suspect this gene (gene x) to be important in resistance. Another scenario (again using gene x) would be if the base expression of  gene X  in P1 was higher and it did not change after exposure, I would assume it has a important role in resistance IF In P3, the same gene (gene X) had a overall lower base expression. Obviously this is very simplified but I hope it gives you a better idea of my aim and goals. 

Entering edit mode

If you wanted to find genes that are DE across all genotypes, you would just do ~genotype + treatment. Because genotype is nested within treatment, this controls for the year effect as well. It assumes a common trreatment effect.

If you want to compare the treatment effect across groups of genotypes, you would use the "individuals nested within groups" strategy you initially posted about. It assumes a common treatment effect within the groups of genotypes.

DESeq2 cannot do random treatment effects, only fixed effects, so these two are your options basically within DESeq2.

Now, adding in a new continuous covariate, you have to be very specific about how you want model the interaction of this with treatment (either if there is a common treatment or group-specific treatments). Model as a continuous or discretized variable? Transformed? This amount of statistical guidance is beyond what I can offer on the support site. I would recommend therefore to collaborate with a local statistician on approaching how to add in this covariate. Anyone with experience with linear modeling in R could help you develop a design, and how to build and interpret contrasts, as the design matrices, coefficients and contrasts available in DESeq2 are built off of, and so the same that are used in base R. 

Entering edit mode

Hi Michael 

Thank you so much for the help. It has answered where I was confused with the nesting within groups section. 

The main problem I think I have had is applying this into DeSeq. With your clarification I understand exactly what I need to do with the analyses. I will be having my models and analyses checked by someone who has done this kind of experiment before as I want to make sure I am ticking all the boxes and not drawing spurios conclusions. 

Finally I just want to say the vignettes you have produced are fantastic and, for someone like me embarking on this for the first time, a life saver so I am very thankful. 

Entering edit mode

Great. Feel free to post any bugs or unexpected errors of course.


Login before adding your answer.

Traffic: 526 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6