I'm working on a case/control study and including some other variables such as genotypes. One thing I want to do is a subgroup analysis of the genotypes for just the case group. The brute force method is to subset my data to the case group and run deseq on that. I'm very new to deseq and the very idea of contrasts. But from reading it seems like contrasts offer a more elegant way to do subgroup analysis without having to rerun deseq multiple times on differently subsetted data.
Suppose I have the following variables:
- Case: [0,1]
- Gene1: [YY, YN, NN]
- Gene2: [YY, YN, NN]
I run deseq with the design "~ Case + Gene1 + Gene2 + Gene1:Gene2"
How would I write contrasts that would be the equivalent of
- Subgroup just those with the disease (Case=1)
- Subgroup just homozygous genotypes: (Gene1 = [YY, NN], Gene2=[YY,NN])
- Ask the question: Is the strongest effect due to Gene1, Gene2, or the interaction?
Thanks!
Here is a putative study design to work with:
> fake_study_design = data.frame(
+ Case=sample(c('Case', 'Control'), 10, replace=T),
+ GeneA=sample(c('YY','YN','NN'), 10, replace=T),
+ GeneB=sample(c('YY', 'YN', 'NN'), 10, replace=T)
+ )
> fake_study_design
Case GeneA GeneB
1 Case YY YY
2 Case YN NN
3 Control YY NN
4 Control YY YN
5 Control YN YY
6 Control NN NN
7 Control NN YY
8 Case NN YN
9 Control YY NN
10 Control YY YY
Added to post. Is this what you wanted?
Yes, so now I have some sense of the design (I'm not sure if you have all the required samples though).
You can combine the variables if you want to make comparisons just within one group, see here:
https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#interactions
The problem is, I'm so clueless about contrasts, I don't know how to take the vignette example and put it in practice. For instance, if my design was
~ Case + GeneA + GeneB + GeneA:GeneB
, then to subgroup justCase=="Case"
, my contrast would becontrast=c(?,?,?)
.I’d recommend chatting with a statistician or bioinformatician who works with R designs. They can help guide you through the missing pieces.
Ok. Defying good community practice, I've cross-posted here to see if a stats person can help.
Hi @Michael--I thought of another way to phrase the question that might be more direct:
If I have created a deseq object with design
~CaseString + Gender
, can I still access results for a subset of the data? For instance, can I create a contrast that would be equivalent to first subset data forCase=="Case"
, then run deseq?See the section on interactions. That is what interactions do, and we have a diagram to explain how to interpret the coefficients in the vignette. But more generally, you want to make sure you get it right, so if it remains unclear after reading the documentation, I strongly recommend to find a local statistician who you can sit down with and have them explain the interpretation of the coefficients in your linear model of interest. There isn't anything specific about DESeq2 here, we use the same coefficients as any linear model in R or otherwise.
Thanks, Micheal! Your reply speed is absolutely stunning. I have made arrangements with a statistician :) I did read the section on interactions, but I wasn't sure if I really have an interaction here. I'm far from an expert in R formulas, but my understanding is that something like
Case:Gender
would mean that Case and Gender are acting together, as opposed to looking at Gender controlled for Case but only for Case="Case". Maybe those are the same thing. Statistician should help sort this out.