Finding genes respond differently to treatment between many genotype comparison
2
0
Entering edit mode
chiayi • 0
@chiayi-13269
Last seen 6.0 years ago
New York University

Hello,

I have an RNA-seq experiment that I would like to analysis using nested design matrix, similar to section 3.5 in the edgeR User's Guide: 
> design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment)
> colnames(design)
[1] "(Intercept)"                      "DiseaseDisease1"
[3] "DiseaseDisease2"                  "DiseaseHealthy:Patient2"
[5] "DiseaseDisease1:Patient2"         "DiseaseDisease2:Patient2"
[7] "DiseaseHealthy:Patient3"          "DiseaseDisease1:Patient3"
[9] "DiseaseDisease2:Patient3"         "DiseaseHealthy:TreatmentHormone"
[11] "DiseaseDisease1:TreatmentHormone" "DiseaseDisease2:TreatmentHormone"

My goal is to find genes that respond differently to the hormone in ANY disease pairs (disease1 vs healthy, disease2 vs healthy, and disease1 vs disease2). Although I could achieve each comparison using contrasts, as stated on page 39 of the User's Guide,
qlf <- glmQLFTest(fit, contrast=c(0,0,0,0,0,0,0,0,0,-1,1,0))
qlf <- glmQLFTest(fit, contrast=c(0,0,0,0,0,0,0,0,0,-1,0,1))
qlf <- glmQLFTest(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,-1,1))

I was wondering if I could combine the above three into one glmQLFTest using either contrast or coefficient. In my data, I have 18 genotypes comparable to the disease group in the above example and I am interested in finding the genes respond differently to the treatment in ANY genotype comparison. This would result in 153 possible pair-wise comparisons and I'm curious to know a way better than conducting glmQLFTest 153 times.

>mydesign <- model.matrix(~genotype+genotype:treatment)
>colnames(mydesign)
 [1] "(Intercept)"           "genotype2"             "genotype3"             "genotype4"            
 [5] "genotype5"             "genotype6"             "genotype7"             "genotype8"            
 [9] "genotype9"             "genotype10"            "genotype11"            "genotype12"           
[13] "genotype13"            "genotype14"            "genotype15"            "genotype16"           
[17] "genotype17"            "genotype18"            "genotype1:treatmentH"  "genotype2:treatmentH" 
[21] "genotype3:treatmentH"  "genotype4:treatmentH"  "genotype5:treatmentH"  "genotype6:treatmentH" 
[25] "genotype7:treatmentH"  "genotype8:treatmentH"  "genotype9:treatmentH"  "genotype10:treatmentH"
[29] "genotype11:treatmentH" "genotype12:treatmentH" "genotype13:treatmentH" "genotype14:treatmentH"
[33] "genotype15:treatmentH" "genotype16:treatmentH" "genotype17:treatmentH" "genotype18:treatmentH"

Any help would be much appreciated!

edgeR nested design differential expression • 1.2k views
ADD COMMENT
3
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 7 hours ago
The city by the bay

Just set up an ANODEV. Using the design from Section 3.5 as an illustrative example:

qlf <- glmQLFTest(fit, 
    contrast=cbind(
        c(0,0,0,0,0,0,0,0,0,-1,1,0),
        c(0,0,0,0,0,0,0,0,0,-1,0,1)
    )
)

In the example code above, DiseaseHealthy:TreatmentHormone serves as a reference, and each column of the contrast matrix compares one of the DiseaseDisease*:TreatmentHormone coefficients to the reference. The null hypothesis of each column is that the hormone effect in the corresponding disease is the same as that in the healthy individuals; the global null hypothesis across the entire contrast matrix is that the hormone effect is the same in the healthy state and in  both diseases. glmQLFTest will then test this global null, which will be rejected if there is DE between any pair of hormone effects.

From this, you can see that it doesn't really matter what you choose as the reference. If you choose A as a reference, the nulls are B == A and C == A, and the global null is A == B == C. You'll get the same global null regardless of what condition you pick to be the reference. Similarly, there's no need to specify all pairwise comparisons, as B == C is implied from the above.

Adapting the above reasoning to mydesign, you can build your contrast matrix as:

con.mat <- matrix(0, 36, 17)
con.mat[19,] <- -1 # first interaction term is the reference
con.mat[cbind(2:18+18, 1:17)] <- 1 # comparing to each other term

... for use in the contrasts= argument of glmQLFit.

ADD COMMENT
0
Entering edit mode

Aaron,

Thank you so much for the detailed clarification. This certainly helped me better understand the contrast and ANODEV. I really appreciate it!

ADD REPLY
2
Entering edit mode
@gordon-smyth
Last seen 2 hours ago
WEHI, Melbourne, Australia

Aaron has explained that the edgeR will conduct an anova-type test (analysis of deviance in this case) if you give it more than one coefficient or more than one contrast. In your case, you are looking for genes that show a genotype x treatment interaction. The easiest way to do this test for your data is to define

design <- model.matrix(~genotype*treatment)

and then later use

qlf <- glmQLFTest(fit, coef=20:36)

This will find genes that respond differently to the treatment in any genotype.

In this case you don't need contrasts at all, because the interactions are already part of the fitted model. The test uses 17 coefficients because the interactions have (18-1) x (2-1) = 17 degrees of freedom.

There are lots of equivalent ways to define the same test in edgeR. The above code is just the shortest route.

ADD COMMENT
0
Entering edit mode

Gordon, 

Thank you for the answer. I'm excited about further understanding edgeR and design matrix. I really appreciate your time and input. 

 

ADD REPLY

Login before adding your answer.

Traffic: 1073 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6