Question

Finding genes respond differently to treatment between many genotype comparison

0

Entering edit mode

chiayi • 0

@chiayi-13269

Last seen 6.2 years ago

New York University

Hello,

I have an RNA-seq experiment that I would like to analysis using nested design matrix, similar to section 3.5 in the edgeR User's Guide:
> design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) > colnames(design) [1] "(Intercept)" "DiseaseDisease1" [3] "DiseaseDisease2" "DiseaseHealthy:Patient2" [5] "DiseaseDisease1:Patient2" "DiseaseDisease2:Patient2" [7] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3" [9] "DiseaseDisease2:Patient3" "DiseaseHealthy:TreatmentHormone" [11] "DiseaseDisease1:TreatmentHormone" "DiseaseDisease2:TreatmentHormone"

My goal is to find genes that respond differently to the hormone in ANY disease pairs (disease1 vs healthy, disease2 vs healthy, and disease1 vs disease2). Although I could achieve each comparison using contrasts, as stated on page 39 of the User's Guide,
qlf <- glmQLFTest(fit, contrast=c(0,0,0,0,0,0,0,0,0,-1,1,0))
qlf <- glmQLFTest(fit, contrast=c(0,0,0,0,0,0,0,0,0,-1,0,1))
qlf <- glmQLFTest(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,-1,1))

I was wondering if I could combine the above three into one glmQLFTest using either contrast or coefficient. In my data, I have 18 genotypes comparable to the disease group in the above example and I am interested in finding the genes respond differently to the treatment in ANY genotype comparison. This would result in 153 possible pair-wise comparisons and I'm curious to know a way better than conducting glmQLFTest 153 times.

>mydesign <- model.matrix(~genotype+genotype:treatment) >colnames(mydesign) [1] "(Intercept)" "genotype2" "genotype3" "genotype4" [5] "genotype5" "genotype6" "genotype7" "genotype8" [9] "genotype9" "genotype10" "genotype11" "genotype12" [13] "genotype13" "genotype14" "genotype15" "genotype16" [17] "genotype17" "genotype18" "genotype1:treatmentH" "genotype2:treatmentH" [21] "genotype3:treatmentH" "genotype4:treatmentH" "genotype5:treatmentH" "genotype6:treatmentH" [25] "genotype7:treatmentH" "genotype8:treatmentH" "genotype9:treatmentH" "genotype10:treatmentH" [29] "genotype11:treatmentH" "genotype12:treatmentH" "genotype13:treatmentH" "genotype14:treatmentH" [33] "genotype15:treatmentH" "genotype16:treatmentH" "genotype17:treatmentH" "genotype18:treatmentH"

Any help would be much appreciated!

edgeR nested design differential expression • 1.3k views

ADD COMMENT • link updated 6.2 years ago by Gordon Smyth 51k • written 6.2 years ago by chiayi • 0

score 3 · Accepted Answer · 2018-04-28

Just set up an ANODEV. Using the design from Section 3.5 as an illustrative example:

qlf <- glmQLFTest(fit, 
    contrast=cbind(
        c(0,0,0,0,0,0,0,0,0,-1,1,0),
        c(0,0,0,0,0,0,0,0,0,-1,0,1)
    )
)

In the example code above, DiseaseHealthy:TreatmentHormone serves as a reference, and each column of the contrast matrix compares one of the DiseaseDisease*:TreatmentHormone coefficients to the reference. The null hypothesis of each column is that the hormone effect in the corresponding disease is the same as that in the healthy individuals; the global null hypothesis across the entire contrast matrix is that the hormone effect is the same in the healthy state and in both diseases. glmQLFTest will then test this global null, which will be rejected if there is DE between any pair of hormone effects.

From this, you can see that it doesn't really matter what you choose as the reference. If you choose A as a reference, the nulls are B == A and C == A, and the global null is A == B == C. You'll get the same global null regardless of what condition you pick to be the reference. Similarly, there's no need to specify all pairwise comparisons, as B == C is implied from the above.

Adapting the above reasoning to mydesign, you can build your contrast matrix as:

con.mat <- matrix(0, 36, 17)
con.mat[19,] <- -1 # first interaction term is the reference
con.mat[cbind(2:18+18, 1:17)] <- 1 # comparing to each other term

... for use in the contrasts= argument of glmQLFit.

score 2 · Accepted Answer · 2018-04-28

Aaron has explained that the edgeR will conduct an anova-type test (analysis of deviance in this case) if you give it more than one coefficient or more than one contrast. In your case, you are looking for genes that show a genotype x treatment interaction. The easiest way to do this test for your data is to define

design <- model.matrix(~genotype*treatment)

and then later use

qlf <- glmQLFTest(fit, coef=20:36)

This will find genes that respond differently to the treatment in any genotype.

In this case you don't need contrasts at all, because the interactions are already part of the fitted model. The test uses 17 coefficients because the interactions have (18-1) x (2-1) = 17 degrees of freedom.

There are lots of equivalent ways to define the same test in edgeR. The above code is just the shortest route.