edgeR analysis of effect of treatments within time points
Entering edit mode
kaylamh6 • 0
Last seen 8.1 years ago
United States

I'm using edgeR to analyze my RNA-Seq dataset, which includes data for ten different treatments and a control sampled at three different time points. I used the glmFit() function to fit a model that includes the effects of treatment, time, and the interaction between the two. And when I conduct likelihood ratio tests, treatment, time, and the interaction all have significant effects on a number of genes. At this point, I'm interested in finding out how treatments differ from one another within each time point. I'm not sure exactly how to go about getting this information; my understanding is that the effect of time in the model is relative to the first time point, which is why none of the coefficients refer to time point 1. So if I wanted to determine how each treatment compared to the control within the first time point only, how would I approach this? Would I need to rerun the analysis using just the data from that time point, taking time out of the model? Any help is greatly appreciated!

Here's the code for what I've done so far:

> samples=read.csv("edgeRmetadata_full.csv",header=TRUE)
> samples$countf = paste("~/kayla/WaspRNASeq/july",as.character(samples$timepoint),"_wd/",samples$LibraryName,"_clean_tophat/",samples$LibraryName,"_clean_tophat_counts.txt",sep="")
> treat = factor(samples$condition)
> treat = relevel(treat, ref="CTL")
> time = factor(samples$timepoint)
> counts=readDGE(samples$countf)$counts
> colnames(counts)=samples$LibraryName
> noint = rownames(counts) %in% c("__no_feature","__ambiguous","__too_low_aQual","__not_aligned","__alignment_not_unique")
> cpms = cpm(counts)
> keep = rowSums(cpms>1)>=3 & !noint
> counts = counts[keep,]
> y = DGEList(counts=counts, group=samples$condition)
> y = calcNormFactors(y)
> design <- model.matrix(~treat*time)
> y= estimateGLMCommonDisp(y, design, verbose=TRUE)
Disp = 0.10542 , BCV = 0.3247
> y <- estimateGLMTrendedDisp(y, design)

> y <- estimateGLMTagwiseDisp(y, design)
> fit <- glmFit(y, design)
> colnames(fit)
 [1] "(Intercept)"       "treatG1F1"         "treatG2"
 [4] "treatG3"           "treatG4"           "treatGxUg"
 [7] "treatLb17"         "treatLcAtl"        "treatLgCam"
[10] "treatLh14"         "treatLvHaw"        "time2"
[13] "time3"            "treatG1F1:time2"  "treatG2:time2"
[16] "treatG3:time2"    "treatG4:time2"    "treatGxUg:time2"
[19] "treatLb17:time2"  "treatLcAtl:time2" "treatLgCam:time2"
[22] "treatLh14:time2"  "treatLvHaw:time2" "treatG1F1:time3"
[25] "treatG2:time3"    "treatG3:time3"    "treatG4:time3"
[28] "treatGxUg:time3"  "treatLb17:time3"  "treatLcAtl:time3"
[31] "treatLgCam:time3" "treatLh14:time3"  "treatLvHaw:time3"

> lrt <- glmLRT(fit, coef=2:11)
> FDR <- p.adjust(lrt$table$PValue, method="BH")
> sum(FDR < 0.05)
[1] 945
> lrt <- glmLRT(fit, coef=12:13)
> FDR <- p.adjust(lrt$table$PValue, method="BH")
> sum(FDR < 0.05)
[1] 15
> lrt <- glmLRT(fit, coef=14:33)
> FDR <- p.adjust(lrt$table$PValue, method="BH")
> sum(FDR < 0.05)
[1] 191

edger rnaseq • 1.8k views
Entering edit mode
Yunshun Chen ▴ 810
Last seen 14 days ago

You don't have to rerun the analysis. If you use data from one time point only, you'll lose 2/3 of the total information.

All you have to do is to test some coefficients or contrasts under the current design. Presumably the intercept is your control at time 1. According to your design, the coefficient treatG1F1 represents the comparison "G1F1_Time1-Control_Time1", time2 represents the comparison "Control_Time2-Control_Time1" and the interaction term treatG1F1:time2 represents "(G1F1_Time2-Control_Time2)-(G1F1_Time1-Control_Time1)", etc.

To compare G1F1 with the control at time 1, you only need to test for the second coefficient.

lrt <- glmLRT(fit, coef=2)

To compare G1F1 with the control at time 2, you need to make the following contrast and test it in glmLRT.

contr <- c(0,1,rep(0,11),1,rep(0,19))
lrt <- glmLRT(fit, contrast=contr)

Alternatively, you can make the contrast using makeContrasts() in limma. However, you need to avoid the symbol ":" in the column names of the design.

colnames(design) <- gsub(":", "_", colnames(design))
contr <- makeContrasts(treatG1F1_time2+treatG1F1, levels=design)
Entering edit mode

Thanks, this is super helpful! I actually have a follow-up question that I'm hoping you can help me with. Essentially, I would like to make comparisons between treatments within time points, without using the control as the baseline. My understanding is that my design would be:

design <- model.matrix(~0+treat*time)

And then would my contrasts be something like this?

Time point 1:
contr <- makeContrasts(treatG1F1-treatG2, levels=design)
Time point 2:
contr <- makeContrasts((treatG1F1_time2+treatG1F1)-(treatG2_time2+treatG2), levels=design)
Time point 3:
contr <- makeContrasts((treatG1F1_time3+treatG1F1)-(treatG2_time3+treatG2), levels=design)

Making contrasts isn't intuitive to me, so I appreciate the help. Thanks again!

Entering edit mode

You don't have to change your design matrix to compare treatments within time points. In your case, the three contrasts you made above are okay for both model.matrix(~treat*time) and model.matrix(~0+treat*time).

See Chapter 9 of the limma user's guide for how to make a design matrix and contrasts under different circumstances. I find it quite useful.


Login before adding your answer.

Traffic: 546 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6