I'm using edgeR to analyze my RNA-Seq dataset, which includes data for ten different treatments and a control sampled at three different time points. I used the glmFit() function to fit a model that includes the effects of treatment, time, and the interaction between the two. And when I conduct likelihood ratio tests, treatment, time, and the interaction all have significant effects on a number of genes. At this point, I'm interested in finding out how treatments differ from one another within each time point. I'm not sure exactly how to go about getting this information; my understanding is that the effect of time in the model is relative to the first time point, which is why none of the coefficients refer to time point 1. So if I wanted to determine how each treatment compared to the control within the first time point only, how would I approach this? Would I need to rerun the analysis using just the data from that time point, taking time out of the model? Any help is greatly appreciated!
Here's the code for what I've done so far:
> samples=read.csv("edgeRmetadata_full.csv",header=TRUE)
> samples$countf = paste("~/kayla/WaspRNASeq/july",as.character(samples$timepoint),"_wd/",samples$LibraryName,"_clean_tophat/",samples$LibraryName,"_clean_tophat_counts.txt",sep="")
>
> treat = factor(samples$condition)
> treat = relevel(treat, ref="CTL")
> time = factor(samples$timepoint)
>
> counts=readDGE(samples$countf)$counts
> colnames(counts)=samples$LibraryName
>
> noint = rownames(counts) %in% c("__no_feature","__ambiguous","__too_low_aQual","__not_aligned","__alignment_not_unique")
>
> cpms = cpm(counts)
> keep = rowSums(cpms>1)>=3 & !noint
> counts = counts[keep,]
>
> y = DGEList(counts=counts, group=samples$condition)
> y = calcNormFactors(y)
> design <- model.matrix(~treat*time)
> y= estimateGLMCommonDisp(y, design, verbose=TRUE)
Disp = 0.10542 , BCV = 0.3247
> y <- estimateGLMTrendedDisp(y, design)
> y <- estimateGLMTagwiseDisp(y, design)
>
> fit <- glmFit(y, design)
> colnames(fit)
[1] "(Intercept)" "treatG1F1" "treatG2"
[4] "treatG3" "treatG4" "treatGxUg"
[7] "treatLb17" "treatLcAtl" "treatLgCam"
[10] "treatLh14" "treatLvHaw" "time2"
[13] "time3" "treatG1F1:time2" "treatG2:time2"
[16] "treatG3:time2" "treatG4:time2" "treatGxUg:time2"
[19] "treatLb17:time2" "treatLcAtl:time2" "treatLgCam:time2"
[22] "treatLh14:time2" "treatLvHaw:time2" "treatG1F1:time3"
[25] "treatG2:time3" "treatG3:time3" "treatG4:time3"
[28] "treatGxUg:time3" "treatLb17:time3" "treatLcAtl:time3"
[31] "treatLgCam:time3" "treatLh14:time3" "treatLvHaw:time3"
> lrt <- glmLRT(fit, coef=2:11)
> FDR <- p.adjust(lrt$table$PValue, method="BH")
> sum(FDR < 0.05)
[1] 945
> lrt <- glmLRT(fit, coef=12:13)
> FDR <- p.adjust(lrt$table$PValue, method="BH")
> sum(FDR < 0.05)
[1] 15
> lrt <- glmLRT(fit, coef=14:33)
> FDR <- p.adjust(lrt$table$PValue, method="BH")
> sum(FDR < 0.05)
[1] 191
Thanks, this is super helpful! I actually have a follow-up question that I'm hoping you can help me with. Essentially, I would like to make comparisons between treatments within time points, without using the control as the baseline. My understanding is that my design would be:
design <- model.matrix(~0+treat*time)
And then would my contrasts be something like this?
Time point 1:
contr <- makeContrasts(treatG1F1-treatG2, levels=design)
Time point 2:
contr <- makeContrasts((treatG1F1_time2+treatG1F1)-(treatG2_time2+treatG2), levels=design)
Time point 3:
contr <- makeContrasts((treatG1F1_time3+treatG1F1)-(treatG2_time3+treatG2), levels=design)
Making contrasts isn't intuitive to me, so I appreciate the help. Thanks again!
You don't have to change your design matrix to compare treatments within time points. In your case, the three contrasts you made above are okay for both
model.matrix(~treat*time)
andmodel.matrix(~0+treat*time)
.See Chapter 9 of the limma user's guide for how to make a design matrix and contrasts under different circumstances. I find it quite useful.