Good evening BioConductor ListServ,
I am interested in using edgeR to generate a single list of genes differentially expressed between treatment and control and either of two time points (similar to User Guide section 3.3.3). I have generated what I think is the correct contrast and model, but I am confused by the number of genes at FDR<0.1 on the output list compared to the number of genes at FDR<0.1 at either of the two time points alone.
I see non-additive numbers of genes with adjusted p-values (FDR) <0.1 coming up on related lists. In my analysis, I am using a contrast model (section 3.2.3) to examine differential gene expression between individuals collected at two time points (30 and 60 min) and in two conditions (control and treatment). There are (1) 11 genes differentially expressed at 30 min between treatment and control, but (2) 772 genes at 60 min. Independent of time point, there are (3) 216 differentially expressed genes between treatment and control. I wanted to create a single list that represented the differentially expressed genes at either time point, so I added contrasts (1) and (2) together prior to running the glmLRT. Now, my list has 319 genes at FDR<0.1, but I am not sure why the number of genes isn't between 772-783, or reflect the results from contrast (3). Can you help me understand these differences?
Please find relevant sections of my code below. I am using edgeR version 3.3.8 in R version 3.0.2.
Thanks very much for any help you can provide!
Matt
group <- factor(paste(targets$treatment, targets$timepoint, sep = ".")) design.TimeCDateLib <- model.matrix(~0+zlib + group + collect.date.factor, data=targets) rownames(design.TimeCDateLib) <- colnames(y) y.TimeCDateLib.common <- estimateGLMCommonDisp(y,design.TimeCDateLib, verbose=TRUE) y.TimeCDateLib.AmelFor30.60.common.trend <- estimateGLMTrendedDisp(y.TimeCDateLib.common,design.TimeCDateLib) y.TimeCDateLib.AmelFor30.60.common.trend.tag.df17 <- estimateGLMTagwiseDisp(y.TimeCDateLib.AmelFor30.60.common.trend,design.TimeCDateLib,prior.df=17) # This one is best. Most dots close to theoretical line. fit.TimeCDateLib.AmelFor30.60.df17 <- glmFit(y.TimeCDateLib.AmelFor30.60.common.trend.tag.df17,design.TimeCDateLib) x11(); qq <- gof(fit.TimeCDateLib.AmelFor30.60.df17,pcutoff=.05,plot=TRUE) sum(qq$outlier==TRUE) contrast.TimeCDateLib.AmelFor30.60 <- makeContrasts( CtrlTreat3060=(groupc.a-grouptx.a)+(groupc.b-grouptx.b), CtrlTreat30=(groupc.a-grouptx.a), CtrlTreat60=(groupc.b-grouptx.b), CT30v60=(groupc.a-groupc.b)+(grouptx.a-grouptx.b), RespDiff=(groupc.a-groupc.b)-(grouptx.a-grouptx.b), C30v60=(groupc.a-groupc.b), T30v60 = (grouptx.a-grouptx.b), levels=design.TimeCDateLib) # (1) lrt.CtrlTreat30<- glmLRT(fit.TimeCDateLib.AmelFor30.60.df17, contrast= contrast.TimeCDateLib.AmelFor30.60[,"CtrlTreat30"]) # (2) lrt.CtrlTreat60<- glmLRT(fit.TimeCDateLib.AmelFor30.60.df17, contrast= contrast.TimeCDateLib.AmelFor30.60[,"CtrlTreat60"]) # (3) lrt.CtrlTreat3060<- glmLRT(fit.TimeCDateLib.AmelFor30.60.df17, contrast= contrast.TimeCDateLib.AmelFor30.60[,"CtrlTreat3060"]) # (1)+(2) lrt.CtrlTreat30or60<- glmLRT(fit.TimeCDateLib.AmelFor30.60.df17, contrast= contrast.TimeCDateLib.AmelFor30.60[,2:3]) topTags(lrt.CtrlTreat3060,n=Inf) toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat3060 <- topTags(lrt.CtrlTreat3060,n=Inf) sum(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat3060$table$FDR<0.05) sum(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat3060$table$FDR<0.1) head(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat3060$table) write.csv(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat3060, "toptags.TimeCDateLib.AmelFor30.60.df17.contrast.lrt.CtrlvTreat3060.csv") topTags(lrt.CtrlTreat30,n=Inf) toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat30 <- topTags(lrt.CtrlTreat30,n=Inf) sum(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat30$table$FDR<0.05) sum(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat30$table$FDR<0.1) head(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat30$table) write.csv(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat30, "toptags.TimeCDateLib.AmelFor30.60.df17.contrast.lrt.CtrlvTreat30.csv") topTags(lrt.CtrlTreat60,n=Inf) toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat60 <- topTags(lrt.CtrlTreat60,n=Inf) sum(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat60$table$FDR<0.05) sum(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat60$table$FDR<0.1) head(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat60$table) write.csv(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat60, "toptags.TimeCDateLib.AmelFor30.60.df17.contrast.lrt.CtrlvTreat60.csv") topTags(lrt.CtrlTreat30or60,n=Inf) toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat30or60 <- topTags(lrt.CtrlTreat30or60,n=Inf) sum(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat30or60$table$FDR<0.05) sum(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat30or60$table$FDR<0.1) head(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat30or60$table) write.csv(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat30or60, "toptags.TimeCDateLib.AmelFor30.60.df17.contrast.lrt.CtrlvTreat30or60.csv")
--
Matthew S. McNeill, Ph.D.
Post Doctoral Research Associate
University of Illinois
Institute for Genomic Biology
Department of Entomology
1206 W Gregory, rm 2414-G
Urbana, IL 61801
Thank you for your clarification. I had been confused by section 3.3.3 because I interpreted "any time point" to mean a summation of DEGs at each time point as opposed to a comparison to an F-test output.