Question

edgeR: Treatment effect at any time (UG3.3.3)

0

Entering edit mode

Matt McNeill ▴ 20

@matt-mcneill-6199

Last seen 9.6 years ago

Good evening BioConductor ListServ,

I am interested in using edgeR to generate a single list of genes differentially expressed between treatment and control and either of two time points (similar to User Guide section 3.3.3). I have generated what I think is the correct contrast and model, but I am confused by the number of genes at FDR<0.1 on the output list compared to the number of genes at FDR<0.1 at either of the two time points alone.

I see non-additive numbers of genes with adjusted p-values (FDR) <0.1 coming up on related lists. In my analysis, I am using a contrast model (section 3.2.3) to examine differential gene expression between individuals collected at two time points (30 and 60 min) and in two conditions (control and treatment). There are (1) 11 genes differentially expressed at 30 min between treatment and control, but (2) 772 genes at 60 min. Independent of time point, there are (3) 216 differentially expressed genes between treatment and control. I wanted to create a single list that represented the differentially expressed genes at either time point, so I added contrasts (1) and (2) together prior to running the glmLRT. Now, my list has 319 genes at FDR<0.1, but I am not sure why the number of genes isn't between 772-783, or reflect the results from contrast (3). Can you help me understand these differences?

Please find relevant sections of my code below. I am using edgeR version 3.3.8 in R version 3.0.2.

Thanks very much for any help you can provide!

Matt

group <- factor(paste(targets$treatment, targets$timepoint, sep = "."))
design.TimeCDateLib <- model.matrix(~0+zlib + group + collect.date.factor, data=targets)
rownames(design.TimeCDateLib) <- colnames(y)
y.TimeCDateLib.common <-
 estimateGLMCommonDisp(y,design.TimeCDateLib, verbose=TRUE)
y.TimeCDateLib.AmelFor30.60.common.trend <-
 estimateGLMTrendedDisp(y.TimeCDateLib.common,design.TimeCDateLib)

y.TimeCDateLib.AmelFor30.60.common.trend.tag.df17 <-
estimateGLMTagwiseDisp(y.TimeCDateLib.AmelFor30.60.common.trend,design.TimeCDateLib,prior.df=17)
# This one is best. Most dots close to theoretical line.

fit.TimeCDateLib.AmelFor30.60.df17 <- glmFit(y.TimeCDateLib.AmelFor30.60.common.trend.tag.df17,design.TimeCDateLib)

x11(); qq <- gof(fit.TimeCDateLib.AmelFor30.60.df17,pcutoff=.05,plot=TRUE)
sum(qq$outlier==TRUE)

contrast.TimeCDateLib.AmelFor30.60 <- makeContrasts(
 CtrlTreat3060=(groupc.a-grouptx.a)+(groupc.b-grouptx.b),
 CtrlTreat30=(groupc.a-grouptx.a),
 CtrlTreat60=(groupc.b-grouptx.b),
 CT30v60=(groupc.a-groupc.b)+(grouptx.a-grouptx.b),
 RespDiff=(groupc.a-groupc.b)-(grouptx.a-grouptx.b),
 C30v60=(groupc.a-groupc.b),
 T30v60 = (grouptx.a-grouptx.b),
 levels=design.TimeCDateLib)

# (1)
lrt.CtrlTreat30<- glmLRT(fit.TimeCDateLib.AmelFor30.60.df17,
contrast=
contrast.TimeCDateLib.AmelFor30.60[,"CtrlTreat30"])

# (2)
lrt.CtrlTreat60<- glmLRT(fit.TimeCDateLib.AmelFor30.60.df17,
contrast=
contrast.TimeCDateLib.AmelFor30.60[,"CtrlTreat60"])

# (3)
lrt.CtrlTreat3060<- glmLRT(fit.TimeCDateLib.AmelFor30.60.df17,
contrast=
contrast.TimeCDateLib.AmelFor30.60[,"CtrlTreat3060"])

# (1)+(2)
lrt.CtrlTreat30or60<- glmLRT(fit.TimeCDateLib.AmelFor30.60.df17,
contrast=
contrast.TimeCDateLib.AmelFor30.60[,2:3])

topTags(lrt.CtrlTreat3060,n=Inf)

toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat3060 <- topTags(lrt.CtrlTreat3060,n=Inf)

sum(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat3060$table$FDR<0.05)

sum(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat3060$table$FDR<0.1)

head(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat3060$table)

write.csv(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat3060,
"toptags.TimeCDateLib.AmelFor30.60.df17.contrast.lrt.CtrlvTreat3060.csv")

topTags(lrt.CtrlTreat30,n=Inf)

toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat30 <- topTags(lrt.CtrlTreat30,n=Inf)

sum(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat30$table$FDR<0.05)

sum(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat30$table$FDR<0.1)

head(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat30$table)

write.csv(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat30,
"toptags.TimeCDateLib.AmelFor30.60.df17.contrast.lrt.CtrlvTreat30.csv")

topTags(lrt.CtrlTreat60,n=Inf)

toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat60 <- topTags(lrt.CtrlTreat60,n=Inf)

sum(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat60$table$FDR<0.05)

sum(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat60$table$FDR<0.1)

head(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat60$table)

write.csv(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat60,
"toptags.TimeCDateLib.AmelFor30.60.df17.contrast.lrt.CtrlvTreat60.csv")

topTags(lrt.CtrlTreat30or60,n=Inf)

toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat30or60 <- topTags(lrt.CtrlTreat30or60,n=Inf)

sum(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat30or60$table$FDR<0.05)

sum(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat30or60$table$FDR<0.1)

head(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat30or60$table)

write.csv(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat30or60,
"toptags.TimeCDateLib.AmelFor30.60.df17.contrast.lrt.CtrlvTreat30or60.csv")

--
Matthew S. McNeill, Ph.D.
Post Doctoral Research Associate
University of Illinois
Institute for Genomic Biology
Department of Entomology
1206 W Gregory, rm 2414-G
Urbana, IL 61801

edgeR • 1.1k views

ADD COMMENT • link updated 9.3 years ago by Gordon Smyth 50k • written 10.5 years ago by Matt McNeill ▴ 20

Gordon Smyth · Answer 1 · 2013-10-25

This is to be expected, and is similar to the difference between an F-test as compared to a t-test. When you do test (2) you are looking specifically for changes at time 60min. When you do test (3) you are not specifying when the changes occur. Given that almost all the changes do in fact occur at 60min, you naturally have more power to detect these changes when you look for changes at this time specifically. When you perform test (3), you are asking the statistical test to control the error rate over a larger number of possible comparisons, so the bigger changes at 60min get diluted by the smaller changes at 30min making them harder to detect.

In general, the F-test (multiple contrasts at once) has more power than testing for individual contrasts if all the contrasts have logFC about the same size, so that they accumulate strength. However if one of the contrasts is much larger than the others, then the opposite is true. In that case power is maximized by testing for that contrast specifically, meaning that the t-test for that contrast specifically has more power.

Bottom line: you can't increase power in general for a statistical test by being more vague regarding the alternative hypothesis.

Best wishes
Gordon