ANOVA in EdgeR
2
0
Entering edit mode
@juliegreen-9268
Last seen 2.6 years ago
United States

Hi.  I am trying to figure out how to do an ANOVA like test with my data in edgeR.

My data has 2 variables: treatment (2 levels) and time (5 levels).   I'd like to test for the effect of treatment, time, and the interaction between the 2.  Something like y ~ time + treatment + treatment:time

I can see two possible ways to do this in edgeR:

1) combine treatment and time into a single factor and then test as in section 3.3.1 of the edgeR vignette

design=model.matrix(~0+comboFactor,data=myData)

the treatment effect contrast would be:  (Treat1.Time1 + Treat1.Time2 + Treat1.Time3 + Treat1.Time4 + Treat1.Time5) - (Treat2.Time1 + Treat2.Time2 + Treat2.Time3 + Treat2.Time4 + Treat2.Time5)

the time effect contrast would be: (Treat1.Time1 + Treat2.Time1) - (Treat1.Time2 + Treat2.Time2) - (Treat1.Time3 + Treat2.Time3) - (Treat1.Time4 + Treat2.Time4) - (Treat1.Time5 + Treat2.Time5)

the interaction effect contrast would be: (Treat1.Time1 - Treat2.Time1) - (Treat1.Time2 - Treat2.Time2) - (Treat1.Time3 - Treat2.Time3) - (Treat1.Time4 - Treat2.Time4) - (Treat1.Time5 - Treat2.Time5)

test this model with glmFit

2) use glmLRT as in the vignette 3.3.2 and 3.3.3 sections

for interaction effect:

design = model.matrix(~Treat * Time, data = myData)

This gives me a design matrix with columns:

(Intercept)    treat2    time2    time3    time4    time5    treat2:time2    treat2:time3    treat2:time4    treat2:time5

I think glmLRT(fit, coef=7:10) gives me the treatment by time interaction effect

for main effect of time:  glmLRT(fit, coef=3:6)

for main effect of treatment: glmLRT(fit, coef=2)

Is this correct??? and which method is 'best'?

Thanks,

Julie

Here's my R session info:

R Session Info:

R version 3.2.2 (2015-08-14)

Platform: x86_64-w64-mingw32/x64 (64-bit)

Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:

[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:

[1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:

[1] DESeq2_1.10.0 RcppArmadillo_0.6.200.2.0 Rcpp_0.12.1 qvalue_2.2.0 goseq_1.22.0 geneLenDataBase_1.6.0 BiasedUrn_1.06.1 WGCNA_1.48 RSQLite_1.0.0

[10] DBI_0.3.1 fastcluster_1.1.16 dynamicTreeCut_1.62 ggplot2_1.0.1 rgl_0.95.1367 EDASeq_2.4.0 ShortRead_1.28.0 GenomicAlignments_1.6.1 SummarizedExperiment_1.0.1

[19] Rsamtools_1.22.0 GenomicRanges_1.22.1 GenomeInfoDb_1.6.1 Biostrings_2.38.0 XVector_0.10.0 IRanges_2.4.1 S4Vectors_0.8.1 BiocParallel_1.4.0 Biobase_2.30.0

[28] BiocGenerics_0.16.1 cluster_2.0.3 edgeR_3.12.0 limma_3.26.2 rj_2.0.4-2

loaded via a namespace (and not attached):

[1] splines_3.2.2 foreach_1.4.3 R.utils_2.1.0 Formula_1.2-1 aroma.light_3.0.0 latticeExtra_0.6-26 impute_1.44.0 lattice_0.20-33 digest_0.6.8 RColorBrewer_1.1-2

[11] colorspace_1.2-6 Matrix_1.2-2 preprocessCore_1.32.0 R.oo_1.19.0 plyr_1.8.3 XML_3.98-1.3 biomaRt_2.26.0 genefilter_1.52.0 zlibbioc_1.16.0 xtable_1.8-0

[21] GO.db_3.2.2 scales_0.3.0 annotate_1.48.0 mgcv_1.8-9 GenomicFeatures_1.22.4 nnet_7.3-11 proto_0.3-10 survival_2.38-3 magrittr_1.5 R.methodsS3_1.7.0

[31] doParallel_1.0.10 nlme_3.1-122 MASS_7.3-44 hwriter_1.3.2 foreign_0.8-66 tools_3.2.2 matrixStats_0.15.0 stringr_1.0.0 locfit_1.5-9.1 munsell_0.4.2

[41] AnnotationDbi_1.32.0 lambda.r_1.1.7 DESeq_1.22.0 futile.logger_1.4.1 grid_3.2.2 RCurl_1.95-4.7 iterators_1.0.8 bitops_1.0-6 gtable_0.1.2 codetools_0.2-14

[51] reshape2_1.4.1 gridExtra_2.0.0 rtracklayer_1.30.1 Hmisc_3.17-0 futile.options_1.0.0 stringi_1.0-1 geneplotter_1.48.0 rpart_4.1-10 acepack_1.3-3.3

0
Entering edit mode
Yunshun Chen ▴ 790
@yunshun-chen-5451
Last seen 4 weeks ago
Australia

Your second method looks correct. The time effect and interaction effect contrasts in your first method don't make any sense.

0
Entering edit mode

I don't believe the second method is correct either. The interaction coefficients are correct, but the coefficients for the "main effects" are actually the coefficients for the time effects in treatment 1 and the treatment effects in time 1.

1
Entering edit mode

I agree with Ryan here. The naming scheme used by model.matrix is quite misleading.

In general, if you want to test for main effects, you'll have to first check that the interaction terms are not significant. Otherwise, you could end up with a situation where, e.g., Treat1 is downregulated relative to Treat2 at an earlier time point, but is upregulated relative to Treat2 at a later time point. Trying to consider the "main effect" of treatment would make no sense here, as the effects at the two time points are in opposing directions.

I prefer to use a one-way layout (i.e., the parametrization in your first approach) and compare pairs of groups separately, e.g., Treat1.Time1 against Treat2.Time1, Treat1.Time2 against Treat2.Time2, and so on. Then I intersect the DE lists from the pairwise comparisons between treatments at some or all time points, to get a set of genes that change robustly and in the same direction in response to treatment at each time point. The other strategy is to do all those pairwise contrasts at once in an ANOVA-style comparison, and only pick the DE genes that have the same sign of the log-fold change across all of the contrasts.

0
Entering edit mode
@juliegreen-9268
Last seen 2.6 years ago
United States

Thank you all.  It looks like I can get the time by treatment interaction effect via the following:

design = model.matrix(~Treat * Time, data = myData)

This gives me a design matrix with columns:

(Intercept)    treat2    time2    time3    time4    time5    treat2:time2    treat2:time3    treat2:time4    treat2:time5

glmLRT(fit, coef=7:10) gives me the treatment by time interaction effect

but I'd really like to assign a p-value to each gene for the overall treatment and time effect.  Is that possible in edgeR?

1
Entering edit mode

Despite what it may seem, this is not a simple question to answer. For starters, there are many definitions of "overall". If you want to compute a "main effect" or an average effect of time or treatment, then see my comment to Ryan's answer. If you just want to get genes that have any DE with respect to time, then you could set coef=3:10 in glmLRT. This will test for DE between any of the time points, conditional on the treatment. Similarly, if you want to get genes that have any DE with respect to treatment, then you could set coef=c(2, 7:10). In both cases, you'll get a single p-value indicating whether time or treatment has any effect on gene expression. You'll have to pick through the log-fold changes to figure out exactly what the effect is, though - for example, a treatment effect might only be occurring at one or two time points, but may be sufficient to be significant.