Hi. I am trying to figure out how to do an ANOVA like test with my data in edgeR.
My data has 2 variables: treatment (2 levels) and time (5 levels). I'd like to test for the effect of treatment, time, and the interaction between the 2. Something like y ~ time + treatment + treatment:time
I can see two possible ways to do this in edgeR:
1) combine treatment and time into a single factor and then test as in section 3.3.1 of the edgeR vignette
design=model.matrix(~0+comboFactor,data=myData)
the treatment effect contrast would be: (Treat1.Time1 + Treat1.Time2 + Treat1.Time3 + Treat1.Time4 + Treat1.Time5) - (Treat2.Time1 + Treat2.Time2 + Treat2.Time3 + Treat2.Time4 + Treat2.Time5)
the time effect contrast would be: (Treat1.Time1 + Treat2.Time1) - (Treat1.Time2 + Treat2.Time2) - (Treat1.Time3 + Treat2.Time3) - (Treat1.Time4 + Treat2.Time4) - (Treat1.Time5 + Treat2.Time5)
the interaction effect contrast would be: (Treat1.Time1 - Treat2.Time1) - (Treat1.Time2 - Treat2.Time2) - (Treat1.Time3 - Treat2.Time3) - (Treat1.Time4 - Treat2.Time4) - (Treat1.Time5 - Treat2.Time5)
test this model with glmFit
2) use glmLRT as in the vignette 3.3.2 and 3.3.3 sections
for interaction effect:
design = model.matrix(~Treat * Time, data = myData)
This gives me a design matrix with columns:
(Intercept) treat2 time2 time3 time4 time5 treat2:time2 treat2:time3 treat2:time4 treat2:time5
I think glmLRT(fit, coef=7:10) gives me the treatment by time interaction effect
for main effect of time: glmLRT(fit, coef=3:6)
for main effect of treatment: glmLRT(fit, coef=2)
Is this correct??? and which method is 'best'?
Thanks,
Julie
Here's my R session info:
R Session Info:
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] DESeq2_1.10.0 RcppArmadillo_0.6.200.2.0 Rcpp_0.12.1 qvalue_2.2.0 goseq_1.22.0 geneLenDataBase_1.6.0 BiasedUrn_1.06.1 WGCNA_1.48 RSQLite_1.0.0
[10] DBI_0.3.1 fastcluster_1.1.16 dynamicTreeCut_1.62 ggplot2_1.0.1 rgl_0.95.1367 EDASeq_2.4.0 ShortRead_1.28.0 GenomicAlignments_1.6.1 SummarizedExperiment_1.0.1
[19] Rsamtools_1.22.0 GenomicRanges_1.22.1 GenomeInfoDb_1.6.1 Biostrings_2.38.0 XVector_0.10.0 IRanges_2.4.1 S4Vectors_0.8.1 BiocParallel_1.4.0 Biobase_2.30.0
[28] BiocGenerics_0.16.1 cluster_2.0.3 edgeR_3.12.0 limma_3.26.2 rj_2.0.4-2
loaded via a namespace (and not attached):
[1] splines_3.2.2 foreach_1.4.3 R.utils_2.1.0 Formula_1.2-1 aroma.light_3.0.0 latticeExtra_0.6-26 impute_1.44.0 lattice_0.20-33 digest_0.6.8 RColorBrewer_1.1-2
[11] colorspace_1.2-6 Matrix_1.2-2 preprocessCore_1.32.0 R.oo_1.19.0 plyr_1.8.3 XML_3.98-1.3 biomaRt_2.26.0 genefilter_1.52.0 zlibbioc_1.16.0 xtable_1.8-0
[21] GO.db_3.2.2 scales_0.3.0 annotate_1.48.0 mgcv_1.8-9 GenomicFeatures_1.22.4 nnet_7.3-11 proto_0.3-10 survival_2.38-3 magrittr_1.5 R.methodsS3_1.7.0
[31] doParallel_1.0.10 nlme_3.1-122 MASS_7.3-44 hwriter_1.3.2 foreign_0.8-66 tools_3.2.2 matrixStats_0.15.0 stringr_1.0.0 locfit_1.5-9.1 munsell_0.4.2
[41] AnnotationDbi_1.32.0 lambda.r_1.1.7 DESeq_1.22.0 futile.logger_1.4.1 grid_3.2.2 RCurl_1.95-4.7 iterators_1.0.8 bitops_1.0-6 gtable_0.1.2 codetools_0.2-14
[51] reshape2_1.4.1 gridExtra_2.0.0 rtracklayer_1.30.1 Hmisc_3.17-0 futile.options_1.0.0 stringi_1.0-1 geneplotter_1.48.0 rpart_4.1-10 acepack_1.3-3.3
I don't believe the second method is correct either. The interaction coefficients are correct, but the coefficients for the "main effects" are actually the coefficients for the time effects in treatment 1 and the treatment effects in time 1.
I agree with Ryan here. The naming scheme used by
model.matrix
is quite misleading.In general, if you want to test for main effects, you'll have to first check that the interaction terms are not significant. Otherwise, you could end up with a situation where, e.g., Treat1 is downregulated relative to Treat2 at an earlier time point, but is upregulated relative to Treat2 at a later time point. Trying to consider the "main effect" of treatment would make no sense here, as the effects at the two time points are in opposing directions.
I prefer to use a one-way layout (i.e., the parametrization in your first approach) and compare pairs of groups separately, e.g., Treat1.Time1 against Treat2.Time1, Treat1.Time2 against Treat2.Time2, and so on. Then I intersect the DE lists from the pairwise comparisons between treatments at some or all time points, to get a set of genes that change robustly and in the same direction in response to treatment at each time point. The other strategy is to do all those pairwise contrasts at once in an ANOVA-style comparison, and only pick the DE genes that have the same sign of the log-fold change across all of the contrasts.