Question

EdgeR multi-factor testing question


Guest User ★ 13k

@guest-user-4897

Last seen 9.6 years ago

Dear Gordon,

I have one more question about the estimation of dispersion.

When the three-way interaction term is non-significant, I will fit model 2 without the three-way interaction to test the two-way interaction terms. When all interaction terms are non-significant, I will fit the additive model (model 3) to test the main effects. Could I use the same dispersion for all the models, i.e., model 1 (including everything), model 2 (without the three-way interaction term) and model 3 (additive model)? Could this dispersion be estimated under the design of model 1?

Thank you!
Yanzhu

---------------------------------------------------------

Dear Yanzhu,

Your analysis is fine from a code point of view. From a statistical point of view, however, your analysis is too simple because you are neglecting the principle of marginality:

http://en.wikipedia.org/wiki/Principle_of_marginality

For the model you have fitted, it makes sense to test for the three-way interaction as you do. However, it does not make statistical sense to test for the main effects or two-way interactions until you have established that the three-way interaction is non-significant.

For count data, the tests for the lower-level interactions need to be computed by successively removing each level of interactions from the model. See for example:

https://stat.ethz.ch/pipermail/bioconductor/2013-December/056584.html

This is the same as what the anova() function does in R for unbalanced linear factorial models.

Furthermore, testing the two-way interactions is only sensible for genes with non-significant 3-way interactions. Similarly, testing the main effects is only sensible for genes with non-significant 2-way and 3-way interactions. Otherwise these tests have no useful scientific meaning.

This is a basic drawback of the factorial anova approach. You might consider the alternative approach described in Section 3.3.1 of the edgeR User's Guide.

Best wishes
Gordon

-- output of sessionInfo():

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252  LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C  LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] edgeR_3.2.4 limma_3.16.8

loaded via a namespace (and not attached):
[1] tools_3.0.1

--
Sent via the guest posting facility at bioconductor.org.
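
For readers following the successive-removal scheme described in the quoted reply, here is a minimal base R sketch of the three nested design matrices and of the coefficient columns each test would target. The factor names A, B and C, the 2 x 2 x 2 layout and the two replicates per cell are assumptions made for illustration; none of them come from the original post.

```r
# Hypothetical layout: three two-level factors, two replicates per cell (16 samples).
# expand.grid() converts the character vectors to factors by default.
samples <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2"),
                       C = c("c1", "c2"), rep = 1:2)

# The three nested models from the question.
f1 <- ~ A * B * C        # model 1: everything, including A:B:C
f2 <- ~ (A + B + C)^2    # model 2: main effects and two-way interactions
f3 <- ~ A + B + C        # model 3: additive model

design1 <- model.matrix(f1, data = samples)
design2 <- model.matrix(f2, data = samples)
design3 <- model.matrix(f3, data = samples)

# Interaction order of every design column:
# 0 = intercept, 1 = main effect, 2 = two-way, 3 = three-way.
term_order <- function(design, formula) {
  ord <- attr(terms(formula), "order")
  c(0, ord)[attr(design, "assign") + 1]
}

# Columns to test at each step; each set is the highest-order
# term still present in its own model.
coef3 <- which(term_order(design1, f1) == 3)  # three-way term in model 1
coef2 <- which(term_order(design2, f2) == 2)  # two-way terms in model 2
coef1 <- which(term_order(design3, f3) == 1)  # main effects in model 3
```

Each step tests only the highest-order terms remaining in the current model, which is what respecting marginality amounts to here.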

updated 10.3 years ago by Gordon Smyth 50k • written 10.3 years ago by Guest User ★ 13k

score 0 · Answer 1 · 2014-01-10

Dear Yanzhu,

Yes, that's how I would do it. Keep the same dispersions for all fits.

Best wishes
Gordon

> Date: Wed, 8 Jan 2014 06:36:16 -0800 (PST)
> From: "Yanzhu [guest]" <guest at bioconductor.org>
> To: bioconductor at r-project.org, mlinyzh at gmail.com
> Subject: [BioC] EdgeR multi-factor testing question
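
One way to read "keep the same dispersions for all fits" is sketched below: estimate the dispersions once under the full design, then reuse the same DGEList for every reduced fit. This is only an illustration under stated assumptions. The counts are simulated, the factors A, B and C and the 0.05 cutoff are hypothetical, and estimateDisp() is taken from recent edgeR releases (with edgeR 3.2.4, as in the sessionInfo above, the estimateGLMCommonDisp(), estimateGLMTrendedDisp() and estimateGLMTagwiseDisp() sequence would play that role).

```r
library(edgeR)

set.seed(1)

# Hypothetical 2 x 2 x 2 layout with two replicates per cell (16 samples).
samples <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2"),
                       C = c("c1", "c2"), rep = 1:2)

# Simulated negative binomial counts for 200 genes (illustration only).
counts <- matrix(rnbinom(200 * nrow(samples), mu = 50, size = 10),
                 nrow = 200,
                 dimnames = list(paste0("gene", 1:200), NULL))

y <- DGEList(counts = counts)
y <- calcNormFactors(y)

design1 <- model.matrix(~ A * B * C, data = samples)      # model 1
design2 <- model.matrix(~ (A + B + C)^2, data = samples)  # model 2
design3 <- model.matrix(~ A + B + C, data = samples)      # model 3

# Dispersions are estimated once, under the full model.
y <- estimateDisp(y, design1)

# glmFit() picks up the dispersions stored in y, so all three fits share them.
fit1 <- glmFit(y, design1)
fit2 <- glmFit(y, design2)
fit3 <- glmFit(y, design3)

lrt3 <- glmLRT(fit1, coef = 8)    # three-way interaction
lrt2 <- glmLRT(fit2, coef = 5:7)  # two-way interactions
lrt1 <- glmLRT(fit3, coef = 2:4)  # main effects

# Two-way tests are only meaningful for genes whose three-way
# interaction is non-significant (the 0.05 cutoff is an assumption).
no_3way <- p.adjust(lrt3$table$PValue, method = "BH") >= 0.05
```

The two-way results would then be read only for the genes flagged in no_3way, and the main-effect results only for genes that also show no significant two-way interaction.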