EdgeR multi-factor testing question
Guest User
@guest-user-4897
Dear Gordon,

I have one more question about the estimation of dispersion. When the three-way interaction term is non-significant, I will fit model 2, without the three-way interaction, to test the two-way interaction terms. When all interaction terms are non-significant, I fit the additive model (model 3) to test the main effects. Could I use the same dispersion for all the models, i.e., model 1 (including everything), model 2 (without the three-way interaction term) and model 3 (additive model)? Could this dispersion be estimated under the design of model 1?

Thank you!
Yanzhu

---------------------------------------------------------

Dear Yanzhu,

Your analysis is fine from a code point of view. From a statistical point of view, however, your analysis is too simple because you are neglecting the principle of marginality:

http://en.wikipedia.org/wiki/Principle_of_marginality

For the model you have fitted, it makes sense to test for the three-way interaction as you do. However, it does not make statistical sense to test for the main effects or two-way interactions until you have established that the three-way interaction is non-significant.

For count data, the tests for the lower-level interactions need to be computed by successively removing each level of interactions from the model. See for example:

https://stat.ethz.ch/pipermail/bioconductor/2013-December/056584.html

This is the same as what the anova() function does in R for unbalanced linear factorial models.

Furthermore, testing the two-way interactions is only sensible for genes with non-significant 3-way interactions. Similarly, testing the main effects is only sensible for genes with non-significant 2-way and 3-way interactions. Otherwise these tests have no useful scientific meaning.

This is a basic drawback of the factorial anova approach. You might consider the alternative approach described in Section 3.3.1 of the edgeR User's Guide.

Best wishes
Gordon

-- output of sessionInfo():

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] edgeR_3.2.4  limma_3.16.8

loaded via a namespace (and not attached):
[1] tools_3.0.1

-- Sent via the guest posting facility at bioconductor.org.
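A minimal sketch of the successive-removal approach in edgeR, with dispersions estimated once under the full model (model 1) and reused for models 2 and 3. It assumes a hypothetical 2 x 2 x 2 layout with factors A, B and C, three replicates per cell, and simulated counts with no real effects; the factor names, the data and the use of estimateDisp() are illustrative assumptions, not taken from the thread. estimateDisp() is the one-step wrapper in current edgeR; older releases may lack it, in which case estimateGLMCommonDisp(), estimateGLMTrendedDisp() and estimateGLMTagwiseDisp() can be called in sequence instead.

```r
library(edgeR)

set.seed(1)
# Hypothetical 2 x 2 x 2 factorial, 3 replicates per cell (24 libraries)
A <- factor(rep(c("a1", "a2"), each = 12))
B <- factor(rep(rep(c("b1", "b2"), each = 6), times = 2))
C <- factor(rep(rep(c("c1", "c2"), each = 3), times = 4))

# Simulated negative binomial counts, for illustration only
ngenes <- 500
counts <- matrix(rnbinom(ngenes * 24, mu = 50, size = 10), nrow = ngenes)
y <- calcNormFactors(DGEList(counts = counts))

design1 <- model.matrix(~ A * B * C)      # model 1: full factorial
design2 <- model.matrix(~ (A + B + C)^2)  # model 2: no three-way term
design3 <- model.matrix(~ A + B + C)      # model 3: additive

# Estimate dispersions once, under model 1, and store them in y
y <- estimateDisp(y, design1)

# Every glmFit() call below picks up the same dispersions stored in y
fit1 <- glmFit(y, design1)
fit2 <- glmFit(y, design2)
fit3 <- glmFit(y, design3)

# Model 1 vs model 2: drop the three-way column (the last column)
lrt3 <- glmLRT(fit1, coef = ncol(design1))
# Model 2 vs model 3: drop all three two-way columns jointly
lrt2 <- glmLRT(fit2, coef = grep(":", colnames(design2)))
# Main effect of A within the additive model (column "Aa2")
lrtA <- glmLRT(fit3, coef = match("Aa2", colnames(design3)))

topTags(lrt3)
```

Each glmLRT() call refits the model without the selected columns and compares the two fits by a likelihood ratio test, so lrt3, lrt2 and lrtA correspond to the three levels of tests discussed above. As the reply points out, lrt2 and lrtA are only meaningful for genes whose higher-order tests were non-significant.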
@gordon-smyth
WEHI, Melbourne, Australia
Dear Yanzhu,

Yes, that's how I would do it. Keep the same dispersions for all fits.

Best wishes
Gordon

> Date: Wed, 8 Jan 2014 06:36:16 -0800 (PST)
> From: "Yanzhu [guest]"
> Subject: [BioC] EdgeR multi-factor testing question
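The workflow Yanzhu describes, and Gordon endorses, interprets a lower-order test only for genes where every higher-order test was non-significant. A minimal base-R sketch of that per-gene decision rule follows. The function name marginal_calls, the 0.05 threshold and the example p-values are hypothetical; whether to threshold raw p-values or FDR-adjusted values is left open here.

```r
# Hypothetical helper applying the marginality rule gene by gene.
# p3: three-way test (model 1 vs model 2)
# p2: joint two-way test (model 2 vs model 3)
# p1: a main-effect test within the additive model (model 3)
marginal_calls <- function(p3, p2, p1, alpha = 0.05) {
  stopifnot(length(p3) == length(p2), length(p2) == length(p1))
  sig3 <- p3 < alpha
  sig2 <- !sig3 & p2 < alpha
  sig1 <- !sig3 & !sig2 & p1 < alpha
  call <- rep("none", length(p3))
  call[sig3] <- "three-way"
  call[sig2] <- "two-way"
  call[sig1] <- "main effect"
  # Mask lower-order p-values where a higher-order term was significant,
  # since those tests have no useful interpretation for such genes
  p2[sig3] <- NA
  p1[sig3 | sig2] <- NA
  data.frame(call = call, p3 = p3, p2 = p2, p1 = p1,
             stringsAsFactors = FALSE)
}

# Usage with made-up p-values for four genes
res <- marginal_calls(p3 = c(0.001, 0.40, 0.60, 0.90),
                      p2 = c(0.010, 0.002, 0.70, 0.80),
                      p1 = c(0.030, 0.001, 0.004, 0.50))
# expected calls: three-way, two-way, main effect, none
res$call
```

Masking with NA rather than dropping rows keeps one row per gene, so the result can be lined up directly with the gene annotation.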