EdgeR multi-factors testing questions


Guest User ★ 13k

@guest-user-4897

Last seen 9.6 years ago

Hi Gordon,

I have one more question about edgeR. I have used different normalization methods to normalize the data and then tested the main effects, the two-way interaction terms and the three-way interaction term as we discussed before. To my surprise, the P-values from the tests are the same for data normalized by TMM and by upper quartile. Is this possible?

Best,
Yanzhu

---------------------------------------------------------

Dear Yanzhu,

Yes, that's how I would do it. Keep the same dispersions for all fits.

Best wishes
Gordon

-- output of sessionInfo():

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] DESeq_1.12.1 lattice_0.20-15 locfit_1.5-9.1 Biobase_2.20.1 BiocGenerics_0.6.0
[6] edgeR_3.2.4 limma_3.16.8

loaded via a namespace (and not attached):
[1] annotate_1.38.0 AnnotationDbi_1.22.6 DBI_0.2-7 genefilter_1.42.0
[5] geneplotter_1.38.0 grid_3.0.1 IRanges_1.18.4 RColorBrewer_1.0-5
[9] RSQLite_0.11.4 splines_3.0.1 stats4_3.0.1 survival_2.37-4
[13] tools_3.0.1 XML_3.98-1.1 xtable_1.7-1

-- Sent via the guest posting facility at bioconductor.org.

edgeR edgeR • 1.1k views

updated 10.2 years ago by Gordon Smyth 50k • written 10.3 years ago by Guest User ★ 13k


Ryan C. Thompson ★ 7.9k

@ryan-c-thompson-5618

Last seen 8 months ago

Scripps Research, La Jolla, CA

I don't think it's surprising. TMM and upper quartile tend to give very similar normalization factors. Are you getting *exactly* identical P-values, or only very similar ones? Also, you can check the normalization factors themselves to see how much of a difference there is. If "d" is your DGEList object, you can get the normalization factors by typing d$samples$norm.factors.

-Ryan

On Mon 27 Jan 2014 05:16:13 AM PST, Yanzhu [guest] wrote:
> To my surprise, the P-values results of the testings are the same for data normalized by TMM and Upper quartile. Is it possible?
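Ryan's suggestion of comparing the normalization factors directly can be sketched as below. This is only an illustration under stated assumptions: the count matrix is simulated toy data (hypothetical, not the poster's data), and the object name `d` follows Ryan's wording. It relies on the edgeR functions `DGEList()` and `calcNormFactors()`.

```r
library(edgeR)

set.seed(1)
# Hypothetical toy data: 200 genes x 6 samples of negative binomial counts
counts <- matrix(rnbinom(200 * 6, mu = 50, size = 5), nrow = 200)

d <- DGEList(counts = counts)

# Compute the factors under each method from the same starting object
nf_tmm <- calcNormFactors(d, method = "TMM")$samples$norm.factors
nf_uq  <- calcNormFactors(d, method = "upperquartile", p = 0.75)$samples$norm.factors

# Side-by-side view of the two sets of factors and their ratio
cmp <- data.frame(TMM = nf_tmm, UQ = nf_uq, ratio = nf_uq / nf_tmm)
print(cmp)
```

If the two columns differ but the downstream P-values are identical to every printed digit, that suggests the factors are not reaching the model fit at all.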

10.2 years ago Ryan C. Thompson ★ 7.9k


Guest User ★ 13k

@guest-user-4897

Last seen 9.6 years ago

Hi Ryan,

The P-values are exactly identical; please see the example of testing the three-way interaction in (II) below.

(I) I have checked the normalization factors for both methods:

(i) TMM:

    head(y$sample$norm.factors)
    [1] 0.9404963 0.9056276 0.9928999 0.9961922 1.0109106 0.9115558

(ii) UQ:

    head(y$sample$norm.factors)
    [1] 0.9883357 0.9625269 1.0414372 1.0143681 1.1134091 0.8992270

(II) Example: the following is how I test the three-way interaction term:

(i) TMM:

    group<-paste(L,S,R,sep=".")
    design<-model.matrix(~L+R+S+L:R+L:S+R:S+L:R:S)
    y<-DGEList(counts=counts,group=group)
    y<-calcNormFactors(y,method="TMM")
    offset=log((y$sample)[,2])
    y<-estimateGLMCommonDisp(y,design)
    y<-estimateGLMTagwiseDisp(y,design)

    fiteTMM_LRS<-glmFit(y,design,offset=offset )

    ### testing the three-way interaction term L:R:S
    lrteTMM_LRS<-glmLRT(fiteTMM_LRS,coef=c(67:96))

    ### P-values:
    head((lrteTMM_LRS$table)[,33])
    [1] 1.233769e-30 5.648507e-30 4.254337e-06 9.304154e-05 8.504918e-01 1.075495e-41

(ii) UQ:

    ##################### UQ
    group<-paste(L,S,R,sep=".")
    design<-model.matrix(~L+R+S+L:R+L:S+R:S+L:R:S)
    y<-DGEList(counts=counts,group=group)
    y<-calcNormFactors(y,method="upperquartile",p=0.75)
    offset=log((y$sample)[,2])
    y<-estimateGLMCommonDisp(y,design)
    y<-estimateGLMTagwiseDisp(y,design)

    fiteUQ_LRS<-glmFit(y,design,offset=offset )

    ### testing the three-way interaction term L:R:S
    lrteUQ_LRS<-glmLRT(fiteUQ_LRS,coef=c(67:96))

    ### P-values:
    head((lrteUQ_LRS$table)[,33])
    [1] 1.233769e-30 5.648507e-30 4.254337e-06 9.304154e-05 8.504918e-01 1.075495e-41

The P-values from both normalization methods are exactly identical; here I only show part of them.

Thanks.
Yanzhu

10.2 years ago Guest User ★ 13k


Gordon Smyth 50k

@gordon-smyth

Last seen 29 minutes ago

WEHI, Melbourne, Australia

Dear Yanzhu,

The culprit is your code here:

    glmFit(y,design,offset=offset )

whereby you are feeding your own offset to glmFit() and hence overwriting any normalization that has been done. Why are you doing this?

If you just remove the offset=offset argument and allow edgeR to work as normal, then the results will be correct.

Best wishes
Gordon

> Date: Tue, 28 Jan 2014 06:38:26 -0800 (PST)
> From: "Yanzhu [guest]" <guest at bioconductor.org>
> Subject: [BioC] EdgeR multi-factors testing questions
>
> The P-values are exactly identical, please see the example of the testing the three-way interaction (as below (II)):
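A minimal sketch of the fix Gordon describes follows. It is an illustration under stated assumptions rather than a rerun of the poster's analysis: the counts, the two-group design and the helper `fit_with()` are hypothetical, chosen so the example runs on its own. It also assumes the usual DGEList layout in which the second column of `y$samples` is `lib.size`, so the posted `offset=log((y$sample)[,2])` would be the log of the raw library sizes, which does not change with the normalization method. With no `offset` argument, `glmFit()` takes the offset from the DGEList, which `getOffset()` reports as the log of library size times normalization factor.

```r
library(edgeR)

set.seed(42)
# Hypothetical toy data: 500 genes x 8 samples, two groups of four
counts <- matrix(rnbinom(500 * 8, mu = 100, size = 10), nrow = 500)
group  <- factor(rep(c("A", "B"), each = 4))
design <- model.matrix(~group)

# Hypothetical helper: run the same pipeline under a given normalization
fit_with <- function(method) {
  y <- DGEList(counts = counts, group = group)
  y <- calcNormFactors(y, method = method)
  y <- estimateGLMCommonDisp(y, design)
  y <- estimateGLMTagwiseDisp(y, design)
  # No offset argument: glmFit() uses the offset stored in / derived from y
  fit <- glmFit(y, design)
  list(y = y, lrt = glmLRT(fit, coef = 2))
}

tmm <- fit_with("TMM")
uq  <- fit_with("upperquartile")

# The default offset reflects the normalization factors
manual <- log(tmm$y$samples$lib.size * tmm$y$samples$norm.factors)
print(all.equal(as.numeric(getOffset(tmm$y)), manual))

# P-values from the two normalizations, side by side
print(head(cbind(TMM = tmm$lrt$table$PValue, UQ = uq$lrt$table$PValue)))
```

Referring to the P-value column by name (`PValue`) rather than by position also avoids depending on how many coefficient columns the results table happens to contain.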

10.2 years ago Gordon Smyth 50k
