limma: is it best to always include paired structure (sibship) in design?
2
0
Entering edit mode
Guido Hooiveld ★ 3.9k
@guido-hooiveld-2020
Last seen 1 day ago
Wageningen University, Wageningen, the …
Hi, I am doing an analysis on a dataset, and have a question on whether or not to include the paired structure (sibship) in the model fitted by limma if this is not explicitly needed. We discussed this internally, but didn't get consensus. Hence, I would like to ask for opinions on this list... :) Let me explain: - Affymetrix data, RMA normalized. - samples from 20 subjects were analyzed on arrays, obtained from 10 'young' and 10 'old' individuals. - samples were taken at baseline, and after treatment with a drug (WY). Part of target file: >targets Filename Treatment Sibship Category 1 G068_B07_05_05_CTRL.CEL Ctrl 5 old 2 G068_B09_06_06_CTRL.CEL Ctrl 6 old 3 G068_C05_07_07_CTRL.CEL Ctrl 7 old 4 G068_C09_09_09_CTRL_2.CEL Ctrl 9 young 5 G068_D05_10_10_CTRL.CEL Ctrl 10 young 6 G068_D07_11_11_CTRL.CEL Ctrl 11 young 7 G068_F07_17_05_WY.CEL WY 5 old 8 G068_F09_18_06_WY.CEL WY 6 old 9 G068_G05_19_07_WY.CEL WY 7 old 10 G068_G09_21_09_WY.CEL WY 9 young 11 G068_H05_22_10_WY.CEL WY 10 young 12 G068_H07_23_11_WY.CEL WY 11 young It is obvious that for the treatment effect a paired analysis should be performed (using sibship info). This could be done for the whole group, or for young and old separately. > TC <- as.factor(paste(targets$Treatment, targets$Category, sep=".")) > design <- model.matrix(~0+TC+Sibship) > > fit <- lmFit(x.norm, design) Coefficients not estimable: Sibship8 Sibship12 Warning message: Partial NA coefficients for 19682 probe(s) > >#test for effect of WY in old and young > cont.matrix <- makeContrasts( + WYold=(TCWY.old-TCCtrl.old), + WYyoung=(TCWY.young-TCCtrl.young), + levels=design) > > fit2 <- contrasts.fit(fit, cont.matrix) > fit2 <- eBayes(fit2) If I would like to identify the probesets that are differentially expressed between old and young under either control or treatment conditions, I am essentially performing an unpaired t-test. Hence, info on sibship is thus not required. > cont.matrix <- makeContrasts( + ctrlold_ctrlyoung=(TCCtrl.old-TCCtrl.young), + WYold_WYyoung=(TCWY.old-TCWY.young), + levels=design) > However, I noticed that the results of these 2nd set of comparisons (the old vs young) are strongly affected by including or not the sibship in the design. In other words, if I define this design: > design <- model.matrix(~0+TC+Sibship) I get a completely different set of top regulated probesets for the before mentioned contrasts when compared to this design (without sibship): > design <- model.matrix(~0+TC) I noticed that also the p-values are much smaller when including the sibship. As an example, WITH sibship: > topTable(fit2,coef=1) ID logFC AveExpr t P.Value adj.P.Val B 15031 671_at -4.763308 5.591752 -51.58016 4.457435e-18 6.583366e-14 29.06475 9938 4317_at -5.545104 4.893897 -50.17288 6.689732e-18 6.583366e-14 28.82092 7454 2944_at -7.992487 5.861613 -43.89646 4.747228e-17 3.114498e-13 27.56297 14844 654433_at 5.136677 5.807219 40.89921 1.337134e-16 6.579367e-13 26.84567 1115 1088_at -4.762000 5.650997 -39.67501 2.085766e-16 8.210408e-13 26.52704 656 10321_at -6.458380 5.445312 -37.74764 4.319761e-16 1.417026e-12 25.99187 4162 1991_at -4.080680 6.171290 -36.76955 6.339135e-16 1.627014e-12 25.70344 3784 1669_at -6.130596 5.171480 -36.66315 6.613207e-16 1.627014e-12 25.67134 9557 4057_at -6.079963 5.789428 -32.16516 4.460724e-15 9.755107e-12 24.17063 2584 1359_at 3.730962 6.155636 30.49084 9.709508e-15 1.911025e-11 23.53108 > WITHOUT sibship: > topTable(fit2,coef=1) ID logFC AveExpr t P.Value adj.P.Val B 14844 654433_at 5.320186 5.807219 9.802842 2.297163e-09 2.577632e-05 11.534050 18751 9173_at 3.052422 4.715249 9.439677 4.453505e-09 2.577632e-05 10.917557 6220 2624_at 2.443030 5.771388 9.281647 5.970493e-09 2.577632e-05 10.643512 13773 59340_at 4.154059 4.967316 9.221019 6.686698e-09 2.577632e-05 10.537431 2584 1359_at 3.572039 6.155636 9.095515 8.466416e-09 2.577632e-05 10.316169 16233 79608_at 1.774024 5.678042 9.073712 8.822506e-09 2.577632e-05 10.277501 2040 1232_at 2.911866 4.211083 9.053443 9.167473e-09 2.577632e-05 10.241491 17108 83478_at 1.522142 6.045796 8.750724 1.635842e-08 4.024580e-05 9.696605 1855 1178_at 3.134552 8.481612 8.619180 2.111666e-08 4.304048e-05 9.455663 6733 2766_at -2.157836 5.223548 -8.568223 2.332607e-08 4.304048e-05 9.361647 > Thus, although it is not required, would it be recommended to include for the 2nd set of contrasts the paired structure of the data in the design? I would argue to do so (since intuitively I feel it would be good to always include as much info as possible on the correlation structure of the data), but as said not everyone in the project team agrees with me. So any opinions/comments are much appreciated. Regards, Guido > sessionInfo() R version 2.15.1 (2012-06-22) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C LC_TIME=English_United States.1252 attached base packages: [1] splines stats graphics grDevices utils datasets methods base other attached packages: [1] hugene11stv1hsentrezg.db_15.1.0 org.Hs.eg.db_2.7.1 RSQLite_0.11.1 DBI_0.2-5 hugene11stv1hsentrezgcdf_15.1.0 AnnotationDbi_1.18.3 [7] qvalue_1.30.0 multtest_2.12.0 affy_1.34.0 limma_3.12.3 pamr_1.54 survival_2.36-14 [13] cluster_1.14.2 bladderbatch_1.0.3 Biobase_2.16.0 BiocGenerics_0.2.0 sva_3.2.1 mgcv_1.7-20 [19] corpcor_1.6.4 BiocInstaller_1.4.7 loaded via a namespace (and not attached): [1] affyio_1.24.0 grid_2.15.1 IRanges_1.14.4 lattice_0.20-10 MASS_7.3-21 Matrix_1.0-9 nlme_3.1-104 preprocessCore_1.18.0 [9] stats4_2.15.1 tcltk_2.15.1 tools_2.15.1 zlibbioc_1.2.0 > --------------------------------------------------------- Guido Hooiveld, PhD Nutrition, Metabolism & Genomics Group Division of Human Nutrition Wageningen University Biotechnion, Bomenweg 2 NL-6703 HD Wageningen the Netherlands tel: (+)31 317 485788 fax: (+)31 317 483342 email: guido.hooiveld@wur.nl internet: http://nutrigene.4t.com http://scholar.google.com/citations?user=qFHaMnoAAAAJ http://www.researcherid.com/rid/F-4912-2010 [[alternative HTML version deleted]]
• 1.3k views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 14 hours ago
United States
Hi Guido, On 9/17/2012 5:50 PM, Hooiveld, Guido wrote: > Hi, > I am doing an analysis on a dataset, and have a question on whether or not to include the paired structure (sibship) in the model fitted by limma if this is not explicitly needed. We discussed this internally, but didn't get consensus. Hence, I would like to ask for opinions on this list... :) Personally I would keep the pairing in the model regardless. The goal IMO is to capture as much of the observed variability with biologically meaningful/plausible coefficients. Accounting for intra-subject variability is fair game in this context, and as you see, it increases your power to detect differences. Best, Jim > > Let me explain: > - Affymetrix data, RMA normalized. > - samples from 20 subjects were analyzed on arrays, obtained from 10 'young' and 10 'old' individuals. > - samples were taken at baseline, and after treatment with a drug (WY). > > Part of target file: >> targets > Filename Treatment Sibship Category > 1 G068_B07_05_05_CTRL.CEL Ctrl 5 old > 2 G068_B09_06_06_CTRL.CEL Ctrl 6 old > 3 G068_C05_07_07_CTRL.CEL Ctrl 7 old > 4 G068_C09_09_09_CTRL_2.CEL Ctrl 9 young > 5 G068_D05_10_10_CTRL.CEL Ctrl 10 young > 6 G068_D07_11_11_CTRL.CEL Ctrl 11 young > 7 G068_F07_17_05_WY.CEL WY 5 old > 8 G068_F09_18_06_WY.CEL WY 6 old > 9 G068_G05_19_07_WY.CEL WY 7 old > 10 G068_G09_21_09_WY.CEL WY 9 young > 11 G068_H05_22_10_WY.CEL WY 10 young > 12 G068_H07_23_11_WY.CEL WY 11 young > > It is obvious that for the treatment effect a paired analysis should be performed (using sibship info). This could be done for the whole group, or for young and old separately. > >> TC<- as.factor(paste(targets$Treatment, targets$Category, sep=".")) >> design<- model.matrix(~0+TC+Sibship) >> >> fit<- lmFit(x.norm, design) > Coefficients not estimable: Sibship8 Sibship12 > Warning message: > Partial NA coefficients for 19682 probe(s) >> #test for effect of WY in old and young >> cont.matrix<- makeContrasts( > + WYold=(TCWY.old-TCCtrl.old), > + WYyoung=(TCWY.young-TCCtrl.young), > + levels=design) >> fit2<- contrasts.fit(fit, cont.matrix) >> fit2<- eBayes(fit2) > > If I would like to identify the probesets that are differentially expressed between old and young under either control or treatment conditions, I am essentially performing an unpaired t-test. Hence, info on sibship is thus not required. > >> cont.matrix<- makeContrasts( > + ctrlold_ctrlyoung=(TCCtrl.old-TCCtrl.young), > + WYold_WYyoung=(TCWY.old-TCWY.young), > + levels=design) > However, I noticed that the results of these 2nd set of comparisons (the old vs young) are strongly affected by including or not the sibship in the design. In other words, if I define this design: >> design<- model.matrix(~0+TC+Sibship) > I get a completely different set of top regulated probesets for the before mentioned contrasts when compared to this design (without sibship): >> design<- model.matrix(~0+TC) > I noticed that also the p-values are much smaller when including the sibship. > > As an example, > WITH sibship: >> topTable(fit2,coef=1) > ID logFC AveExpr t P.Value adj.P.Val B > 15031 671_at -4.763308 5.591752 -51.58016 4.457435e-18 6.583366e-14 29.06475 > 9938 4317_at -5.545104 4.893897 -50.17288 6.689732e-18 6.583366e-14 28.82092 > 7454 2944_at -7.992487 5.861613 -43.89646 4.747228e-17 3.114498e-13 27.56297 > 14844 654433_at 5.136677 5.807219 40.89921 1.337134e-16 6.579367e-13 26.84567 > 1115 1088_at -4.762000 5.650997 -39.67501 2.085766e-16 8.210408e-13 26.52704 > 656 10321_at -6.458380 5.445312 -37.74764 4.319761e-16 1.417026e-12 25.99187 > 4162 1991_at -4.080680 6.171290 -36.76955 6.339135e-16 1.627014e-12 25.70344 > 3784 1669_at -6.130596 5.171480 -36.66315 6.613207e-16 1.627014e-12 25.67134 > 9557 4057_at -6.079963 5.789428 -32.16516 4.460724e-15 9.755107e-12 24.17063 > 2584 1359_at 3.730962 6.155636 30.49084 9.709508e-15 1.911025e-11 23.53108 > WITHOUT sibship: >> topTable(fit2,coef=1) > ID logFC AveExpr t P.Value adj.P.Val B > 14844 654433_at 5.320186 5.807219 9.802842 2.297163e-09 2.577632e-05 11.534050 > 18751 9173_at 3.052422 4.715249 9.439677 4.453505e-09 2.577632e-05 10.917557 > 6220 2624_at 2.443030 5.771388 9.281647 5.970493e-09 2.577632e-05 10.643512 > 13773 59340_at 4.154059 4.967316 9.221019 6.686698e-09 2.577632e-05 10.537431 > 2584 1359_at 3.572039 6.155636 9.095515 8.466416e-09 2.577632e-05 10.316169 > 16233 79608_at 1.774024 5.678042 9.073712 8.822506e-09 2.577632e-05 10.277501 > 2040 1232_at 2.911866 4.211083 9.053443 9.167473e-09 2.577632e-05 10.241491 > 17108 83478_at 1.522142 6.045796 8.750724 1.635842e-08 4.024580e-05 9.696605 > 1855 1178_at 3.134552 8.481612 8.619180 2.111666e-08 4.304048e-05 9.455663 > 6733 2766_at -2.157836 5.223548 -8.568223 2.332607e-08 4.304048e-05 9.361647 > Thus, although it is not required, would it be recommended to include for the 2nd set of contrasts the paired structure of the data in the design? > I would argue to do so (since intuitively I feel it would be good to always include as much info as possible on the correlation structure of the data), but as said not everyone in the project team agrees with me. > > So any opinions/comments are much appreciated. > > Regards, > Guido > > >> sessionInfo() > R version 2.15.1 (2012-06-22) > Platform: i386-pc-mingw32/i386 (32-bit) > > locale: > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C LC_TIME=English_United States.1252 > > attached base packages: > [1] splines stats graphics grDevices utils datasets methods base > > other attached packages: > [1] hugene11stv1hsentrezg.db_15.1.0 org.Hs.eg.db_2.7.1 RSQLite_0.11.1 DBI_0.2-5 hugene11stv1hsentrezgcdf_15.1.0 AnnotationDbi_1.18.3 > [7] qvalue_1.30.0 multtest_2.12.0 affy_1.34.0 limma_3.12.3 pamr_1.54 survival_2.36-14 > [13] cluster_1.14.2 bladderbatch_1.0.3 Biobase_2.16.0 BiocGenerics_0.2.0 sva_3.2.1 mgcv_1.7-20 > [19] corpcor_1.6.4 BiocInstaller_1.4.7 > > loaded via a namespace (and not attached): > [1] affyio_1.24.0 grid_2.15.1 IRanges_1.14.4 lattice_0.20-10 MASS_7.3-21 Matrix_1.0-9 nlme_3.1-104 preprocessCore_1.18.0 > [9] stats4_2.15.1 tcltk_2.15.1 tools_2.15.1 zlibbioc_1.2.0 > --------------------------------------------------------- > Guido Hooiveld, PhD > Nutrition, Metabolism& Genomics Group > Division of Human Nutrition > Wageningen University > Biotechnion, Bomenweg 2 > NL-6703 HD Wageningen > the Netherlands > tel: (+)31 317 485788 > fax: (+)31 317 483342 > email: guido.hooiveld at wur.nl > internet: http://nutrigene.4t.com > http://scholar.google.com/citations?user=qFHaMnoAAAAJ > http://www.researcherid.com/rid/F-4912-2010 > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 10 minutes ago
WEHI, Melbourne, Australia
Dear Guido, There actually is no valid way to compare young vs old in your experiment while including Sibship in the design matrix. You would see this if you reverse the order of the terms in your model, forming the model ~Sibship+TC instead of ~TC+Sibship. You would find that the young vs old comparisons are all NA. This doesn't mean that the within sibship correlations shouldn't be taken into account. Your experiment is an example of a multilevel design with two levels of variability, one or families and one for animals within a family. See Section 8.7 "Multilevel models" in the limma User's Guide for how to handle this: http://bioconductor.org/packages/2.11/bioc/vignettes/limma/inst/doc/us ersguide.pdf Best wishes Gordon > Date: Mon, 17 Sep 2012 21:50:03 +0000 > From: "Hooiveld, Guido" <guido.hooiveld at="" wur.nl=""> > To: "bioconductor (bioconductor at stat.math.ethz.ch)" > <bioconductor at="" stat.math.ethz.ch=""> > Subject: [BioC] limma: is it best to always include paired structure > (sibship) in design? > > Hi, > I am doing an analysis on a dataset, and have a question on whether or > not to include the paired structure (sibship) in the model fitted by > limma if this is not explicitly needed. We discussed this internally, > but didn't get consensus. Hence, I would like to ask for opinions on > this list... :) > > Let me explain: > - Affymetrix data, RMA normalized. > - samples from 20 subjects were analyzed on arrays, obtained from 10 'young' and 10 'old' individuals. > - samples were taken at baseline, and after treatment with a drug (WY). > > Part of target file: >> targets > > Filename Treatment Sibship Category > 1 G068_B07_05_05_CTRL.CEL Ctrl 5 old > 2 G068_B09_06_06_CTRL.CEL Ctrl 6 old > 3 G068_C05_07_07_CTRL.CEL Ctrl 7 old > 4 G068_C09_09_09_CTRL_2.CEL Ctrl 9 young > 5 G068_D05_10_10_CTRL.CEL Ctrl 10 young > 6 G068_D07_11_11_CTRL.CEL Ctrl 11 young > 7 G068_F07_17_05_WY.CEL WY 5 old > 8 G068_F09_18_06_WY.CEL WY 6 old > 9 G068_G05_19_07_WY.CEL WY 7 old > 10 G068_G09_21_09_WY.CEL WY 9 young > 11 G068_H05_22_10_WY.CEL WY 10 young > 12 G068_H07_23_11_WY.CEL WY 11 young > > It is obvious that for the treatment effect a paired analysis should be > performed (using sibship info). This could be done for the whole group, > or for young and old separately. > >> TC <- as.factor(paste(targets$Treatment, targets$Category, sep=".")) >> design <- model.matrix(~0+TC+Sibship) >> >> fit <- lmFit(x.norm, design) > Coefficients not estimable: Sibship8 Sibship12 > Warning message: > Partial NA coefficients for 19682 probe(s) >> >> #test for effect of WY in old and young >> cont.matrix <- makeContrasts( > + WYold=(TCWY.old-TCCtrl.old), > + WYyoung=(TCWY.young-TCCtrl.young), > + levels=design) >> >> fit2 <- contrasts.fit(fit, cont.matrix) >> fit2 <- eBayes(fit2) > > > If I would like to identify the probesets that are differentially > expressed between old and young under either control or treatment > conditions, I am essentially performing an unpaired t-test. Hence, info > on sibship is thus not required. > >> cont.matrix <- makeContrasts( > + ctrlold_ctrlyoung=(TCCtrl.old-TCCtrl.young), > + WYold_WYyoung=(TCWY.old-TCWY.young), > + levels=design) >> > > However, I noticed that the results of these 2nd set of comparisons (the old vs young) are strongly affected by including or not the sibship in the design. In other words, if I define this design: >> design <- model.matrix(~0+TC+Sibship) > I get a completely different set of top regulated probesets for the before mentioned contrasts when compared to this design (without sibship): >> design <- model.matrix(~0+TC) > I noticed that also the p-values are much smaller when including the sibship. > > As an example, > WITH sibship: >> topTable(fit2,coef=1) > ID logFC AveExpr t P.Value adj.P.Val B > 15031 671_at -4.763308 5.591752 -51.58016 4.457435e-18 6.583366e-14 29.06475 > 9938 4317_at -5.545104 4.893897 -50.17288 6.689732e-18 6.583366e-14 28.82092 > 7454 2944_at -7.992487 5.861613 -43.89646 4.747228e-17 3.114498e-13 27.56297 > 14844 654433_at 5.136677 5.807219 40.89921 1.337134e-16 6.579367e-13 26.84567 > 1115 1088_at -4.762000 5.650997 -39.67501 2.085766e-16 8.210408e-13 26.52704 > 656 10321_at -6.458380 5.445312 -37.74764 4.319761e-16 1.417026e-12 25.99187 > 4162 1991_at -4.080680 6.171290 -36.76955 6.339135e-16 1.627014e-12 25.70344 > 3784 1669_at -6.130596 5.171480 -36.66315 6.613207e-16 1.627014e-12 25.67134 > 9557 4057_at -6.079963 5.789428 -32.16516 4.460724e-15 9.755107e-12 24.17063 > 2584 1359_at 3.730962 6.155636 30.49084 9.709508e-15 1.911025e-11 23.53108 >> > > WITHOUT sibship: >> topTable(fit2,coef=1) > ID logFC AveExpr t P.Value adj.P.Val B > 14844 654433_at 5.320186 5.807219 9.802842 2.297163e-09 2.577632e-05 11.534050 > 18751 9173_at 3.052422 4.715249 9.439677 4.453505e-09 2.577632e-05 10.917557 > 6220 2624_at 2.443030 5.771388 9.281647 5.970493e-09 2.577632e-05 10.643512 > 13773 59340_at 4.154059 4.967316 9.221019 6.686698e-09 2.577632e-05 10.537431 > 2584 1359_at 3.572039 6.155636 9.095515 8.466416e-09 2.577632e-05 10.316169 > 16233 79608_at 1.774024 5.678042 9.073712 8.822506e-09 2.577632e-05 10.277501 > 2040 1232_at 2.911866 4.211083 9.053443 9.167473e-09 2.577632e-05 10.241491 > 17108 83478_at 1.522142 6.045796 8.750724 1.635842e-08 4.024580e-05 9.696605 > 1855 1178_at 3.134552 8.481612 8.619180 2.111666e-08 4.304048e-05 9.455663 > 6733 2766_at -2.157836 5.223548 -8.568223 2.332607e-08 4.304048e-05 9.361647 >> > > Thus, although it is not required, would it be recommended to include for the 2nd set of contrasts the paired structure of the data in the design? > I would argue to do so (since intuitively I feel it would be good to always include as much info as possible on the correlation structure of the data), but as said not everyone in the project team agrees with me. > > So any opinions/comments are much appreciated. > > Regards, > Guido > > >> sessionInfo() > R version 2.15.1 (2012-06-22) > Platform: i386-pc-mingw32/i386 (32-bit) > > locale: > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C LC_TIME=English_United States.1252 > > attached base packages: > [1] splines stats graphics grDevices utils datasets methods base > > other attached packages: > [1] hugene11stv1hsentrezg.db_15.1.0 org.Hs.eg.db_2.7.1 RSQLite_0.11.1 DBI_0.2-5 hugene11stv1hsentrezgcdf_15.1.0 AnnotationDbi_1.18.3 > [7] qvalue_1.30.0 multtest_2.12.0 affy_1.34.0 limma_3.12.3 pamr_1.54 survival_2.36-14 > [13] cluster_1.14.2 bladderbatch_1.0.3 Biobase_2.16.0 BiocGenerics_0.2.0 sva_3.2.1 mgcv_1.7-20 > [19] corpcor_1.6.4 BiocInstaller_1.4.7 > > loaded via a namespace (and not attached): > [1] affyio_1.24.0 grid_2.15.1 IRanges_1.14.4 lattice_0.20-10 MASS_7.3-21 Matrix_1.0-9 nlme_3.1-104 preprocessCore_1.18.0 > [9] stats4_2.15.1 tcltk_2.15.1 tools_2.15.1 zlibbioc_1.2.0 >> > > --------------------------------------------------------- > Guido Hooiveld, PhD > Nutrition, Metabolism & Genomics Group > Division of Human Nutrition > Wageningen University > Biotechnion, Bomenweg 2 > NL-6703 HD Wageningen > the Netherlands > tel: (+)31 317 485788 > fax: (+)31 317 483342 > email: guido.hooiveld at wur.nl > internet: http://nutrigene.4t.com > http://scholar.google.com/citations?user=qFHaMnoAAAAJ > http://www.researcherid.com/rid/F-4912-2010 > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD COMMENT
0
Entering edit mode
Dear Gordon, Thanks very much for your suggestion; after re-analyzing the dataset according the Multilevel Design using duplicateCorrelation the results of the young vs old comparisons are in line with what I expected (and made much more sense as compared to including sibship in the design). However, please allow me to ask two additional questions that arose: - When I would ONLY be interested in the treatment effect (and not in difference between young vs old), I would better use the paired (sibship) design? Since one of my targets is ALSO to find out the between-subject differences, a joint analysis is recommended using duplicateCorrelation. However, this approach is (slightly) less powerful to detect treatment effects because only a robust average correlation is taken into account. Is my reasoning OK? - If so, is it somehow possible to take into account for each gene its individual estimated inter-duplicate correlation when calling the lmfit function? Or would this not increase the power? Thanks, Guido -----Original Message----- From: Gordon K Smyth [mailto:smyth@wehi.EDU.AU] Sent: Wednesday, September 19, 2012 01:57 To: Hooiveld, Guido Cc: Bioconductor mailing list Subject: limma: is it best to always include paired structure (sibship) in design? Dear Guido, There actually is no valid way to compare young vs old in your experiment while including Sibship in the design matrix. You would see this if you reverse the order of the terms in your model, forming the model ~Sibship+TC instead of ~TC+Sibship. You would find that the young vs old comparisons are all NA. This doesn't mean that the within sibship correlations shouldn't be taken into account. Your experiment is an example of a multilevel design with two levels of variability, one or families and one for animals within a family. See Section 8.7 "Multilevel models" in the limma User's Guide for how to handle this: http://bioconductor.org/packages/2.11/bioc/vignettes/limma/inst/doc/us ersguide.pdf Best wishes Gordon > Date: Mon, 17 Sep 2012 21:50:03 +0000 > From: "Hooiveld, Guido" <guido.hooiveld at="" wur.nl=""> > To: "bioconductor (bioconductor at stat.math.ethz.ch)" > <bioconductor at="" stat.math.ethz.ch=""> > Subject: [BioC] limma: is it best to always include paired structure > (sibship) in design? > > Hi, > I am doing an analysis on a dataset, and have a question on whether or > not to include the paired structure (sibship) in the model fitted by > limma if this is not explicitly needed. We discussed this internally, > but didn't get consensus. Hence, I would like to ask for opinions on > this list... :) > > Let me explain: > - Affymetrix data, RMA normalized. > - samples from 20 subjects were analyzed on arrays, obtained from 10 'young' and 10 'old' individuals. > - samples were taken at baseline, and after treatment with a drug (WY). > > Part of target file: >> targets > > Filename Treatment Sibship Category > 1 G068_B07_05_05_CTRL.CEL Ctrl 5 old > 2 G068_B09_06_06_CTRL.CEL Ctrl 6 old > 3 G068_C05_07_07_CTRL.CEL Ctrl 7 old > 4 G068_C09_09_09_CTRL_2.CEL Ctrl 9 young > 5 G068_D05_10_10_CTRL.CEL Ctrl 10 young > 6 G068_D07_11_11_CTRL.CEL Ctrl 11 young > 7 G068_F07_17_05_WY.CEL WY 5 old > 8 G068_F09_18_06_WY.CEL WY 6 old > 9 G068_G05_19_07_WY.CEL WY 7 old > 10 G068_G09_21_09_WY.CEL WY 9 young > 11 G068_H05_22_10_WY.CEL WY 10 young > 12 G068_H07_23_11_WY.CEL WY 11 young > > It is obvious that for the treatment effect a paired analysis should > be performed (using sibship info). This could be done for the whole > group, or for young and old separately. > >> TC <- as.factor(paste(targets$Treatment, targets$Category, sep=".")) >> design <- model.matrix(~0+TC+Sibship) >> >> fit <- lmFit(x.norm, design) > Coefficients not estimable: Sibship8 Sibship12 Warning message: > Partial NA coefficients for 19682 probe(s) >> >> #test for effect of WY in old and young cont.matrix <- makeContrasts( > + WYold=(TCWY.old-TCCtrl.old), > + WYyoung=(TCWY.young-TCCtrl.young), > + levels=design) >> >> fit2 <- contrasts.fit(fit, cont.matrix) >> fit2 <- eBayes(fit2) > > > If I would like to identify the probesets that are differentially > expressed between old and young under either control or treatment > conditions, I am essentially performing an unpaired t-test. Hence, > info on sibship is thus not required. > >> cont.matrix <- makeContrasts( > + ctrlold_ctrlyoung=(TCCtrl.old-TCCtrl.young), > + WYold_WYyoung=(TCWY.old-TCWY.young), > + levels=design) >> > > However, I noticed that the results of these 2nd set of comparisons (the old vs young) are strongly affected by including or not the sibship in the design. In other words, if I define this design: >> design <- model.matrix(~0+TC+Sibship) > I get a completely different set of top regulated probesets for the before mentioned contrasts when compared to this design (without sibship): >> design <- model.matrix(~0+TC) > I noticed that also the p-values are much smaller when including the sibship. > > As an example, > WITH sibship: >> topTable(fit2,coef=1) > ID logFC AveExpr t P.Value adj.P.Val B > 15031 671_at -4.763308 5.591752 -51.58016 4.457435e-18 6.583366e-14 29.06475 > 9938 4317_at -5.545104 4.893897 -50.17288 6.689732e-18 6.583366e-14 28.82092 > 7454 2944_at -7.992487 5.861613 -43.89646 4.747228e-17 3.114498e-13 27.56297 > 14844 654433_at 5.136677 5.807219 40.89921 1.337134e-16 6.579367e-13 26.84567 > 1115 1088_at -4.762000 5.650997 -39.67501 2.085766e-16 8.210408e-13 26.52704 > 656 10321_at -6.458380 5.445312 -37.74764 4.319761e-16 1.417026e-12 25.99187 > 4162 1991_at -4.080680 6.171290 -36.76955 6.339135e-16 1.627014e-12 25.70344 > 3784 1669_at -6.130596 5.171480 -36.66315 6.613207e-16 1.627014e-12 25.67134 > 9557 4057_at -6.079963 5.789428 -32.16516 4.460724e-15 9.755107e-12 24.17063 > 2584 1359_at 3.730962 6.155636 30.49084 9.709508e-15 1.911025e-11 23.53108 >> > > WITHOUT sibship: >> topTable(fit2,coef=1) > ID logFC AveExpr t P.Value adj.P.Val B > 14844 654433_at 5.320186 5.807219 9.802842 2.297163e-09 2.577632e-05 11.534050 > 18751 9173_at 3.052422 4.715249 9.439677 4.453505e-09 2.577632e-05 10.917557 > 6220 2624_at 2.443030 5.771388 9.281647 5.970493e-09 2.577632e-05 10.643512 > 13773 59340_at 4.154059 4.967316 9.221019 6.686698e-09 2.577632e-05 10.537431 > 2584 1359_at 3.572039 6.155636 9.095515 8.466416e-09 2.577632e-05 10.316169 > 16233 79608_at 1.774024 5.678042 9.073712 8.822506e-09 2.577632e-05 10.277501 > 2040 1232_at 2.911866 4.211083 9.053443 9.167473e-09 2.577632e-05 10.241491 > 17108 83478_at 1.522142 6.045796 8.750724 1.635842e-08 4.024580e-05 9.696605 > 1855 1178_at 3.134552 8.481612 8.619180 2.111666e-08 4.304048e-05 9.455663 > 6733 2766_at -2.157836 5.223548 -8.568223 2.332607e-08 4.304048e-05 9.361647 >> > > Thus, although it is not required, would it be recommended to include for the 2nd set of contrasts the paired structure of the data in the design? > I would argue to do so (since intuitively I feel it would be good to always include as much info as possible on the correlation structure of the data), but as said not everyone in the project team agrees with me. > > So any opinions/comments are much appreciated. > > Regards, > Guido > > >> sessionInfo() > R version 2.15.1 (2012-06-22) > Platform: i386-pc-mingw32/i386 (32-bit) > > locale: > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C LC_TIME=English_United States.1252 > > attached base packages: > [1] splines stats graphics grDevices utils datasets methods base > > other attached packages: > [1] hugene11stv1hsentrezg.db_15.1.0 org.Hs.eg.db_2.7.1 RSQLite_0.11.1 DBI_0.2-5 hugene11stv1hsentrezgcdf_15.1.0 AnnotationDbi_1.18.3 > [7] qvalue_1.30.0 multtest_2.12.0 affy_1.34.0 limma_3.12.3 pamr_1.54 survival_2.36-14 > [13] cluster_1.14.2 bladderbatch_1.0.3 Biobase_2.16.0 BiocGenerics_0.2.0 sva_3.2.1 mgcv_1.7-20 > [19] corpcor_1.6.4 BiocInstaller_1.4.7 > > loaded via a namespace (and not attached): > [1] affyio_1.24.0 grid_2.15.1 IRanges_1.14.4 lattice_0.20-10 MASS_7.3-21 Matrix_1.0-9 nlme_3.1-104 preprocessCore_1.18.0 > [9] stats4_2.15.1 tcltk_2.15.1 tools_2.15.1 zlibbioc_1.2.0 >> > > --------------------------------------------------------- > Guido Hooiveld, PhD > Nutrition, Metabolism & Genomics Group Division of Human Nutrition > Wageningen University Biotechnion, Bomenweg 2 > NL-6703 HD Wageningen > the Netherlands > tel: (+)31 317 485788 > fax: (+)31 317 483342 > email: guido.hooiveld at wur.nl > internet: http://nutrigene.4t.com > http://scholar.google.com/citations?user=qFHaMnoAAAAJ > http://www.researcherid.com/rid/F-4912-2010 > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:9}}
ADD REPLY
0
Entering edit mode
On Thu, 20 Sep 2012, Hooiveld, Guido wrote: > Dear Gordon, > Thanks very much for your suggestion; after re-analyzing the dataset > according the Multilevel Design using duplicateCorrelation the results > of the young vs old comparisons are in line with what I expected (and > made much more sense as compared to including sibship in the design). > > However, please allow me to ask two additional questions that arose: > - When I would ONLY be interested in the treatment effect (and not in > difference between young vs old), I would better use the paired > (sibship) design? Since one of my targets is ALSO to find out the > between-subject differences, a joint analysis is recommended using > duplicateCorrelation. However, this approach is (slightly) less powerful > to detect treatment effects because only a robust average correlation is > taken into account. Is my reasoning OK? No. Neither method is necessarily more powerful than the other, indeed the duplicateCorrelation approach will usually give more DE genes. The paired analysis makes fewer assumptions however and is therefore preferable. Note that the paired analysis does not estimate a correlation. Rather, it ignores the between sib information entirely. > - If so, is it somehow possible to take into account for each gene its > individual estimated inter-duplicate correlation when calling the lmfit > function? Or would this not increase the power? No, it would greatly decrease power rather than increasing it. Best wishes Gordon > Thanks, > Guido > > -----Original Message----- > From: Gordon K Smyth [mailto:smyth at wehi.EDU.AU] > Sent: Wednesday, September 19, 2012 01:57 > To: Hooiveld, Guido > Cc: Bioconductor mailing list > Subject: limma: is it best to always include paired structure (sibship) in design? > > Dear Guido, > > There actually is no valid way to compare young vs old in your experiment while including Sibship in the design matrix. You would see this if you reverse the order of the terms in your model, forming the model ~Sibship+TC instead of ~TC+Sibship. You would find that the young vs old comparisons are all NA. > > This doesn't mean that the within sibship correlations shouldn't be taken into account. Your experiment is an example of a multilevel design with two levels of variability, one or families and one for animals within a family. See Section 8.7 "Multilevel models" in the limma User's Guide for how to handle this: > > http://bioconductor.org/packages/2.11/bioc/vignettes/limma/inst/doc/ usersguide.pdf > > Best wishes > Gordon > >> Date: Mon, 17 Sep 2012 21:50:03 +0000 >> From: "Hooiveld, Guido" <guido.hooiveld at="" wur.nl=""> >> To: "bioconductor (bioconductor at stat.math.ethz.ch)" >> <bioconductor at="" stat.math.ethz.ch=""> >> Subject: [BioC] limma: is it best to always include paired structure >> (sibship) in design? >> >> Hi, > >> I am doing an analysis on a dataset, and have a question on whether or >> not to include the paired structure (sibship) in the model fitted by >> limma if this is not explicitly needed. We discussed this internally, >> but didn't get consensus. Hence, I would like to ask for opinions on >> this list... :) >> >> Let me explain: >> - Affymetrix data, RMA normalized. >> - samples from 20 subjects were analyzed on arrays, obtained from 10 'young' and 10 'old' individuals. >> - samples were taken at baseline, and after treatment with a drug (WY). >> >> Part of target file: >>> targets >> >> Filename Treatment Sibship Category >> 1 G068_B07_05_05_CTRL.CEL Ctrl 5 old >> 2 G068_B09_06_06_CTRL.CEL Ctrl 6 old >> 3 G068_C05_07_07_CTRL.CEL Ctrl 7 old >> 4 G068_C09_09_09_CTRL_2.CEL Ctrl 9 young >> 5 G068_D05_10_10_CTRL.CEL Ctrl 10 young >> 6 G068_D07_11_11_CTRL.CEL Ctrl 11 young >> 7 G068_F07_17_05_WY.CEL WY 5 old >> 8 G068_F09_18_06_WY.CEL WY 6 old >> 9 G068_G05_19_07_WY.CEL WY 7 old >> 10 G068_G09_21_09_WY.CEL WY 9 young >> 11 G068_H05_22_10_WY.CEL WY 10 young >> 12 G068_H07_23_11_WY.CEL WY 11 young >> >> It is obvious that for the treatment effect a paired analysis should >> be performed (using sibship info). This could be done for the whole >> group, or for young and old separately. >> >>> TC <- as.factor(paste(targets$Treatment, targets$Category, sep=".")) >>> design <- model.matrix(~0+TC+Sibship) >>> >>> fit <- lmFit(x.norm, design) >> Coefficients not estimable: Sibship8 Sibship12 Warning message: >> Partial NA coefficients for 19682 probe(s) >>> >>> #test for effect of WY in old and young cont.matrix <- makeContrasts( >> + WYold=(TCWY.old-TCCtrl.old), >> + WYyoung=(TCWY.young-TCCtrl.young), >> + levels=design) >>> >>> fit2 <- contrasts.fit(fit, cont.matrix) >>> fit2 <- eBayes(fit2) >> >> >> If I would like to identify the probesets that are differentially >> expressed between old and young under either control or treatment >> conditions, I am essentially performing an unpaired t-test. Hence, >> info on sibship is thus not required. >> >>> cont.matrix <- makeContrasts( >> + ctrlold_ctrlyoung=(TCCtrl.old-TCCtrl.young), >> + WYold_WYyoung=(TCWY.old-TCWY.young), >> + levels=design) >>> >> >> However, I noticed that the results of these 2nd set of comparisons (the old vs young) are strongly affected by including or not the sibship in the design. In other words, if I define this design: >>> design <- model.matrix(~0+TC+Sibship) >> I get a completely different set of top regulated probesets for the before mentioned contrasts when compared to this design (without sibship): >>> design <- model.matrix(~0+TC) >> I noticed that also the p-values are much smaller when including the sibship. >> >> As an example, >> WITH sibship: >>> topTable(fit2,coef=1) >> ID logFC AveExpr t P.Value adj.P.Val B >> 15031 671_at -4.763308 5.591752 -51.58016 4.457435e-18 6.583366e-14 29.06475 >> 9938 4317_at -5.545104 4.893897 -50.17288 6.689732e-18 6.583366e-14 28.82092 >> 7454 2944_at -7.992487 5.861613 -43.89646 4.747228e-17 3.114498e-13 27.56297 >> 14844 654433_at 5.136677 5.807219 40.89921 1.337134e-16 6.579367e-13 26.84567 >> 1115 1088_at -4.762000 5.650997 -39.67501 2.085766e-16 8.210408e-13 26.52704 >> 656 10321_at -6.458380 5.445312 -37.74764 4.319761e-16 1.417026e-12 25.99187 >> 4162 1991_at -4.080680 6.171290 -36.76955 6.339135e-16 1.627014e-12 25.70344 >> 3784 1669_at -6.130596 5.171480 -36.66315 6.613207e-16 1.627014e-12 25.67134 >> 9557 4057_at -6.079963 5.789428 -32.16516 4.460724e-15 9.755107e-12 24.17063 >> 2584 1359_at 3.730962 6.155636 30.49084 9.709508e-15 1.911025e-11 23.53108 >>> >> >> WITHOUT sibship: >>> topTable(fit2,coef=1) >> ID logFC AveExpr t P.Value adj.P.Val B >> 14844 654433_at 5.320186 5.807219 9.802842 2.297163e-09 2.577632e-05 11.534050 >> 18751 9173_at 3.052422 4.715249 9.439677 4.453505e-09 2.577632e-05 10.917557 >> 6220 2624_at 2.443030 5.771388 9.281647 5.970493e-09 2.577632e-05 10.643512 >> 13773 59340_at 4.154059 4.967316 9.221019 6.686698e-09 2.577632e-05 10.537431 >> 2584 1359_at 3.572039 6.155636 9.095515 8.466416e-09 2.577632e-05 10.316169 >> 16233 79608_at 1.774024 5.678042 9.073712 8.822506e-09 2.577632e-05 10.277501 >> 2040 1232_at 2.911866 4.211083 9.053443 9.167473e-09 2.577632e-05 10.241491 >> 17108 83478_at 1.522142 6.045796 8.750724 1.635842e-08 4.024580e-05 9.696605 >> 1855 1178_at 3.134552 8.481612 8.619180 2.111666e-08 4.304048e-05 9.455663 >> 6733 2766_at -2.157836 5.223548 -8.568223 2.332607e-08 4.304048e-05 9.361647 >>> >> >> Thus, although it is not required, would it be recommended to include for the 2nd set of contrasts the paired structure of the data in the design? >> I would argue to do so (since intuitively I feel it would be good to always include as much info as possible on the correlation structure of the data), but as said not everyone in the project team agrees with me. >> >> So any opinions/comments are much appreciated. >> >> Regards, >> Guido >> >> >>> sessionInfo() >> R version 2.15.1 (2012-06-22) >> Platform: i386-pc-mingw32/i386 (32-bit) >> >> locale: >> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C LC_TIME=English_United States.1252 >> >> attached base packages: >> [1] splines stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] hugene11stv1hsentrezg.db_15.1.0 org.Hs.eg.db_2.7.1 RSQLite_0.11.1 DBI_0.2-5 hugene11stv1hsentrezgcdf_15.1.0 AnnotationDbi_1.18.3 >> [7] qvalue_1.30.0 multtest_2.12.0 affy_1.34.0 limma_3.12.3 pamr_1.54 survival_2.36-14 >> [13] cluster_1.14.2 bladderbatch_1.0.3 Biobase_2.16.0 BiocGenerics_0.2.0 sva_3.2.1 mgcv_1.7-20 >> [19] corpcor_1.6.4 BiocInstaller_1.4.7 >> >> loaded via a namespace (and not attached): >> [1] affyio_1.24.0 grid_2.15.1 IRanges_1.14.4 lattice_0.20-10 MASS_7.3-21 Matrix_1.0-9 nlme_3.1-104 preprocessCore_1.18.0 >> [9] stats4_2.15.1 tcltk_2.15.1 tools_2.15.1 zlibbioc_1.2.0 >>> >> >> --------------------------------------------------------- >> Guido Hooiveld, PhD >> Nutrition, Metabolism & Genomics Group Division of Human Nutrition >> Wageningen University Biotechnion, Bomenweg 2 >> NL-6703 HD Wageningen >> the Netherlands >> tel: (+)31 317 485788 >> fax: (+)31 317 483342 >> email: guido.hooiveld at wur.nl >> internet: http://nutrigene.4t.com >> http://scholar.google.com/citations?user=qFHaMnoAAAAJ >> http://www.researcherid.com/rid/F-4912-2010 >> > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:14}}
ADD REPLY

Login before adding your answer.

Traffic: 747 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6