Limma : Single Channel experiment design matrix


Guest User ★ 13k

@guest-user-4897

Last seen 11.2 years ago

Dear all,

I have a question about how to analyse a single-channel experiment with several groups.

In a first approach, I followed the limma User's Guide for several groups (chapter 9.3) and used a contrast matrix to compare two groups among all of the groups.

I also tried a second approach: I took a subset of the expression set containing only the two groups of samples I need to compare, and then followed the two-groups approach (chapter 9.2).

The fold change stays the same, but the p-value of the moderated t-test differs.

For the "chapter 9.3" approach I get this (topTable):

             logFC     AveExpr    t         P.Value       adj.P.Val     B
NM_013409    4.804450   9.351186  63.46856  5.198462e-32  2.225306e-27  60.42083
NM_170685    3.327586   7.476924  43.29198  2.292074e-27  4.102931e-23  51.64301
NM_021995    3.598441   8.731876  42.94068  2.875416e-27  4.102931e-23  51.44328
NM_000014    2.686684  11.968353  38.61755  5.481149e-26  4.817512e-22  48.80565
NM_001747    2.727227   8.834094  38.33543  6.716748e-26  4.817512e-22  48.62109

For the "chapter 9.2" approach I get this (topTable):

             logFC      AveExpr    t          P.Value       adj.P.Val     B
NM_013409     4.804450  10.238329   70.14768  7.077519e-15  2.709195e-10  23.07593
NM_015464     3.868533   9.850459   66.20398  1.265772e-14  2.709195e-10  22.72371
NM_000119    -3.322662  11.608264  -61.31983  2.733108e-14  3.899871e-10  22.22951
BC025320      2.908061   7.112412   56.61705  6.089619e-14  6.516958e-10  21.68233
NM_000014     2.686684  11.682645   53.85715  1.005598e-13  8.609327e-10  21.32326
NM_170685     3.327586   7.826983   51.22412  1.662803e-13  1.086579e-09  20.95091

As expected, logFC is the same and AveExpr differs, but the p-values also differ. Why is that? And which approach is the better choice, given that one appears to give more statistical power?

Thank you for your kind answers.
Koran

-- output of sessionInfo():

R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices datasets utils methods base

other attached packages:
 [1] RColorBrewer_1.0-5 R.basic_0.53.0 R.utils_1.29.8 R.oo_1.18.0 R.methodsS3_1.6.1
 [6] plotrix_3.5-3 multicore_0.1-7 pvclust_1.2-2 arrayQualityMetrics_3.18.0 impute_1.36.0
[11] marray_1.40.0 limma_3.18.13 fortunes_1.5-2 snowfall_1.84-6 snow_0.3-13

loaded via a namespace (and not attached):
 [1] affy_1.40.0 affyio_1.30.0 affyPLM_1.38.0 annotate_1.40.1 AnnotationDbi_1.24.0 beadarray_2.12.0
 [7] BeadDataPackR_1.14.0 Biobase_2.22.0 BiocGenerics_0.8.0 BiocInstaller_1.12.0 Biostrings_2.30.1 Cairo_1.5-5
[13] cluster_1.14.4 colorspace_1.2-4 DBI_0.2-7 Formula_1.1-1 gcrma_2.34.0 genefilter_1.44.0
[19] grid_3.0.2 Hmisc_3.14-2 hwriter_1.3 IRanges_1.20.6 KernSmooth_2.23-10 lattice_0.20-27
[25] latticeExtra_0.6-26 parallel_3.0.2 plyr_1.8.1 preprocessCore_1.24.0 Rcpp_0.11.0 reshape2_1.2.2
[31] RSQLite_0.11.4 setRNG_2011.11-2 splines_3.0.2 stats4_3.0.2 stringr_0.6.2 survival_2.37-7
[37] SVGAnnotation_0.93-1 tools_3.0.2 vsn_3.30.0 XML_3.95-0.2 xtable_1.7-1 XVector_0.2.0
[43] zlibbioc_1.8.0

--
Sent via the guest posting facility at bioconductor.org.
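For readers who want to reproduce the two setups described in the question, here is a minimal sketch on simulated data. It is not the poster's code: the group labels `A` to `D`, the four arrays per group, the matrix `expr`, and the spiked-in genes are all assumptions made for illustration. It fits the several-groups model with a contrast, fits the two-groups model on the subset, and then compares the log fold changes and the residual degrees of freedom.

```r
library(limma)

set.seed(1)
n_genes <- 200
# Hypothetical layout: 4 groups with 4 arrays each (not the poster's data)
group <- factor(rep(c("A", "B", "C", "D"), each = 4))
expr <- matrix(rnorm(n_genes * length(group)), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), NULL))
# Shift the first 10 genes upward in group B so that B vs A has real signal
expr[1:10, group == "B"] <- expr[1:10, group == "B"] + 3

# Approach 1 (several groups): fit all groups, extract B - A with a contrast
design_all <- model.matrix(~ 0 + group)
colnames(design_all) <- levels(group)
fit_all <- lmFit(expr, design_all)
contr <- makeContrasts(BvsA = B - A, levels = design_all)
fit_all <- eBayes(contrasts.fit(fit_all, contr))

# Approach 2 (two groups): keep only the A and B arrays, then fit
keep <- group %in% c("A", "B")
group_sub <- droplevels(group[keep])
design_sub <- model.matrix(~ group_sub)  # second coefficient is B - A
fit_sub <- eBayes(lmFit(expr[, keep], design_sub))

# Keep genes in their original order so the two tables line up row by row
top_all <- topTable(fit_all, coef = "BvsA", number = Inf, sort.by = "none")
top_sub <- topTable(fit_sub, coef = 2, number = Inf, sort.by = "none")

# The log fold changes agree; the residual degrees of freedom do not
same_logfc <- isTRUE(all.equal(top_all$logFC, top_sub$logFC))
df_resid <- c(all = fit_all$df.residual[1], sub = fit_sub$df.residual[1])
print(same_logfc)
print(df_resid)
```

With this layout the full model has 16 - 4 = 12 residual degrees of freedom per gene and the subset model has 8 - 2 = 6, while the estimated B - A difference is identical in both fits.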

limma limma • 671 views

Updated 11.7 years ago by James W. MacDonald 68k • written 11.7 years ago by Guest User ★ 13k


James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 2 days ago

United States

Hi Koran,

On 3/7/2014 3:49 AM, Koran [guest] wrote:
> As expected, logFC is the same and AveExpr differs, but the p-values also differ.
> Why is that? And which approach is the better choice, given that one appears to give more statistical power?

The difference between the two models comes primarily from the measure of intra-group variability, which is used to construct the denominator of your t-statistic. This measure is a pooled estimate based on all samples in the model.

Pooling over more samples does not by itself make the variance estimate smaller, but it makes the estimate more precise and gives it more residual degrees of freedom. Your own tables show this: for NM_013409 the t-statistic from the full model is slightly smaller (63.5 versus 70.1), yet its p-value is far smaller (5.2e-32 versus 7.1e-15), because the statistic is compared with a t-distribution that has more degrees of freedom and therefore lighter tails.

As a general rule, I would think fitting the first model is the preferred way to go.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
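To make the role of the degrees of freedom concrete, here is a small base-R sketch. The t-statistic and the two degrees-of-freedom values are hypothetical numbers chosen for illustration, not values taken from the thread. In limma the moderated t-statistic is referred to a t-distribution with `df.total` degrees of freedom (prior plus residual), so a model fitted to more arrays generally has a larger value there.

```r
# Two-sided p-value for the same t-statistic under different degrees of freedom.
# All three numbers below are illustrative assumptions, not the poster's values.
t_stat   <- 10
df_small <- 6    # e.g. a two-group fit on 8 arrays
df_large <- 12   # e.g. a four-group fit on 16 arrays

p_small <- 2 * pt(-abs(t_stat), df = df_small)
p_large <- 2 * pt(-abs(t_stat), df = df_large)

# More degrees of freedom means lighter tails, hence a smaller p-value
print(c(df_small = p_small, df_large = p_large))
print(p_large < p_small)
```

The same reasoning explains how a fit can report a smaller t-statistic and still a much smaller p-value, as long as its degrees of freedom are sufficiently larger.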

11.7 years ago • James W. MacDonald 68k