batch effects 450K

Guest User ★ 13k

@guest-user-4897

Last seen 9.6 years ago

Dear All,

I have Infinium 450K data for 56 breast cancer tumors. As a first analysis I wanted to do a clustering and see the distribution of the samples. For this I used the minfi package. Unfortunately, the assays were done in 2 batches and there is a clear batch effect. I looked into ComBat and SVA to remove the batch effect. As far as I understand, to use these approaches I need to have a phenotype/variable of interest. In the tutorial ("The SVA package for removing batch effects and other unwanted variation in high-throughput experiments", Modified: October 24, 2011, Compiled: April 25, 2012) the variable of interest is cancer status. However, I do not have normals. Does anyone have suggestions on how I should tackle these batch effects?

Many thanks in advance and all the best!

Femke

-- output of sessionInfo():

R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] bladderbatch_1.0.3
 [2] sva_3.2.1
 [3] mgcv_1.7-17
 [4] corpcor_1.6.3
 [5] IlluminaHumanMethylation450kmanifest_0.2.1
 [6] gplots_2.10.1
 [7] KernSmooth_2.23-7
 [8] caTools_1.13
 [9] bitops_1.0-4.1
[10] gdata_2.8.2
[11] gtools_2.6.2
[12] minfi_1.2.0
[13] GenomicRanges_1.8.6
[14] IRanges_1.14.3
[15] reshape_0.8.4
[16] plyr_1.7.1
[17] lattice_0.20-6
[18] Biobase_2.16.0
[19] BiocGenerics_0.2.0

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.18.1  BiocInstaller_1.4.4   Biostrings_2.24.1
 [4] DBI_0.2-5             MASS_7.3-18           Matrix_1.0-6
 [7] R.methodsS3_1.2.2     RColorBrewer_1.0-5    RSQLite_0.11.1
[10] affyio_1.24.0         annotate_1.34.0       beanplot_1.1
[13] bit_1.1-8             codetools_0.2-8       crlmm_1.14.0
[16] ellipse_0.3-7         ff_2.2-7              foreach_1.4.0
[19] genefilter_1.38.0     iterators_1.0.6       limma_3.12.0
[22] matrixStats_0.5.0     mclust_3.4.11         multtest_2.12.0
[25] mvtnorm_0.9-9992      nlme_3.1-104          nor1mix_1.1-3
[28] oligoClasses_1.18.0   preprocessCore_1.18.0 siggenes_1.30.0
[31] splines_2.15.0        stats4_2.15.0         survival_2.36-14
[34] xtable_1.7-0          zlibbioc_1.2.0

-- Sent via the guest posting facility at bioconductor.org.

Clustering Cancer Breast minfi sva • 2.2k views

updated 11.9 years ago by Brent Pedersen ▴ 110 • written 11.9 years ago by Guest User ★ 13k

Brent Pedersen ▴ 110

@brent-pedersen-4815

Last seen 9.4 years ago

United States

On Fri, Jun 8, 2012 at 8:44 AM, Femke [guest] wrote:
> Does anyone have suggestions on how I should tackle these batch effects?

Since the batch is known, why not just include it in your model and run with limma or lm()? But what's your study design if you don't have controls?
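As a rough illustration of the suggestion above (including the known batch in a linear model), here is a minimal base-R sketch using lm(). Everything in it is an assumption for illustration: the data are simulated stand-ins for M-values, and the names `m_values`, `batch`, and `adjusted` are hypothetical, not taken from the original poster's analysis. It fits each probe against the batch factor and keeps the residuals plus the probe mean, so the result can be passed to a clustering step.

```r
# Simulated stand-in for 450K M-values: 200 probes x 56 tumors in two batches.
# (Assumption: real data would come from minfi; nothing here is the poster's data.)
set.seed(1)
n_probes <- 200
batch <- factor(rep(c("batch1", "batch2"), each = 28))
m_values <- matrix(rnorm(n_probes * 56), nrow = n_probes)

# Add a probe-specific shift to batch2 to mimic a batch effect.
shift <- rnorm(n_probes, mean = 1)
m_values[, batch == "batch2"] <- m_values[, batch == "batch2"] + shift

# lm() accepts a matrix response with samples in rows, so transpose first.
# This fits probe ~ batch for every probe in one call.
fit <- lm(t(m_values) ~ batch)

# Residuals have the per-batch means removed; adding back the overall probe
# mean keeps the values on the original scale.
adjusted <- t(residuals(fit)) + rowMeans(m_values)

# Hierarchical clustering of samples on the batch-adjusted values.
hc <- hclust(dist(t(adjusted)))
```

One caveat with this kind of adjustment in a case-only design: any biological difference that happens to be confounded with batch (for example, a tumor subtype over-represented in one batch) is removed along with the technical effect. limma's removeBatchEffect() performs a similar adjustment and may be more convenient when limma is already in use.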

11.9 years ago Brent Pedersen ▴ 110

The study design is case-only. The goal appears to be subtype discovery (or perhaps the clustering was just a QC step for batch effect detection).

limma might also work, though the distribution of the variance in methylation data is far from uniform even after transformation. The experimenter might prefer to use ComBat because it tends to work well when there is a clear batch effect and a smallish sample size.

Question for the experimenter: did you normalize and background correct your samples? If not, you're going to be dealing with both dye bias and background effects, which tend to differ across samples.

There is a third option for "doubly unsupervised" normalization, but I will leave it up to the author of the method to describe it if he wishes.

On Fri, Jun 8, 2012 at 10:58 AM, Brent Pedersen wrote:
> Since the batch is known, why not just include it in your model and
> run with limma or lm()?
> But what's your study-design if you don't have controls?

11.9 years ago Tim Triche ★ 4.2k

Andrew Teschendorff ▴ 60

@andrew-teschendorff-4903

Last seen 5.1 years ago

Hi Femke,

For ComBat you do not need to specify a phenotype of interest. See the original paper presenting ComBat.

Regards,
A

Andrew E Teschendorff PhD
Heller Research Fellow, Statistical Cancer Genomics
Paul O'Gorman Building, UCL Cancer Institute, University College London
http://www.ucl.ac.uk/cancer/rescancerbiol/statisticalgenomics
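To make the point about ComBat concrete, here is a minimal sketch, assuming the sva package is installed and using simulated data in place of real M-values. The object names (`m_values`, `pheno`, `mod0`, `adjusted`) are hypothetical. The key assumption is that an intercept-only model matrix is passed as `mod`, which means no phenotype of interest is protected during adjustment; the exact ComBat() arguments may differ between sva versions, so check `?ComBat` for the installed release.

```r
library(sva)

# Simulated stand-in for M-values: 200 probes x 56 tumors, two batches of 28.
set.seed(1)
batch <- rep(c(1, 2), each = 28)
m_values <- matrix(rnorm(200 * 56), nrow = 200,
                   dimnames = list(paste0("probe", 1:200),
                                   paste0("tumor", 1:56)))
# Mimic a batch effect by shifting every probe in batch 2.
m_values[, batch == 2] <- m_values[, batch == 2] + 1

# Intercept-only model matrix: only the batch is modeled, no phenotype.
pheno <- data.frame(batch = factor(batch))
mod0 <- model.matrix(~ 1, data = pheno)

# Empirical Bayes batch adjustment with parametric priors.
adjusted <- ComBat(dat = m_values, batch = batch, mod = mod0,
                   par.prior = TRUE, prior.plots = FALSE)
```

The adjusted matrix has the same dimensions as the input and can be passed to the clustering step. As with any batch removal in a case-only design, biological structure that coincides with batch cannot be separated from the technical effect.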

11.9 years ago Andrew Teschendorff ▴ 60
