sequential removeBatchEffect()?
1
0
Entering edit mode
Jenny Drnevich ★ 2.0k
@jenny-drnevich-2812
Last seen 15 days ago
United States
Hi Gordon and everyone, I was wondering what is the best way to remove two different batch effects from a set of sample values for plotting heatmaps? I have data from Illumina's MouseWG6 arrays, 24 samples on 4 BeadChips. There is the effect of BeadChip, which I'm accounting for in the model using duplicateCorrelation(), and also population batch effect, which I have as a term in the design matrix. I'd like to remove both of these effects from the individual sample data before making heatmaps of significant genes. Should I use removeBatchEffect() twice, first to remove the BeadChip effect (with population in the design) and then remove the population effect? Here's an example of my analysis code and what I'm thinking of doing to remove the two effect. Please let me know if this is alright, or if I should do something different: > design <- model.matrix(~0+ factor(targets$Sample_Group)) > colnames(design) <- levels(factor(targets$Sample_Group)) > design <- cbind(design,pop=as.numeric(targets$pop==1)) > design Cont.24 Cont.8 DHT.24 DHT.8 pop 1 0 1 0 0 1 2 0 1 0 0 0 3 0 1 0 0 0 4 0 1 0 0 1 5 0 1 0 0 1 6 0 1 0 0 0 7 0 0 0 1 1 8 0 0 0 1 0 9 0 0 0 1 0 10 0 0 0 1 1 11 0 0 0 1 0 12 0 0 0 1 1 13 1 0 0 0 1 14 1 0 0 0 0 15 1 0 0 0 1 16 1 0 0 0 0 17 1 0 0 0 0 18 1 0 0 0 1 19 0 0 1 0 1 20 0 0 1 0 0 21 0 0 1 0 1 22 0 0 1 0 0 23 0 0 1 0 0 24 0 0 1 0 1 > duplicateCorrelation(BSlimma.neqc, design, block=as.numeric(factor(targets$Sentrix_ID)))$consensus [1] 0.1528686 > fit.neqc <- lmFit(BSlimma.neqc, design, block=as.numeric(factor(targets$Sentrix_ID)), correlation=0.1528686) > cont.matrix <- makeContrasts(DHT.8_v_Cont.8 = DHT.8 - Cont.8, DHT.24_v_Cont.24 = DHT.24 - Cont.24, + Cont.24_v_Cont.8 = Cont.24 - Cont.8, DHT.24_v_DHT.8 = DHT.24 - DHT.8, + interact = (DHT.24 - Cont.24) - (DHT.8 - Cont.8),levels=design) > fit.neqc2 <- eBayes(contrasts.fit(fit.neqc,cont.matrix)) # This is what I _think_ I should do to remove both batch effects from the sample data for heatmaps: > noChip.values <- removeBatchEffect(BSlimm.neqc$E, batch=as.numeric(factor(targets$Sentrix_ID)), design=design) > noChip.noPop.values <- removeBatchEffect(noChip.values, batch=targets$pop, design = design[,1:4] ) Then I can use noChip.noPop.values for heatmaps? Thanks, Jenny > sessionInfo() R version 2.13.0 (2011-04-13) Platform: x86_64-pc-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C LC_TIME=English_United States.1252 attached base packages: [1] splines grid stats graphics grDevices datasets utils methods base other attached packages: [1] statmod_1.4.9 beadarray_2.2.0 limma_3.8.2 WGCNA_1.00 Hmisc_3.8-3 [6] survival_2.36-5 qvalue_1.26.0 flashClust_1.01 dynamicTreeCut_1.21 impute_1.26.0 [11] made4_1.26.0 scatterplot3d_0.3-33 gplots_2.8.0 caTools_1.11 bitops_1.0-4.1 [16] gdata_2.8.1 gtools_2.6.2 RColorBrewer_1.0-2 ade4_1.4-17 affyPLM_1.28.5 [21] preprocessCore_1.14.0 gcrma_2.24.1 affycoretools_1.24.0 KEGG.db_2.5.0 GO.db_2.5.0 [26] RSQLite_0.9-4 DBI_0.2-5 AnnotationDbi_1.14.1 affy_1.30.0 Biobase_2.12.1 [31] RWinEdt_1.8-2 loaded via a namespace (and not attached): [1] affyio_1.20.0 annaffy_1.24.0 annotate_1.30.0 biomaRt_2.8.1 Biostrings_2.20.1 Category_2.18.0 cluster_1.13.3 [8] genefilter_1.34.0 GOstats_2.18.0 graph_1.30.0 GSEABase_1.14.0 IRanges_1.10.4 lattice_0.19-23 RBGL_1.28.0 [15] RCurl_1.6-6.1 tcltk_2.13.0 tools_2.13.0 XML_3.4-0.2 xtable_1.5-6
• 882 views
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 1 hour ago
WEHI, Melbourne, Australia
Hi Jenny, Perhaps I'm mis-understanding your setup, but isn't Sample_Group confounded with BeadChip? If the samples are entered in BeadChip order, then the first six would be BeadChip 1 (and also Cont.8), the second six would be BeadChip 2 (and also DHT.8), etc. Or have the samples been reordered by Sample_Group? Best wishes Gordon --------------------------------------------- Professor Gordon K Smyth, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3052, Australia. smyth at wehi.edu.au http://www.wehi.edu.au http://www.statsci.org/smyth On Wed, 29 Jun 2011, Jenny Drnevich wrote: > Hi Gordon and everyone, > > I was wondering what is the best way to remove two different batch effects > from a set of sample values for plotting heatmaps? I have data from > Illumina's MouseWG6 arrays, 24 samples on 4 BeadChips. There is the effect of > BeadChip, which I'm accounting for in the model using duplicateCorrelation(), > and also population batch effect, which I have as a term in the design > matrix. I'd like to remove both of these effects from the individual sample > data before making heatmaps of significant genes. Should I use > removeBatchEffect() twice, first to remove the BeadChip effect (with > population in the design) and then remove the population effect? Here's an > example of my analysis code and what I'm thinking of doing to remove the two > effect. Please let me know if this is alright, or if I should do something > different: > >> design <- model.matrix(~0+ factor(targets$Sample_Group)) >> colnames(design) <- levels(factor(targets$Sample_Group)) >> design <- cbind(design,pop=as.numeric(targets$pop==1)) >> design > Cont.24 Cont.8 DHT.24 DHT.8 pop > 1 0 1 0 0 1 > 2 0 1 0 0 0 > 3 0 1 0 0 0 > 4 0 1 0 0 1 > 5 0 1 0 0 1 > 6 0 1 0 0 0 > 7 0 0 0 1 1 > 8 0 0 0 1 0 > 9 0 0 0 1 0 > 10 0 0 0 1 1 > 11 0 0 0 1 0 > 12 0 0 0 1 1 > 13 1 0 0 0 1 > 14 1 0 0 0 0 > 15 1 0 0 0 1 > 16 1 0 0 0 0 > 17 1 0 0 0 0 > 18 1 0 0 0 1 > 19 0 0 1 0 1 > 20 0 0 1 0 0 > 21 0 0 1 0 1 > 22 0 0 1 0 0 > 23 0 0 1 0 0 > 24 0 0 1 0 1 > >> duplicateCorrelation(BSlimma.neqc, design, > block=as.numeric(factor(targets$Sentrix_ID)))$consensus > [1] 0.1528686 > >> fit.neqc <- lmFit(BSlimma.neqc, design, > block=as.numeric(factor(targets$Sentrix_ID)), correlation=0.1528686) > >> cont.matrix <- makeContrasts(DHT.8_v_Cont.8 = DHT.8 - Cont.8, > DHT.24_v_Cont.24 = DHT.24 - Cont.24, > + Cont.24_v_Cont.8 = Cont.24 - Cont.8, > DHT.24_v_DHT.8 = DHT.24 - DHT.8, > + interact = (DHT.24 - Cont.24) - (DHT.8 - > Cont.8),levels=design) > >> fit.neqc2 <- eBayes(contrasts.fit(fit.neqc,cont.matrix)) > > # This is what I _think_ I should do to remove both batch effects from the > sample data for heatmaps: > >> noChip.values <- removeBatchEffect(BSlimm.neqc$E, > batch=as.numeric(factor(targets$Sentrix_ID)), design=design) > >> noChip.noPop.values <- removeBatchEffect(noChip.values, batch=targets$pop, > design = design[,1:4] ) > > Then I can use noChip.noPop.values for heatmaps? > > Thanks, > Jenny > >> sessionInfo() > R version 2.13.0 (2011-04-13) > Platform: x86_64-pc-mingw32/x64 (64-bit) > > locale: > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United > States.1252 LC_MONETARY=English_United States.1252 > [4] LC_NUMERIC=C LC_TIME=English_United States.1252 > > attached base packages: > [1] splines grid stats graphics grDevices datasets utils > methods base > > other attached packages: > [1] statmod_1.4.9 beadarray_2.2.0 limma_3.8.2 WGCNA_1.00 > Hmisc_3.8-3 > [6] survival_2.36-5 qvalue_1.26.0 flashClust_1.01 > dynamicTreeCut_1.21 impute_1.26.0 > [11] made4_1.26.0 scatterplot3d_0.3-33 gplots_2.8.0 caTools_1.11 > bitops_1.0-4.1 > [16] gdata_2.8.1 gtools_2.6.2 RColorBrewer_1.0-2 > ade4_1.4-17 affyPLM_1.28.5 > [21] preprocessCore_1.14.0 gcrma_2.24.1 affycoretools_1.24.0 > KEGG.db_2.5.0 GO.db_2.5.0 > [26] RSQLite_0.9-4 DBI_0.2-5 AnnotationDbi_1.14.1 > affy_1.30.0 Biobase_2.12.1 > [31] RWinEdt_1.8-2 > > loaded via a namespace (and not attached): > [1] affyio_1.20.0 annaffy_1.24.0 annotate_1.30.0 biomaRt_2.8.1 > Biostrings_2.20.1 Category_2.18.0 cluster_1.13.3 > [8] genefilter_1.34.0 GOstats_2.18.0 graph_1.30.0 GSEABase_1.14.0 > IRanges_1.10.4 lattice_0.19-23 RBGL_1.28.0 > [15] RCurl_1.6-6.1 tcltk_2.13.0 tools_2.13.0 XML_3.4-0.2 > xtable_1.5-6 > > > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD COMMENT
0
Entering edit mode
HI Gordon, No, Sample_Group isn't confounded with BeadChip. Our microarray unit is very good about asking customers about the treatment groups and randomizing them on arrays. I use GenomeStudio to output summarized beadtype values (no bg or norm), and GenomeStudio always re-organizes the by Sample_Group. I've learned to double-check that the order in my targets file is the same as the order coming out of GenomeStudio! Jenny At 06:08 PM 6/29/2011, Gordon K Smyth wrote: >Hi Jenny, > >Perhaps I'm mis-understanding your setup, but isn't Sample_Group >confounded with BeadChip? If the samples are entered in BeadChip >order, then the first six would be BeadChip 1 (and also Cont.8), the >second six would be BeadChip 2 (and also DHT.8), etc. Or have the >samples been reordered by Sample_Group? > >Best wishes >Gordon > >--------------------------------------------- >Professor Gordon K Smyth, >Bioinformatics Division, >Walter and Eliza Hall Institute of Medical Research, >1G Royal Parade, Parkville, Vic 3052, Australia. >smyth at wehi.edu.au >http://www.wehi.edu.au >http://www.statsci.org/smyth > >On Wed, 29 Jun 2011, Jenny Drnevich wrote: > >>Hi Gordon and everyone, >> >>I was wondering what is the best way to remove two different batch >>effects from a set of sample values for plotting heatmaps? I have >>data from Illumina's MouseWG6 arrays, 24 samples on 4 BeadChips. >>There is the effect of BeadChip, which I'm accounting for in the >>model using duplicateCorrelation(), and also population batch >>effect, which I have as a term in the design matrix. I'd like to >>remove both of these effects from the individual sample data before >>making heatmaps of significant genes. Should I use >>removeBatchEffect() twice, first to remove the BeadChip effect >>(with population in the design) and then remove the population >>effect? Here's an example of my analysis code and what I'm thinking >>of doing to remove the two effect. Please let me know if this is >>alright, or if I should do something different: >> >>>design <- model.matrix(~0+ factor(targets$Sample_Group)) >>>colnames(design) <- levels(factor(targets$Sample_Group)) >>>design <- cbind(design,pop=as.numeric(targets$pop==1)) >>>design >> Cont.24 Cont.8 DHT.24 DHT.8 pop >>1 0 1 0 0 1 >>2 0 1 0 0 0 >>3 0 1 0 0 0 >>4 0 1 0 0 1 >>5 0 1 0 0 1 >>6 0 1 0 0 0 >>7 0 0 0 1 1 >>8 0 0 0 1 0 >>9 0 0 0 1 0 >>10 0 0 0 1 1 >>11 0 0 0 1 0 >>12 0 0 0 1 1 >>13 1 0 0 0 1 >>14 1 0 0 0 0 >>15 1 0 0 0 1 >>16 1 0 0 0 0 >>17 1 0 0 0 0 >>18 1 0 0 0 1 >>19 0 0 1 0 1 >>20 0 0 1 0 0 >>21 0 0 1 0 1 >>22 0 0 1 0 0 >>23 0 0 1 0 0 >>24 0 0 1 0 1 >> >>>duplicateCorrelation(BSlimma.neqc, design, >>block=as.numeric(factor(targets$Sentrix_ID)))$consensus >>[1] 0.1528686 >> >>>fit.neqc <- lmFit(BSlimma.neqc, design, >>block=as.numeric(factor(targets$Sentrix_ID)), correlation=0.1528686) >> >>>cont.matrix <- makeContrasts(DHT.8_v_Cont.8 = DHT.8 - Cont.8, >>DHT.24_v_Cont.24 = DHT.24 - Cont.24, >>+ Cont.24_v_Cont.8 = Cont.24 - Cont.8, >>DHT.24_v_DHT.8 = DHT.24 - DHT.8, >>+ interact = (DHT.24 - Cont.24) - >>(DHT.8 - Cont.8),levels=design) >> >>>fit.neqc2 <- eBayes(contrasts.fit(fit.neqc,cont.matrix)) >> >># This is what I _think_ I should do to remove both batch effects >>from the sample data for heatmaps: >> >>>noChip.values <- removeBatchEffect(BSlimm.neqc$E, >>batch=as.numeric(factor(targets$Sentrix_ID)), design=design) >> >>>noChip.noPop.values <- removeBatchEffect(noChip.values, batch=targets$pop, >>design = design[,1:4] ) >> >>Then I can use noChip.noPop.values for heatmaps? >> >>Thanks, >>Jenny >> >>>sessionInfo() >>R version 2.13.0 (2011-04-13) >>Platform: x86_64-pc-mingw32/x64 (64-bit) >> >>locale: >>[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United >>States.1252 LC_MONETARY=English_United States.1252 >>[4] LC_NUMERIC=C LC_TIME=English_United States.1252 >> >>attached base packages: >>[1] splines grid stats graphics grDevices >>datasets utils methods base >> >>other attached packages: >>[1] statmod_1.4.9 beadarray_2.2.0 limma_3.8.2 >>WGCNA_1.00 Hmisc_3.8-3 >>[6] survival_2.36-5 qvalue_1.26.0 flashClust_1.01 >>dynamicTreeCut_1.21 impute_1.26.0 >>[11] made4_1.26.0 scatterplot3d_0.3-33 gplots_2.8.0 >>caTools_1.11 bitops_1.0-4.1 >>[16] gdata_2.8.1 gtools_2.6.2 RColorBrewer_1.0-2 >>ade4_1.4-17 affyPLM_1.28.5 >>[21] preprocessCore_1.14.0 >>gcrma_2.24.1 affycoretools_1.24.0 KEGG.db_2.5.0 GO.db_2.5.0 >>[26] >>RSQLite_0.9-4 DBI_0.2-5 AnnotationDbi_1.14.1 >>affy_1.30.0 Biobase_2.12.1 >>[31] RWinEdt_1.8-2 >> >>loaded via a namespace (and not attached): >>[1] >>affyio_1.20.0 annaffy_1.24.0 annotate_1.30.0 biomaRt_2.8.1 >>Biostrings_2.20.1 Category_2.18.0 cluster_1.13.3 >>[8] genefilter_1.34.0 >>GOstats_2.18.0 graph_1.30.0 GSEABase_1.14.0 >>IRanges_1.10.4 lattice_0.19-23 RBGL_1.28.0 >>[15] >>RCurl_1.6-6.1 tcltk_2.13.0 tools_2.13.0 XML_3.4-0.2 xtable_1.5-6 >> >> > >_____________________________________________________________________ _ >The information in this email is confidential and inten...{{dropped:6}}
ADD REPLY

Login before adding your answer.

Traffic: 872 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6