Voom and removing batch effect
1
0
Entering edit mode
@heather-estrella-6259
Last seen 9.6 years ago
Hi, I have a miRNA-Seq dataset with clear batch effect that I'm trying to correct for in order to run differential expression using voom/limma. Based on the Limma User Guide, this can be done by adding batch as a confounding factor to the design matrix used by voom and limma fit. However, when I run an MDS plot on batch using the voom results, it's still showing a clear batch effect. What am I missing? What is the best way to correct for and test for batch effect after correction? In the design matrix, the "dz_cat" group to compare all the other groups does not appear in the design matrix. My understanding from reading other posting by Gordom Smyth is that it's because it is used as the comparison group to all the other groups. I'm not sure why it is all 1's though. FYI: I am able to correct for batch effect using the cpm-log2 values and ComBat. However, for differential expression using voom/limma I need to use raw counts for the normalization and differential comparison. Here is my code: > library(edgeR) > library(limma) > y = DGEList(counts=datamat,genes=rownames(datamat)) > y = calcNormFactors(y) > design <- model.matrix(~batch+dz_cat, data = samplemat) > png(file=paste("../results/MV_plot_voom_",name,".png",sep=""),poi ntsize=11,bg="white") > v <- voom(counts=y, design,plot=TRUE) > dev.off() > colrs = rainbow(length(unique(samplemat$batch))) > mycol=colrs[samplemat$batch] > o = whichis.na(mycol)) > png(height=600,width=600,file=paste("../results/MDS_plot_voom_",na me,".png",sep=""),pointsize=11,bg="white") > plotMDS(v,top=50,labels=substring(samplemat$batch,1,1),col=m ycol,gene.selection="common") > dev.off() > design (Intercept) batch2nd batch3rd dz_catF dz_catL dz_catO 1 1 0 1 0 0 0 3 1 1 0 0 0 0 5 1 1 0 0 0 0 7 1 1 0 0 0 0 8 1 0 1 0 0 0 9 1 0 1 0 0 0 11 1 1 0 0 0 0 12 1 0 1 0 0 0 14 1 1 0 0 0 0 16 1 1 0 0 0 0 17 1 0 1 0 0 0 18 1 0 1 0 0 0 20 1 1 0 0 0 0 21 1 0 1 0 0 0 23 1 1 0 0 0 0 24 1 0 1 0 0 0 26 1 1 0 0 1 0 27 1 0 1 0 1 0 28 1 0 1 0 1 0 29 1 0 1 0 1 0 31 1 1 0 0 1 0 33 1 1 0 0 1 0 35 1 1 0 0 0 1 36 1 0 1 0 0 1 38 1 1 0 0 0 1 39 1 0 1 0 0 1 41 1 1 0 0 0 1 42 1 0 1 0 0 1 44 1 1 0 0 0 1 45 1 0 1 0 0 1 attr(,"assign") [1] 0 1 1 2 2 2 attr(,"contrasts") attr(,"contrasts")$batch [1] "contr.treatment" attr(,"contrasts")$dz_cat [1] "contr.treatment" > sessionInfo() R version 3.0.2 (2013-09-25) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] sva_3.8.0 mgcv_1.7-27 nlme_3.1-113 corpcor_1.6.6 [5] edgeR_3.4.2 matrixStats_0.8.14 limma_3.18.9 gplots_2.12.1 loaded via a namespace (and not attached): [1] bitops_1.0-6 caTools_1.16 gdata_2.13.2 grid_3.0.2 [5] gtools_3.2.1 KernSmooth_2.23-10 lattice_0.20-24 Matrix_1.1-1.1 [9] R.methodsS3_1.6.1 Many thanks, Heather -------------- next part -------------- A non-text attachment was scrubbed... Name: MDS_plot_voom_UCSDOlefsky_voom.png Type: image/png Size: 5908 bytes Desc: MDS_plot_voom_UCSDOlefsky_voom.png URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20140121="" ffd71c9f="" attachment.png="">
Normalization limma Normalization limma • 5.8k views
ADD COMMENT
1
Entering edit mode
@ryan-c-thompson-5618
Last seen 8 months ago
Scripps Research, La Jolla, CA
Hello Heather, The way that edgeR, voom, and limma handle batch effects in a differential expression test is not by removing them, but simply including a batch effect term in the model. This is what you are doing when you use "~batch + dz_cat" as your model formula. When you test for differential expression using this design (i.e. by using lmFit, eBayes, and topTable), the test will be done in a way that corrects for the batch effect. If you just want to remove the batch effect for visualization or clustering purposes, limma provides the "removeBatchEffect" function for this purpose. -Ryan Thompson On Tue 21 Jan 2014 11:42:29 AM PST, Heather Estrella wrote: > > Hi, > > I have a miRNA-Seq dataset with clear batch effect that I'm trying to > correct for in order to run differential expression using voom/limma. > Based on the Limma User Guide, this can be done by adding batch as a > confounding factor to the design matrix used by voom and limma fit. > However, when I run an MDS plot on batch using the voom results, it's > still showing a clear batch effect. What am I missing? What is the > best way to correct for and test for batch effect after correction? > > In the design matrix, the "dz_cat" group to compare all the other > groups does not appear in the design matrix. My understanding from > reading other posting by Gordom Smyth is that it's because it is used > as the comparison group to all the other groups. I'm not sure why it > is all 1's though. > > FYI: I am able to correct for batch effect using the cpm-log2 values > and ComBat. However, for differential expression using voom/limma I > need to use raw counts for the normalization and differential comparison. > > Here is my code: > >> >> library(edgeR) >> library(limma) >> y = DGEList(counts=datamat,genes=rownames(datamat)) >> y = calcNormFactors(y) >> design <- model.matrix(~batch+dz_cat, data = samplemat) >> png(file=paste("../results/MV_plot_voom_",name,".png",sep=""),point size=11,bg="white") >> v <- voom(counts=y, design,plot=TRUE) >> dev.off() >> colrs = rainbow(length(unique(samplemat$batch))) >> mycol=colrs[samplemat$batch] >> o = whichis.na(mycol)) >> png(height=600,width=600,file=paste("../results/MDS_plot_voom_",nam e,".png",sep=""),pointsize=11,bg="white") >> plotMDS(v,top=50,labels=substring(samplemat$batch,1,1),col=mycol,ge ne.selection="common") >> dev.off() > > >> >> design > > (Intercept) batch2nd batch3rd dz_catF dz_catL dz_catO > 1 1 0 1 0 0 0 > 3 1 1 0 0 0 0 > 5 1 1 0 0 0 0 > 7 1 1 0 0 0 0 > 8 1 0 1 0 0 0 > 9 1 0 1 0 0 0 > 11 1 1 0 0 0 0 > 12 1 0 1 0 0 0 > 14 1 1 0 0 0 0 > 16 1 1 0 0 0 0 > 17 1 0 1 0 0 0 > 18 1 0 1 0 0 0 > 20 1 1 0 0 0 0 > 21 1 0 1 0 0 0 > 23 1 1 0 0 0 0 > 24 1 0 1 0 0 0 > 26 1 1 0 0 1 0 > 27 1 0 1 0 1 0 > 28 1 0 1 0 1 0 > 29 1 0 1 0 1 0 > 31 1 1 0 0 1 0 > 33 1 1 0 0 1 0 > 35 1 1 0 0 0 1 > 36 1 0 1 0 0 1 > 38 1 1 0 0 0 1 > 39 1 0 1 0 0 1 > 41 1 1 0 0 0 1 > 42 1 0 1 0 0 1 > 44 1 1 0 0 0 1 > 45 1 0 1 0 0 1 > attr(,"assign") > [1] 0 1 1 2 2 2 > attr(,"contrasts") > attr(,"contrasts")$batch > [1] "contr.treatment" > > attr(,"contrasts")$dz_cat > [1] "contr.treatment" > >> >> sessionInfo() > > R version 3.0.2 (2013-09-25) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] sva_3.8.0 mgcv_1.7-27 nlme_3.1-113 corpcor_1.6.6 > [5] edgeR_3.4.2 matrixStats_0.8.14 limma_3.18.9 gplots_2.12.1 > > loaded via a namespace (and not attached): > [1] bitops_1.0-6 caTools_1.16 gdata_2.13.2 grid_3.0.2 > [5] gtools_3.2.1 KernSmooth_2.23-10 lattice_0.20-24 Matrix_1.1-1.1 > [9] R.methodsS3_1.6.1 > > Many thanks, > Heather > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD COMMENT

Login before adding your answer.

Traffic: 719 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6