strange layering in PCA after removeBatchEffect()
Julien Roux ▴ 90
@julien-roux-2710
Switzerland/Basel/University of Basel
Dear all,

After using the limma function removeBatchEffect() on RNA-seq data, I observe strange behavior when I use PCA to visualize my data. Here are some more details:

# dge is my DGEList object with RNA-seq count data
y <- predFC(dge, prior.count=2)
# When I run a PCA on this matrix, I can observe that PCs 1 and 2 are highly correlated with 2 technical variables (here variables 2 and 3) that I wish to remove. The main effect is in variable 1
y.corrected <- removeBatchEffect(y, batch=var2, batch2=var3, design=model.matrix(~ var1))
# I then run a centered and scaled PCA on this matrix
pca1 <- prcomp(t(y.corrected[apply(y.corrected, 1, sd) > 0, ]), scale = T)

When I plot the PCA scores, I observe that the different samples are scattered on discrete layers on PC1:
https://dl.dropboxusercontent.com/u/828794/PCA_removeBatchEffect.pdf
This is unexpected, as it does not correlate with any technical or biological variable.

Did you observe this behavior before? Do you have an idea about what could cause this pattern?

Thanks for your input
Julien

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] edgeR_3.2.4 limma_3.16.8 RColorBrewer_1.0-5

loaded via a namespace (and not attached):
[1] tools_3.0.1

--
Julien Roux, PhD
Gilad lab, Department of Human Genetics, University of Chicago
http://giladlab.uchicago.edu/
920 East 58th Street, CLSC 317, Chicago, IL 60637, USA
tel: +1-773-834-1984 fax: +1-773-834-8470
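For readers who want to reproduce the workflow described in the post, here is a minimal sketch on simulated data. Everything about the simulation is an assumption made for illustration (gene and sample numbers, the layout of var1, var2 and var3, the size of the batch effect); it is not the poster's data. cpm(log=TRUE) is used here as a stand-in for predFC(), and the R-squared check of PC1 against var2 is just one simple way to see whether a technical variable still drives the first component.

```r
library(edgeR)   # edgeR depends on limma
library(limma)   # loaded explicitly for removeBatchEffect()

set.seed(1)
n.genes <- 2000
n.samples <- 24

# Hypothetical, balanced layout: var1 is the effect of interest,
# var2 and var3 are technical variables.
var1 <- factor(rep(c("A", "B"), each = 12))
var2 <- factor(rep(rep(c("b1", "b2"), each = 6), 2))
var3 <- factor(rep(c("x", "y", "z"), times = 8))

# Simulated counts with a gene-specific log2 shift between levels of var2.
base <- 2^runif(n.genes, 2, 10)
batch.shift <- rnorm(n.genes, sd = 1)
mu <- base * 2^outer(batch.shift, as.numeric(var2) - 1)
counts <- matrix(rnbinom(n.genes * n.samples, mu = mu, size = 10),
                 n.genes, n.samples)

dge <- DGEList(counts = counts)
dge <- calcNormFactors(dge)

# log2 counts per million, used in place of predFC()
y <- cpm(dge, log = TRUE, prior.count = 2)

# Remove both technical variables while protecting var1
y.corrected <- removeBatchEffect(y, batch = var2, batch2 = var3,
                                 design = model.matrix(~ var1))

# Centered and scaled PCA before and after correction
pca0 <- prcomp(t(y[apply(y, 1, sd) > 0, ]), scale. = TRUE)
pca1 <- prcomp(t(y.corrected[apply(y.corrected, 1, sd) > 0, ]), scale. = TRUE)

# Share of PC1 variance explained by var2, before and after
r2.before <- summary(lm(pca0$x[, 1] ~ var2))$r.squared
r2.after <- summary(lm(pca1$x[, 1] ~ var2))$r.squared
```

In this simulated setting r2.before should be large and r2.after close to zero. Real data with an unbalanced design may behave less cleanly.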
Genetics limma • 1.5k views
@gordon-smyth
WEHI, Melbourne, Australia
Hi Julien,

Your data are presumably discrete counts, so discrete layers in a plot would not be unexpected, especially if you have not filtered out low count data.

You might update to the current release software. The use of predFC() to get overall logCPM still works, but is being deprecated in favour of cpm().

Best wishes
Gordon

> Date: Tue, 26 Nov 2013 11:39:48 +0100
> From: Julien Roux <julien.roux at unil.ch>
> To: <bioconductor at stat.math.ethz.ch>
> Subject: [BioC] strange layering in PCA after removeBatchEffect()
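A rough sketch of the two suggestions above (filter low counts, then use cpm() for the log values), again on simulated counts. The filtering rule used here, at least min.cpm counts per million in at least min.samples samples, and both threshold values are assumptions chosen for this toy example, not a recommendation from the thread. The count of distinct values per gene is only there to show why low-count genes can produce layers: they take very few different integer values across samples.

```r
library(edgeR)

set.seed(2)
n.genes <- 2000
n.samples <- 12

# Simulated counts spanning very low to high expression (illustrative only)
mu <- 2^runif(n.genes, -3, 10)
counts <- matrix(rnbinom(n.genes * n.samples, mu = mu, size = 10),
                 n.genes, n.samples)

dge <- DGEList(counts = counts)

# Number of distinct count values each gene takes across samples
n.distinct <- apply(counts, 1, function(x) length(unique(x)))

# Hypothetical filter: keep genes reaching min.cpm in at least min.samples samples
min.cpm <- 10
min.samples <- 3
keep <- rowSums(cpm(dge) > min.cpm) >= min.samples

# Genes that fail the filter take far fewer distinct values
median.kept <- median(n.distinct[keep])
median.removed <- median(n.distinct[!keep])

# Filter, recompute library sizes and normalization factors, then take logCPM
dge.filt <- dge[keep, , keep.lib.sizes = FALSE]
dge.filt <- calcNormFactors(dge.filt)
logCPM <- cpm(dge.filt, log = TRUE, prior.count = 2)
```

The filtered logCPM matrix can then be passed to removeBatchEffect() and prcomp() exactly as in the original post.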