Julien Roux
Switzerland/Basel/University of Basel
Dear all,
After using the limma function removeBatchEffect() on RNA-seq data, I
observe strange behavior when I use PCA to visualize my data. Here are
some more details:
# dge is my DGEList object with RNA-seq count data
y <- predFC(dge, prior.count=2)
# When I run a PCA on this matrix, I can observe that PCs 1 and 2 are
# highly correlated with 2 technical variables (here variables 2 and 3)
# that I wish to remove. The main effect is in variable 1.
y.corrected <- removeBatchEffect(y, batch=var2, batch2=var3,
design=model.matrix(~ var1))
# I then run a centered and scaled PCA on this matrix
pca1 <- prcomp(t(y.corrected[apply(y.corrected, 1, sd) > 0, ]),
               scale. = TRUE)
When I plot the PCA scores, I observe that the samples are spread
across discrete layers along PC1:
https://dl.dropboxusercontent.com/u/828794/PCA_removeBatchEffect.pdf
This is unexpected, as the layering does not correlate with any
technical or biological variable...
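For readers who want to run this kind of check themselves, here is a minimal sketch of one way to test whether a PC is associated with a known variable. It uses simulated data and base R only. The names expr, batch, group and r2 are hypothetical stand-ins, not the objects from my analysis, and it does not call removeBatchEffect().

```r
# Simulated stand-in for a genes x samples matrix of log-expression values.
set.seed(1)
n.genes <- 200
n.samples <- 12
batch <- factor(rep(c("a", "b", "c"), each = 4))  # hypothetical technical variable
group <- factor(rep(c("ctl", "trt"), times = 6))  # hypothetical variable of interest
expr <- matrix(rnorm(n.genes * n.samples), nrow = n.genes)

# Add a per-sample shift that depends on batch, identical for every gene.
shift <- c(0, 2, 4)[batch]
expr <- expr + matrix(rep(shift, each = n.genes), nrow = n.genes)

# Centered and scaled PCA on samples, dropping zero-variance genes.
keep <- apply(expr, 1, sd) > 0
pca <- prcomp(t(expr[keep, ]), scale. = TRUE)

# Fraction of the variance in PC1 scores explained by each variable.
r2 <- sapply(list(batch = batch, group = group),
             function(v) summary(lm(pca$x[, 1] ~ v))$r.squared)
print(round(r2, 2))
```

A high R-squared for a variable means PC1 tracks it closely. Values near zero for every known variable would match the situation described above, where the layers follow none of them.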
Have you observed this behavior before? Do you have any idea what
could cause this pattern?
Thanks for your input
Julien
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] edgeR_3.2.4 limma_3.16.8 RColorBrewer_1.0-5
loaded via a namespace (and not attached):
[1] tools_3.0.1
--
Julien Roux, PhD
Gilad lab, Department of Human Genetics, University of Chicago
http://giladlab.uchicago.edu/
920 East 58th Street, CLSC 317, Chicago, IL 60637, USA
tel: +1-773-834-1984 fax: +1-773-834-8470