PCA of bladderdata ("Batch effects and confounders" tutorial)
nsakabe ▴ 20
@nsakabe-11071
Last seen 7.8 years ago
United States

I am following Jeff Leek's tutorial on batch effects (http://jtleek.com/genstats/inst/doc/02_13_batch-effects.html) and I wanted to plot PC1 vs PC2 of the test data.

After applying svaseq, I see better separation between Cancer and Normal than what I see with the uncorrected data (Biopsy clusters with Normal). ComBat doesn't seem to improve clustering. Is this correct?

Thank you!

My code:

library(ggplot2)
library(devtools)
library(Biobase)
library(limma)
library(sva)
library(bladderbatch)

pca_any <- function(counts, colorby, label, name, size, scale){
  pcax = prcomp(t( counts ), scale=scale)
  pcvar = pcax$sdev^2/sum(pcax$sdev^2)*100
  p = qplot(pcax$x[,1],pcax$x[,2], main=paste(name, ', scale=', scale, sep=''), colour=colorby,
            xlab=paste("PCA 1: ", round(pcvar[1], digits=1), "% variance", sep=""),
            xlim = c(min(pcax$x[,1])*2, max(pcax$x[,1])*1.2),
            ylab=paste("PCA 2: ", round(pcvar[2], digits=1), "% variance", sep=""), geom="text", label=label) +
    labs(colour='groups')
  png(file=paste("pca-", name, ".png", sep=''), res=200, width=size, height=size)
  print(p)
  dev.off()
}


data(bladderdata)
pheno = pData(bladderEset)
edata = exprs(bladderEset)

pca_any(counts=edata, colorby=pheno$cancer, rownames(pheno), name='uncorrected', size=1200, scale=FALSE)

mod = model.matrix(~cancer,data=pheno)
mod0 = model.matrix(~1, data=pheno)
sva1 = sva(edata,mod,mod0,n.sv=2)
cov = cbind(sva1$sv[,1], sva1$sv[,2])
counts.fixed <- removeBatchEffect(edata, covariates = cov)
pca_any(counts=counts.fixed, colorby=pheno$cancer, rownames(pheno), name='svaseq-removeBatchEffect', size=1200, scale=FALSE)

mod = model.matrix(~1, data=pheno)
combat = ComBat(dat=edata, batch=pheno$batch, mod=mod, par.prior=TRUE, prior.plots = FALSE)
pca_any(counts=combat, colorby=pheno$cancer, rownames(pheno), name='combat', size=1200, scale=FALSE)

sva • 2.0k views
Keith Hughitt ▴ 180
@keith-hughitt-6740
Last seen 7 weeks ago
United States

Hi nsakabe,

That's correct. It might help to plot the samples as points instead of text labels so you can see exactly how they overlap (for example, is the separation between cancer and normal better with SVA than with the unadjusted data?). From the plots above, though, SVA appears to do the best job of separating the three groups, and no adjustment looks better than `ComBat`.
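To illustrate the points-instead-of-labels suggestion, here is a minimal sketch. It uses a small made-up matrix (`expr`) and made-up group labels (`groups`) as stand-ins so that it runs on its own; with the code from the question you would pass `edata` (or a corrected matrix) and `pheno$cancer` instead. It also uses `ggplot()` rather than `qplot()`, which is deprecated in recent ggplot2 releases.

```r
library(ggplot2)

# Made-up stand-in for an expression matrix: 50 genes x 12 samples.
set.seed(1)
expr <- matrix(rnorm(50 * 12), nrow = 50,
               dimnames = list(NULL, paste0("S", 1:12)))
groups <- factor(rep(c("Biopsy", "Cancer", "Normal"), each = 4))
# Shift one group so the toy data have some structure.
expr[1:10, groups == "Cancer"] <- expr[1:10, groups == "Cancer"] + 3

# PCA with samples as rows, as in the question's prcomp(t(counts)) call.
pca <- prcomp(t(expr), scale. = FALSE)
pct <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)

scores <- data.frame(PC1 = pca$x[, 1], PC2 = pca$x[, 2], group = groups)

# Semi-transparent points make overlapping samples visible.
p <- ggplot(scores, aes(x = PC1, y = PC2, colour = group)) +
  geom_point(size = 3, alpha = 0.7) +
  labs(x = paste0("PC1: ", pct[1], "% variance"),
       y = paste0("PC2: ", pct[2], "% variance"),
       colour = "groups")
```

Call `print(p)` or `ggsave()` to render the plot.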

One important thing to check is the amount of variance preserved after each transformation. Sometimes a good separation can come at the cost of a significant reduction in variance, which can make it harder to detect differences in downstream analyses. Here that doesn't seem to be the problem. With SVA, you do lose a bit of the variance in PC2 (14% -> 6%), but PC1 still retains a similar amount of variance.
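One way to make the variance check concrete is sketched below. The helper `pc_variance()` is hypothetical (not part of any package), and the data and the "correction" step are made up: a known covariate is regressed out of each gene as a stand-in for whatever sva or `ComBat` would produce. Note that the per-PC percentages are relative to the total within each matrix, so it may be worth comparing the totals as well.

```r
# Percent of variance on the first n PCs plus the total variance,
# for a genes x samples matrix.
pc_variance <- function(mat, n = 3) {
  pca <- prcomp(t(mat), scale. = FALSE)
  v <- pca$sdev^2
  list(percent = round(100 * v[seq_len(n)] / sum(v), 1),
       total = sum(v))
}

set.seed(1)
batch <- rep(c(-1, 1), times = 6)      # made-up unwanted covariate
raw <- matrix(rnorm(50 * 12), nrow = 50)
raw <- raw + outer(rnorm(50), batch)   # add gene-specific batch shifts

# Stand-in for a correction step: regress the covariate out of each gene.
adjusted <- t(apply(raw, 1, function(y) resid(lm(y ~ batch))))

before <- pc_variance(raw)
after <- pc_variance(adjusted)
rbind(before = before$percent, after = after$percent)
after$total / before$total   # fraction of total variance retained
```

With the objects from the question, the same helper could be applied to `edata`, `counts.fixed`, and `combat`.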

Finally, if you are curious, you could also try plotting PC3 in each case. It won't carry a lot of the variance, but it's possible that ComBat is primarily adjusting along that PC.
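A quick way to look at PC3 alongside the first two components is a scatterplot matrix. This is only a sketch on made-up data (`expr` and `groups` are placeholders for `edata`, `counts.fixed`, or `combat` and `pheno$cancer`), using base R's `pairs()`:

```r
set.seed(1)
expr <- matrix(rnorm(50 * 12), nrow = 50)   # made-up genes x samples matrix
groups <- factor(rep(c("Biopsy", "Cancer", "Normal"), each = 4))

pca <- prcomp(t(expr), scale. = FALSE)
pct <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)

# Scatterplot matrix of PC1 to PC3, one colour per group.
pc_labels <- paste0("PC", 1:3, " (", pct[1:3], "%)")
pairs(pca$x[, 1:3], labels = pc_labels,
      col = as.integer(groups), pch = 19)
```

Each off-diagonal panel shows one pair of components, so a group separation that only appears along PC3 would show up in the PC1/PC3 or PC2/PC3 panels.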

I would suggest that you try out our new BatchQC package. It's a Shiny app for batch effect exploration. The paper is under review, but the package is available. It will let you compare ComBat and SVA through a user-friendly interface using multiple metrics. I haven't seen all the results on bladderbatch, so I don't know whether ComBat or SVA works better there.
