Question

PCA of bladderdata ("Batch effects and confounders" tutorial)


Asked by nsakabe (United States)

I am following Jeff Leek's tutorial on batch effects (http://jtleek.com/genstats/inst/doc/02_13_batch-effects.html) and I wanted to plot PC1 vs PC2 of the test data.

After applying SVA (`sva()` followed by limma's `removeBatchEffect()`), I see better separation between Cancer and Normal than in the uncorrected data, where Biopsy clusters with Normal. ComBat doesn't seem to improve the clustering. Is this correct?

Thank you!

My code:

library(ggplot2)
library(Biobase)
library(limma)
library(sva)
library(bladderbatch)

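# Plot PC1 vs PC2 of a genes x samples matrix, coloured by `colorby`,
# with samples drawn as text labels, and save the plot as a PNG file.
pca_any <- function(counts, colorby, label, name, size, scale){
  # prcomp() expects samples in rows, so transpose the genes x samples matrix
  pcax <- prcomp(t(counts), scale. = scale)
  # percentage of the total variance explained by each PC
  pcvar <- pcax$sdev^2 / sum(pcax$sdev^2) * 100
  df <- data.frame(PC1 = pcax$x[, 1], PC2 = pcax$x[, 2], group = colorby, label = label)
  p <- ggplot(df, aes(x = PC1, y = PC2, colour = group, label = label)) +
    geom_text() +
    # widen the x-axis so the text labels are not clipped
    xlim(min(df$PC1) * 2, max(df$PC1) * 1.2) +
    labs(title = paste0(name, ', scale=', scale),
         x = paste0("PCA 1: ", round(pcvar[1], digits = 1), "% variance"),
         y = paste0("PCA 2: ", round(pcvar[2], digits = 1), "% variance"),
         colour = 'groups')
  png(filename = paste0("pca-", name, ".png"), res = 200, width = size, height = size)
  print(p)
  dev.off()
}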


# Load the bladder cancer example data: phenotype table and expression matrix
data(bladderdata)
pheno = pData(bladderEset)
edata = exprs(bladderEset)

pca_any(counts=edata, colorby=pheno$cancer, label=rownames(pheno), name='uncorrected', size=1200, scale=FALSE)

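# Estimate two surrogate variables while protecting the cancer-status effect
mod = model.matrix(~cancer, data=pheno)
mod0 = model.matrix(~1, data=pheno)
sva1 = sva(edata, mod, mod0, n.sv=2)
# Regress the surrogate variables (an n x 2 matrix) out of the expression data;
# removeBatchEffect() also accepts design= for conditions to preserve
counts.fixed <- removeBatchEffect(edata, covariates = sva1$sv)
pca_any(counts=counts.fixed, colorby=pheno$cancer, label=rownames(pheno), name='sva-removeBatchEffect', size=1200, scale=FALSE)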

# ComBat: adjust for the known batch variable with an intercept-only model
modcombat = model.matrix(~1, data=pheno)
combat = ComBat(dat=edata, batch=pheno$batch, mod=modcombat, par.prior=TRUE, prior.plots=FALSE)
pca_any(counts=combat, colorby=pheno$cancer, label=rownames(pheno), name='combat', size=1200, scale=FALSE)

Tags: sva


Answer by Keith Hughitt (United States), 2016-07-09

Hi nsakabe,

That's correct. It might help to plot the samples as points instead of text labels so you can see exactly how they overlap (for example, is the separation between Cancer and Normal better with SVA than without adjustment?). From the plots above, though, it appears that SVA does the best job of separating the three groups (Biopsy, Cancer, Normal), and that no adjustment does better than `ComBat`.
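A minimal base-R sketch of both ideas, on simulated data rather than bladderbatch: samples are drawn as coloured points, and a crude separation ratio (distance between group centroids divided by the average distance of samples to their own centroid in the PC1/PC2 plane) puts a number on how well the groups separate. The helper names `pc_scores` and `separation_ratio` are hypothetical, and the ratio is only one illustrative metric, not something from the tutorial.

```r
# Hypothetical helper: PC1/PC2 scores of a genes x samples matrix plus a group label
pc_scores <- function(mat, group) {
  pca <- prcomp(t(mat))  # samples in rows; centred, not scaled
  data.frame(PC1 = pca$x[, 1], PC2 = pca$x[, 2], group = group)
}

# Crude separation score (illustrative choice, not from the tutorial):
# mean distance between group centroids / mean distance of samples to their own centroid
separation_ratio <- function(scores) {
  xy <- as.matrix(scores[, c("PC1", "PC2")])
  cent <- apply(xy, 2, function(v) tapply(v, scores$group, mean))  # groups x 2
  within <- mean(sqrt(rowSums((xy - cent[as.character(scores$group), ])^2)))
  between <- mean(dist(cent))
  between / within
}

# Simulated data (an assumption): 500 genes x 20 samples, 50 genes shifted in "Cancer"
set.seed(1)
group <- factor(rep(c("Cancer", "Normal"), each = 10))
mat <- matrix(rnorm(500 * 20), nrow = 500)
mat[1:50, group == "Cancer"] <- mat[1:50, group == "Cancer"] + 3
scores <- pc_scores(mat, group)

# Points instead of text labels, coloured by group
plot(scores$PC1, scores$PC2, col = as.integer(scores$group), pch = 19,
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(scores$group),
       col = seq_along(levels(scores$group)), pch = 19)
separation_ratio(scores)
```

With ggplot2, as in the question's code, using `geom_point()` in place of `geom_text()` gives the same kind of plot. Computing the ratio on the uncorrected, SVA-adjusted, and ComBat-adjusted matrices would make the comparison less dependent on eyeballing overlapping labels.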

One important thing to check is the amount of variance preserved after each transformation. Sometimes good separation comes at the cost of a substantial reduction in variance, which can make it harder to detect differences in downstream analyses. That doesn't seem to be the problem here: with SVA you lose some of the variance on PC2 (from 14% to 6%), but PC1 retains a similar amount of variance.
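To compare how much variance survives an adjustment, something like the sketch below may help. It assumes simulated data and a hypothetical `regress_out` helper that regresses a covariate out of every gene with base R's `lm.fit`, roughly in the spirit of passing covariates to limma's `removeBatchEffect`; it is not limma's implementation. It reports the percent variance on the first three PCs and the total variance before and after adjustment.

```r
# Percent of total variance per PC (same formula as the question's pca_any)
pc_variance <- function(mat) {
  pca <- prcomp(t(mat))
  100 * pca$sdev^2 / sum(pca$sdev^2)
}

# Hypothetical stand-in for covariate removal: regress every gene on an intercept
# plus the covariates, keep the residuals, and add back each gene's mean
regress_out <- function(mat, covariates) {
  fit <- lm.fit(cbind(1, covariates), t(mat))  # samples x genes response matrix
  t(fit$residuals) + rowMeans(mat)
}

# Total variance = sum of per-gene variances (equals sum(prcomp(...)$sdev^2) when unscaled)
total_variance <- function(mat) sum(apply(mat, 1, var))

# Simulated data (an assumption): 300 genes x 20 samples, 100 genes driven by
# a single surrogate-variable-like covariate `sv`
set.seed(2)
n <- 20
sv <- rnorm(n)
mat <- matrix(rnorm(300 * n), nrow = 300)
mat[1:100, ] <- mat[1:100, ] + outer(rep(2, 100), sv)
adj <- regress_out(mat, sv)

round(rbind(before = pc_variance(mat)[1:3], after = pc_variance(adj)[1:3]), 1)
c(before = total_variance(mat), after = total_variance(adj))
```

Because the percentages are relative to each matrix's own total, it is worth looking at the total variance as well: a PC can keep a similar percentage while the absolute variance behind it shrinks.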

Finally, if you are curious, you could also try plotting PC3 in each case. It won't carry a lot of the variance, but it's possible that ComBat is primarily adjusting along that PC.
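A minimal sketch for looking beyond PC2, again on simulated data with a hypothetical `pc_table` helper: it keeps the first three PCs and their percent variance, and base R's `pairs()` draws every pairwise combination, so an adjustment acting mainly along PC3 would show up in the PC1 vs PC3 and PC2 vs PC3 panels.

```r
# Hypothetical helper: scores and percent variance for the first k PCs
pc_table <- function(mat, k = 3) {
  pca <- prcomp(t(mat))
  pct <- 100 * pca$sdev^2 / sum(pca$sdev^2)
  list(scores = pca$x[, seq_len(k), drop = FALSE], pct = pct[seq_len(k)])
}

# Simulated data (an assumption): 400 genes x 18 samples in three groups,
# with a modest shift in 30 genes for the "Biopsy" samples
set.seed(3)
group <- factor(rep(c("Biopsy", "Cancer", "Normal"), each = 6))
mat <- matrix(rnorm(400 * 18), nrow = 400)
mat[1:30, group == "Biopsy"] <- mat[1:30, group == "Biopsy"] + 1
res <- pc_table(mat, k = 3)

# Every pairwise plot among PC1, PC2 and PC3, coloured by group
pairs(res$scores, col = as.integer(group), pch = 19,
      labels = sprintf("PC%d (%.1f%%)", 1:3, res$pct))
```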


I would suggest that you try out our new BatchQC package. It's a Shiny app for batch effect exploration. The paper is under review, but the package is available. It will let you explore and compare ComBat and SVA using multiple metrics. I haven't seen all the results on bladderbatch, so I don't know whether ComBat or SVA works better, but BatchQC will let you make that comparison through a user-friendly interface and multiple metrics.

Reply by W. Evan Johnson