Question: Inspection of batch effect correction success with ComBat in R, regarding a merged dataset comprised of 4 different batches
0
2.8 years ago by
svlachavas660
Greece/Athens/National Hellenic Research Foundation
svlachavas660 wrote:

Dear Bioconductor Community,

Im currently analyzing 4 different affymetrix hgu133plus2 datasets, in order to test via machine learning algorithms in R, a 39-gene signature I have already developed in an independent training set, which could discriminate colorectal cancer from control samples. In my latest step, I would like to implement ComBat to correct for batch effect regarding the different dataset-study, and then apply my classifier. In detail, I merged the 4 different datasets (after normalizing separately with frma each)---all having the same affymetrix platform and annotation (also used customCDF annotation), & the same phenotype information i would like to predict in my later step (Status variable below). After merging (as all the datasets have the same affymetrix platform) my code is the following:

total.eset # my merged expressionSet ExpressionSet (storageMode: environment) assayData: 19764 features, 142 samples    element names: exprs  protocolData: none phenoData   rowNames: GSM93789.CEL GSM93920.CEL ... GSM549142.CEL (142 total)   varLabels: Status Dataset   varMetadata: labelDescription featureData: none experimentData: use 'experimentData(object)' Annotation: hgu133plus2hsentrezg 

## also my phenotype information

head(pData(total.eset))
Status Dataset
GSM93789.CEL Cancer GSE4107
GSM93920.CEL Cancer GSE4107
GSM93921.CEL Cancer GSE4107
GSM93922.CEL Cancer GSE4107
GSM93923.CEL Cancer GSE4107
GSM93924.CEL Cancer GSE4107

where Status is the binary factorial variable i would like to use for the prediction-later classification step, whereas the variable Dataset is the batch covariate. Moreover, there is also an obvious class inbalance regarding the Status information:

table(pData(total.eset)$Status) Cancer Normal 99 43  pheno <- pData(total.eset) batch <- total.eset$Dataset

class(batch) [1] "factor"

levels(batch) [1] "GSE4107"              "GSE4183"              "GSE18088"             [4] "Homogenized.GSE21510"

modcombat <- model.matrix(~1, data=pheno) combat_data <- ComBat(dat=exprs(total.eset), batch=batch, mod=modcombat, par.prior=TRUE, prior.plots=TRUE) 

....Found 4 batches
Adjusting for 0 covariate(s) or covariate level(s)
Standardizing Data across genes
Fitting L/S model and finding priors

exprs(total.eset) <- combat_data

My main issue-consideration, is than i created an MDS plot both before and after ComBat correction, to inspect the severity of batch effect and the possible improvement after. i used the following code for the MDS plot, both before and after correction:

plotMDS(exprs(total.eset), labels=total.eset$Status, col=as.numeric(as.factor(total.eset$Dataset)))

Assuming my above chunk code is correct, do you believe that ComBat correction improves the situation, or it makes things more "fuzzy" ? Based on the two plots ? Moreover, the fact that all probesets are used, might has a role in this ? (i intend to subset the corrected eSet with the 39 gene signature symbols i mentioned above). Or i can proceed to my next step ? Finally, It is important also to mention that one of the 4 datasets has no normal samples, just cancer samples.

My MDS plots, as also an hclust plot are below (the colors in the plots depict the different study-dataset)

https://www.dropbox.com/s/urdbl3d6qe5bq0a/beforeCoMbat.totaleSet.png?dl=0

https://www.dropbox.com/s/t4crrfv0ygjifpb/after.correction.ComBat.MDSplot.png?dl=0

https://www.dropbox.com/s/5hnrs81t628ylxr/hclust.plot.average.beforeComBat.png?dl=0

https://www.dropbox.com/s/k7v8ttbq27pk1xh/prior.plots.ComBat.png?dl=0 # also the plot produced from ComBat

Any help or suggestions on the above issue would be essential !!

modified 2.8 years ago by Jeff Leek550 • written 2.8 years ago by svlachavas660
Answer: Inspection of batch effect correction success with ComBat in R, regarding a merg
1
2.8 years ago by
Jeff Leek550
United States
Jeff Leek550 wrote:

Hi

It looks like Combat may have removed some of the batch effect, but it is pretty hard to detect from these plots. You might consider checking out the BatchQC package:

http://watson.nci.nih.gov/bioc_mirror/packages/release/bioc/html/BatchQC.html

Best

Jeff