Unexpected clustering of samples after batch effect removal with ComBat

jaro.slamecka (@jaroslamecka-7419), Mitchell Cancer Institute, Mobile AL, U…

Hi,

I'm trying to find differentially expressed genes between two expression microarray datasets, both on Illumina HT-12 v4. The first has only 3 samples (from 3 different patients); the second has 5 samples, each in biological triplicate (clones). I loaded each dataset with lumi, transformed it (vst) and normalized it (rsn) separately, and then combined the two datasets based on their common features. The means are around 7.4 in the first dataset and around 8.4 in the second, so I tried to remove the batch effect with ComBat as follows:

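library(Biobase)  # pData(), exprs()
library(sva)      # ComBat()

pheno <- pData(EDATA)
edata <- exprs(EDATA)
batch <- pheno$batch
modcombat <- model.matrix(~1, data = pheno)  # intercept only: no biological covariates
# recent sva versions of ComBat() take no numCovs argument, so it is not passed
combat_edata <- ComBat(dat = edata, batch = batch, mod = modcombat,
                       par.prior = TRUE, prior.plots = FALSE)
exprs(EDATA) <- combat_edata  # plug the adjusted values back into the ExpressionSet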

My problem is that after ComBat the samples cluster unexpectedly. The 3 samples that make up the first dataset should cluster together and separately from the rest of the samples, but instead not even the triplicates cluster together anymore. Is there something wrong with this approach, or could ComBat be removing the real differences? The RNA comes from cells of a similar type (all iPSCs) but grown in very different culture systems. Any help would be greatly appreciated.

Jaro
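One way to see how a batch correction can remove the very difference under study, when each dataset is its own batch, is to simulate it. The base-R sketch below uses made-up data (every name and number in it is hypothetical, not the poster's data) and plain per-batch mean-centering as a simplified stand-in for ComBat with an intercept-only `mod`. The real `ComBat()` in sva also adjusts each batch's scale and shrinks its estimates with empirical Bayes, so treat this only as an approximation of its location step.

```r
set.seed(42)

# Hypothetical toy data: 1000 genes; dataset "A" has 3 samples and dataset "B"
# has 15 (5 lines x 3 clones). B is shifted up by 1 unit, roughly mimicking
# means of ~7.4 vs ~8.4.
n_genes <- 1000
batch <- factor(c(rep("A", 3), rep("B", 15)))
x <- matrix(rnorm(n_genes * length(batch), mean = 7.4), nrow = n_genes)
x[, batch == "B"] <- x[, batch == "B"] + 1
colnames(x) <- paste0(batch, seq_along(batch))

# Simplified, location-only batch adjustment (hypothetical helper, not sva code):
# shift each batch so that its per-gene mean equals the per-gene grand mean.
center_by_batch <- function(x, batch) {
  grand_mean <- rowMeans(x)
  for (b in levels(batch)) {
    idx <- which(batch == b)
    x[, idx] <- x[, idx, drop = FALSE] - rowMeans(x[, idx, drop = FALSE]) + grand_mean
  }
  x
}
x_adj <- center_by_batch(x, batch)

# Hierarchical clustering of samples (Euclidean distance, average linkage),
# cut into two groups, before and after the adjustment.
cl_before <- cutree(hclust(dist(t(x)), method = "average"), k = 2)
cl_after <- cutree(hclust(dist(t(x_adj)), method = "average"), k = 2)
print(table(batch, cl_before))  # the two datasets separate cleanly
print(table(batch, cl_after))

# After adjustment the per-gene means of A and B coincide, so the A-vs-B
# difference, which here is the whole simulated signal, is gone.
print(all.equal(rowMeans(x_adj[, batch == "A"]), rowMeans(x_adj[, batch == "B"])))
```

Because the same per-gene shift is applied to every sample within a batch, this centering on its own leaves within-batch sample distances unchanged; the sketch does not model ComBat's additional scale step.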


Written 10.7 years ago by jaro.slamecka; last updated 7.8 years ago by Gordon Smyth.

If the data are all from the same array, why are you combining them based on common features? If they are the same array, then by definition all the features are common.
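A quick way to confirm the point about shared features is to compare the row names directly. The base-R sketch below uses small made-up matrices (hypothetical probe IDs and sample names) in place of `exprs()` output; the one practical pitfall it guards against is combining by row position when the two objects store the same probes in a different order.

```r
set.seed(1)

# Hypothetical stand-ins for exprs() of the two datasets; the probe IDs are
# made up. Both come from the same array, so they share every feature,
# but the rows of b are deliberately stored in a different order.
probe_ids <- paste0("probe_", 1:5)
a <- matrix(rnorm(5 * 3), nrow = 5,
            dimnames = list(probe_ids, paste0("A", 1:3)))
b <- matrix(rnorm(5 * 15), nrow = 5,
            dimnames = list(probe_ids[c(3, 1, 5, 2, 4)], paste0("B", 1:15)))

# Same platform: the feature sets should be identical (order aside).
print(setequal(rownames(a), rownames(b)))

# Combine by row name, not by position, so each row refers to the same probe.
common <- intersect(rownames(a), rownames(b))
combined <- cbind(a[common, , drop = FALSE], b[common, , drop = FALSE])
print(dim(combined))
```

`intersect()` keeps the order of its first argument, so the combined matrix follows the row order of `a`.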

In addition, if you are trying to make comparisons between batches and the biological replicates are all in one batch or the other (e.g., if you are trying to compare the 3 samples in the first set vs. the 5 in the second set), then note that it is not possible to remove the batch effect. In this situation, the biological differences are completely confounded with batch, so any correction that removes the batch difference also removes the biological difference, and there is no way to unscramble that egg.
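The confounding can be made concrete with a design matrix. In the base-R sketch below (simulated sample sheet, hypothetical labels), a model containing both group and batch is rank-deficient, and `lm()` returns `NA` for the batch term because it cannot be separated from the group term.

```r
set.seed(7)

# Hypothetical sample sheet: 3 samples in the first batch and 15 in the second
# (5 lines x 3 clones), where the biological group coincides with the batch.
batch <- factor(c(rep("b1", 3), rep("b2", 15)))
group <- factor(c(rep("A", 3), rep("B", 15)))

# Design with both terms: 3 columns but only rank 2, because the
# "groupB" and "batchb2" columns are identical.
design <- model.matrix(~ group + batch)
print(colnames(design))
print(qr(design)$rank)

# One simulated gene with a true group effect of 1.
y <- rnorm(18) + ifelse(group == "B", 1, 0)
fit <- lm(y ~ group + batch)
# The batch coefficient is NA (aliased with group); the groupB estimate
# absorbs any batch difference as well.
print(coef(fit))
```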

Reply by James W. MacDonald, 10.7 years ago.