I'm trying to find differentially expressed genes between two expression microarray datasets, both Illumina HT-12 v4. One has only 3 samples (3 different patients); the other has 5 samples in triplicate (biological triplicates, clones). I loaded each dataset with lumi, then transformed (vst) and normalized (rsn) each one individually, and finally combined them on their common features. The mean expression is around 7.4 in the first dataset and around 8.4 in the second, so I tried to remove the batch effect using ComBat with the following:
pheno = pData(EDATA)
edata = exprs(EDATA)
batch = pheno$batch
modcombat = model.matrix(~1, data=pheno)
combat_edata = ComBat(dat=edata, batch=batch, mod=modcombat, numCovs=NULL, par.prior=TRUE, prior.plots=FALSE)
## to plug the expression values back into the original ExpressionSet
exprs(EDATA) = combat_edata
My problem is, after ComBat, the samples cluster unexpectedly. The 3 samples that make up the first dataset should be clustering together and separately from the rest of the samples. But instead, not even the triplicate samples cluster together anymore. Is there something wrong with this approach? Or could ComBat be destroying the differences? The RNAs come from cells of similar type - all iPSC - but very different culture systems. Any help would be greatly appreciated.
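One check that may help narrow this down is whether batch and biological group are confounded in this design. With an intercept-only model (`~1`), ComBat is given no biological covariate to preserve, and if every group sits in exactly one batch, batch and biology cannot be separated at all. The sketch below uses base R only and a made-up annotation table: the `sample`, `batch`, and `group` columns and their values are assumptions that mirror the design described above (3 patient samples treated as one group, 5 clones in triplicate), not the real `pData`.

```r
# Hypothetical sample annotation: batch 1 = 3 patient samples,
# batch 2 = 5 clones x 3 replicates. Names and values are assumptions.
pheno <- data.frame(
  sample = paste0("S", 1:18),
  batch  = factor(c(rep(1, 3), rep(2, 15))),
  group  = factor(c(rep("patient", 3), rep(paste0("clone", 1:5), each = 3)))
)

# Cross-tabulate group against batch to see how they overlap.
tab <- table(pheno$group, pheno$batch)
print(tab)

# TRUE when every group occurs in exactly one batch (fully nested design).
nested <- all(rowSums(tab > 0) == 1)

# A design matrix containing both batch and group is then rank deficient,
# i.e. the batch effect and the group effect are not separately estimable.
design <- model.matrix(~ batch + group, data = pheno)
rank_deficient <- qr(design)$rank < ncol(design)

cat("group nested in batch:", nested, "\n")
cat("design rank deficient:", rank_deficient, "\n")
```

If both flags come out `TRUE` for the real annotation, any difference between the two datasets is indistinguishable from a batch difference, which would be consistent with the between-dataset separation disappearing after adjustment.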