Hi,
I'm trying to find differentially expressed genes between two expression microarray datasets, both on Illumina HT-12 v4. One has only 3 samples (3 different patients); the other has 5 samples, each in triplicate (biological replicates, i.e. clones). I loaded the datasets with lumi, then transformed (vst) and normalized (rsn) each one separately. After that I combined the datasets based on their common features. The mean expression in the first dataset is around 7.4 and in the second around 8.4, so I tried to remove the batch effect with ComBat as follows:
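library(Biobase)  # pData() and exprs() accessors for ExpressionSet objects
library(sva)      # provides ComBat()

pheno <- pData(EDATA)
edata <- exprs(EDATA)
batch <- pheno$batch
modcombat <- model.matrix(~1, data = pheno)  # intercept only: no biological covariates
combat_edata <- ComBat(dat = edata, batch = batch, mod = modcombat, numCovs = NULL,
                       par.prior = TRUE, prior.plots = FALSE)
exprs(EDATA) <- combat_edata  ## plug the corrected values back into the original ExpressionSet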
My problem is that after ComBat the samples cluster unexpectedly. The 3 samples that make up the first dataset should cluster together and apart from the rest of the samples. Instead, not even the triplicate samples cluster together anymore. Is there something wrong with this approach, or could ComBat be removing the real differences? The RNA comes from cells of a similar type (all iPSCs) but grown in very different culture systems. Any help would be greatly appreciated.
Jaro
If the data are all from the same array type, why are you combining them based on common features? If they are the same array, then by definition all the features are common.
In addition, if you are trying to make comparisons between batches and each biological group lies entirely within one batch (e.g., you are comparing the 3 samples in the first set with the 5 in the second set), then note that it is not possible to remove the batch effect. In this situation the biological differences are completely confounded with batch, and there is no way to unscramble that egg.
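A small base-R sketch of the confounding argument, using simulated data only. The sample layout (3 arrays in one batch, 15 in the other) and the group labels are hypothetical stand-ins for the question's design. The first part shows that a design with both group and batch terms is rank-deficient. The second part applies a simplified location-only batch adjustment (subtracting each batch's mean, which is not ComBat itself) to one simulated gene with a real group difference and no batch effect, and the group difference disappears. ComBat with `mod = ~1` also aligns per-batch location (and scale, with empirical Bayes shrinkage), so it runs into the same problem.

```r
# Simulated illustration only; layout and labels are hypothetical.
set.seed(1)

batch <- factor(c(rep("b1", 3), rep("b2", 15)))
group <- factor(c(rep("A", 3), rep("B", 15)))  # identical to batch: fully confounded

# 1. Design with both terms: the batchb2 column duplicates groupB,
#    so the matrix has 3 columns but only rank 2.
design <- model.matrix(~ group + batch)
design_rank <- qr(design)$rank
cat("columns:", ncol(design), " rank:", design_rank, "\n")

# 2. One gene with a true group difference of 1 and no batch effect at all.
expr <- rnorm(18, mean = ifelse(group == "B", 8.4, 7.4), sd = 0.2)

# Simplified location-only batch adjustment (NOT ComBat): remove each batch's
# mean and add back the grand mean.
adjusted <- expr - ave(expr, batch) + mean(expr)

diff_before <- mean(expr[group == "B"]) - mean(expr[group == "A"])
diff_after  <- mean(adjusted[group == "B"]) - mean(adjusted[group == "A"])
cat("group difference before:", round(diff_before, 2),
    " after:", round(diff_after, 2), "\n")
```

Because batch and group are the same partition of the samples, removing the batch means also removes the group means. Adding the group as a covariate in `mod` does not help, since that design has the same rank problem shown in step 1.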