Is there a way to remove batch effects from microarray data based on only a subset of the samples? The controls in my data are homogeneous, while the experimental samples are much more heterogeneous, as seen in the PCA plot (created after removing the batch effect with ComBat). This is expected, since the experimental samples result from a more stochastic process, so I do not necessarily expect replicates to cluster together. Blue circles are controls and green circles are experimental samples.
For differential expression analysis between the groups in limma, I account for batch in the design matrix. For a heatmap, however, I would like to remove the batch effect (using either ComBat or removeBatchEffect) as correctly as possible so that the control samples cluster together; otherwise the batch still shows in the control branch. When I subset the data to keep only the controls and then remove the batch effect, the control replicates do cluster together.
condition  batch
control    1
control    1
control    1
exp        1
exp        1
exp        1
control    2
control    2
control    2
control    2
control    2
control    2
exp        2
exp        2
exp        2
exp        2
exp        2
exp        2
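To make concrete what I mean by removing the batch effect based on the controls only, here is a minimal sketch in plain Python. It is not what ComBat or removeBatchEffect do internally. It assumes the batch effect is a simple additive per-gene shift on the log scale, and that the controls are truly comparable across batches. The function name `center_batches_on_controls` and the expression values are made up for illustration; only the condition/batch layout matches the table above.

```python
from statistics import mean

def center_batches_on_controls(expr, condition, batch, control_label="control"):
    """Hypothetical helper: for each gene, shift every batch so that its
    control mean matches the average of the per-batch control means.
    The shift is estimated from controls only but applied to all samples."""
    batches = sorted(set(batch))
    adjusted = []
    for row in expr:
        # Mean of the control samples within each batch for this gene.
        ctrl_mean = {
            b: mean(v for v, c, s in zip(row, condition, batch)
                    if c == control_label and s == b)
            for b in batches
        }
        # Reference level: unweighted average of the per-batch control means.
        reference = mean(ctrl_mean.values())
        # Subtract the batch-specific offset from every sample in that batch.
        adjusted.append([v - (ctrl_mean[s] - reference) for v, s in zip(row, batch)])
    return adjusted

# Sample layout from the table: 3 + 3 samples in batch 1, 6 + 6 in batch 2.
condition = ["control"] * 3 + ["exp"] * 3 + ["control"] * 6 + ["exp"] * 6
batch = [1] * 6 + [2] * 12

# One toy gene with invented log-expression values (not real data).
expr = [[5.0, 5.1, 4.9, 6.0, 8.0, 5.5,
         7.0, 7.1, 6.9, 7.0, 7.2, 6.8,
         8.0, 10.0, 7.5, 9.0, 7.0, 8.5]]

adjusted = center_batches_on_controls(expr, condition, batch)
```

Because the offset is estimated from the controls alone, the heterogeneous experimental samples do not influence it, and differences between samples within a batch are left unchanged. A batch without any control samples would raise an error in this sketch.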
Is there a way to do this? And should I be trying in the first place?
Thanks for any advice!