Hello,
I am attempting to remove batch effects from my data using `limma::removeBatchEffect()`. I have two batches of samples and four conditions. In the figures below, batches are color-coded. I'm wondering why the batch effect seems stronger after applying `limma::removeBatchEffect()`.
The functions were run with default parameters, as follows:
vst <- vst(dds)
plotPCA(vst, "Sac")
assay(vst) <- limma::removeBatchEffect(assay(vst), vst$Sac)
plotPCA(vst, "Sac")
[Figures: PCA plot before correction; PCA plot after correction]
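One thing worth checking in the call above: with default arguments, `removeBatchEffect()` is told nothing about the conditions. Its `design` argument specifies the condition effects that should be preserved while the batch term is removed. Below is a minimal sketch on simulated data, not the actual dataset: the matrix `x` stands in for `assay(vst)`, and `batch` and `condition` are hypothetical stand-ins for `vst$Sac` and the condition column, whose name isn't given in the post.

```r
library(limma)

set.seed(1)
# Simulated stand-in for assay(vst): 200 genes x 8 samples (hypothetical data)
batch     <- factor(rep(c("day1", "day2"), each = 4))
condition <- factor(rep(c("A", "B"), times = 4))   # balanced within each batch
x <- matrix(rnorm(200 * 8), nrow = 200)
x[, batch == "day2"] <- x[, batch == "day2"] + 2            # additive batch shift
x[1:20, condition == "B"] <- x[1:20, condition == "B"] + 3  # condition effect

# Design matrix describing the condition effects to be preserved
design <- model.matrix(~ condition)
x_corrected <- removeBatchEffect(x, batch = batch, design = design)

# Per-gene difference between batch means after correction
batch_gap <- rowMeans(x_corrected[, batch == "day2"]) -
  rowMeans(x_corrected[, batch == "day1"])
summary(batch_gap)
```

This may or may not change the picture for the real data. Note also that the limma help page describes this function as a tool for plots such as PCA, not as a step before linear modelling; for DE testing the batch term would normally go into the design formula instead.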
Many thanks for your response, Michael, I appreciate it. This was RNA-seq of mouse brain region- and cell-specific RNA immunoprecipitations. Groups denote the days the mice were sacrificed. Conditions were not divided further.
Since the outliers coincided with the days on which the specimens were sacrificed, I thought it would be sound to treat this as a batch effect (importantly, the mice sacrificed later were also born later, so it should not be related to age).
Of course, it may be i) a coincidence or ii) a tissue preparation (experimental) artifact (e.g. a lack of reproducibility in brain region dissection). If I understand correctly, the shift between those samples is inconsistent and therefore does not resemble a typical batch effect, hence the observed output of the `removeBatchEffect` function. Would it be a good use of time to try other tools to handle this? If this is not a batch effect, I would hesitate between i) using the samples as they are for comparisons or ii) using only the "red" ones and tossing the "green" batch.
I might try SVA or RUV.
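For the SVA route, a rough sketch using `svaseq()` from the sva package could look like the following. Everything here is simulated and hypothetical: `dat` stands in for something like `counts(dds, normalized = TRUE)`, `condition` for the real condition column, and `hidden` for an unmodelled nuisance factor. The choice of `n.sv = 2` is arbitrary and only for illustration.

```r
library(sva)

set.seed(1)
# Simulated counts standing in for normalized counts (hypothetical data)
n_genes   <- 1000
condition <- factor(rep(c("A", "B"), each = 6))
hidden    <- rep(c(0, 1), times = 6)        # unmodelled nuisance factor
mu <- matrix(100, nrow = n_genes, ncol = 12)
mu[1:200, hidden == 1] <- 300               # nuisance shift in a subset of genes
dat <- matrix(rnbinom(n_genes * 12, mu = mu, size = 10), nrow = n_genes)

# Full model contains the variable of interest; null model is intercept only
mod  <- model.matrix(~ condition)
mod0 <- model.matrix(~ 1, data = data.frame(condition))

# Estimate two surrogate variables (one row per sample, one column per SV)
sv <- svaseq(dat, mod, mod0, n.sv = 2)
head(sv$sv)
```

The estimated surrogate variables would then typically be added as columns of the sample table and included in the design alongside the condition, in place of a hand-specified batch factor.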
Another thing I would do is find batch-y genes (via an LRT in which the reduced model removes the batch variable) and look at plotCounts() for these genes to see if the batch effect is consistent. The important thing for DE analysis is what happens at the gene level, while the PCA is just a QC plot to give an overview of the variation.
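As a sketch of that check, assuming the batch column is `Sac` as in the question and a hypothetical condition column named `condition`: the example below runs on simulated data from `makeExampleDESeqDataSet()` with an artificial batch shift added to the first 20 genes, so the numbers mean nothing for the real experiment.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for the real dds (hypothetical data, 12 samples)
dds <- makeExampleDESeqDataSet(n = 500, m = 12)
dds$Sac <- factor(rep(c("day1", "day2"), times = 6))  # batch, balanced within condition

# Add an artificial batch shift to the first 20 genes
cts <- counts(dds)
cts[1:20, dds$Sac == "day2"] <- cts[1:20, dds$Sac == "day2"] * 4L
counts(dds) <- cts

# Full model includes batch; the reduced model drops it, so the LRT tests batch
design(dds) <- ~ Sac + condition
dds <- DESeq(dds, test = "LRT", reduced = ~ condition)
res <- results(dds)

# Gene with the smallest LRT p-value, i.e. the most batch-associated gene
top_gene <- rownames(res)[which.min(res$pvalue)]

# Normalized counts for that gene, one row per sample
d <- plotCounts(dds, gene = top_gene, intgroup = c("Sac", "condition"),
                returnData = TRUE)
d
```

Dropping `returnData = TRUE` draws the plot directly. If the shift between batches goes in the same direction across samples and conditions for the top genes, that looks like a consistent batch effect; if it does not, a simple additive batch term is unlikely to capture it.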