3.7 years ago by
Cambridge, United Kingdom
Well, if you expect similar results before and after batch removal, then what's the point of doing it at all?
In your case, I suspect that the batch effect "muddies the waters" by introducing additional variability between well-correlated samples in different batches. You haven't shown what your experimental design is, but let's consider a simple case: you have two libraries that are perfectly correlated with each other, with normally distributed log-expression values:
lib1 <- lib2 <- rnorm(1000, runif(1000, -2, 2))
cor(lib1, lib2) # gives 1, obviously.
Assuming that the two libraries belong to different batches, we add an independent, normally distributed batch effect to each one:
lib1 <- lib1 + rnorm(1000)
lib2 <- lib2 + rnorm(1000)
cor(lib1, lib2) # should give something smaller.
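For this toy model, the size of the drop can be worked out in advance. Each gene's true log-expression has variance Var(U(-2, 2)) + 1 = 4/3 + 1 = 7/3, and each library then gains an independent batch term of variance 1, so the expected correlation is (7/3) / (7/3 + 1) = 0.7. Below is a minimal R check, assuming a larger number of genes (n = 1e5, chosen arbitrarily to reduce sampling noise) and an arbitrary seed:

```r
set.seed(42)  # arbitrary seed, for reproducibility only
n <- 1e5      # number of genes; larger than 1000 to reduce sampling noise

# Perfectly correlated libraries: identical underlying log-expression values.
base <- rnorm(n, runif(n, -2, 2))
lib1 <- base + rnorm(n)  # batch effect for the first library's batch
lib2 <- base + rnorm(n)  # independent batch effect for the second library's batch

r_deflated <- cor(lib1, lib2)
# Shared variance 7/3 over total variance 7/3 + 1.
expected_deflated <- (4/3 + 1) / (4/3 + 1 + 1)
print(c(observed = r_deflated, expected = expected_deflated))
```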
This reduces the correlation, because each library receives a different, independent batch effect. Thus, if you remove the batch effect, you'll recover the larger correlation. Note, however, that if two poorly correlated libraries are in the same batch, the correlation between them increases because of the shared batch effect:
lib1 <- rnorm(1000, runif(1000, -2, 2))
lib2 <- rnorm(1000, runif(1000, -2, 2))
cor(lib1, lib2) # close to zero.
batch <- rnorm(1000)
lib1 <- lib1 + batch
lib2 <- lib2 + batch
cor(lib1, lib2) # bigger.
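The same arithmetic applies here. Each library has variance 7/3 + 1 = 10/3, but now the only term the two libraries share is the batch effect, with variance 1, so the expected correlation is 1 / (10/3) = 0.3. A sketch with an arbitrary seed and an assumed n = 1e5 genes:

```r
set.seed(42)  # arbitrary seed, for reproducibility only
n <- 1e5      # number of genes; larger than 1000 to reduce sampling noise

# Two unrelated libraries: independent log-expression values.
lib1 <- rnorm(n, runif(n, -2, 2))
lib2 <- rnorm(n, runif(n, -2, 2))

# Both libraries sit in the same batch, so they share one batch effect.
batch <- rnorm(n)
lib1 <- lib1 + batch
lib2 <- lib2 + batch

r_inflated <- cor(lib1, lib2)
# Shared variance 1 over total variance 7/3 + 1.
expected_inflated <- 1 / (4/3 + 1 + 1)
print(c(observed = r_inflated, expected = expected_inflated))
```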
So the effect of the batch on the correlations depends on which pair of libraries you consider. In any case, removing the batch effect should give more appropriate results, as it avoids both deflated correlations caused by differences between batches and inflated correlations caused by libraries sharing the same batch.
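To see both effects reversed by batch removal, here is a minimal sketch under assumed conditions: a hypothetical balanced design in which each of two batches contains one library from each of two unrelated biological profiles (A and B), and a simple per-gene correction that subtracts each batch's mean and adds back the overall mean. The helper `remove_batch_means()` is hypothetical; in practice, a dedicated function such as `removeBatchEffect()` from the limma package is the usual choice.

```r
set.seed(42)  # arbitrary seed, for reproducibility only
n <- 1e5      # number of genes (assumed)

# Two hypothetical, unrelated biological profiles.
muA <- rnorm(n, runif(n, -2, 2))
muB <- rnorm(n, runif(n, -2, 2))

# Balanced design: each batch holds one A and one B library.
b1 <- rnorm(n)  # gene-wise effect of batch 1
b2 <- rnorm(n)  # gene-wise effect of batch 2
expr <- cbind(A1 = muA + b1, B1 = muB + b1,
              A2 = muA + b2, B2 = muB + b2)
batch <- factor(c(1, 1, 2, 2))

# Hypothetical helper: per gene, subtract the batch mean, add back the overall mean.
remove_batch_means <- function(x, batch) {
  grand <- rowMeans(x)  # overall per-gene mean, computed before any changes
  for (b in levels(batch)) {
    cols <- batch == b
    x[, cols] <- x[, cols] - rowMeans(x[, cols, drop = FALSE]) + grand
  }
  x
}

corrected <- remove_batch_means(expr, batch)

# "across" = same biology, different batches; "within" = different biology, same batch.
before <- c(across = cor(expr[, "A1"], expr[, "A2"]),
            within = cor(expr[, "A1"], expr[, "B1"]))
after  <- c(across = cor(corrected[, "A1"], corrected[, "A2"]),
            within = cor(corrected[, "A1"], corrected[, "B1"]))
print(rbind(before, after))
```

With this setup, the cross-batch correlation between A1 and A2 should rise from about 0.7 to essentially 1, while the within-batch correlation between A1 and B1 should fall from about 0.3. It does not fall to zero: the average of the two batch effects is common to every library and cannot be separated from each gene's baseline, leaving an expected correlation of 0.5 / (7/3 + 0.5), roughly 0.18. The centering also relies on the design being balanced; if biology were confounded with batch, the same correction would remove real signal as well.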
Aaron Lun