I've been researching ways to remove batch effects from RNA-Seq count matrices. Basically, I'm starting with a counts matrix that includes batch effects, and want to generate a new matrix of counts that has the batch effects removed.
I'm looking to apply this to a set of ~100 RNA-Seq samples that were sequenced in batches on different days (a factor), and for which I also have other metadata with continuous values (covariates such as total sequenced reads per sample, quality metrics, etc.). I want to study all of these samples in an unsupervised manner, and I don't have a model for anything except the batch effects I want removed (i.e., there is no cancer vs. normal labeling; the samples are all 'normal', and I'd like to see whether they form clusters based on natural variation in the population, and perhaps identify subtypes).
From what I've read so far, methods like sva (and the ComBat function it includes) require that you provide a model for the covariates you do not want removed (biological factors) in addition to the ones you do want removed (batch effects). Is it possible to use these methods in my scenario, where I have no factors other than the specified batch effects?
In searching the Bioconductor mailing list archive, I found:
removeBatchEffect() function (defined in limma, which is loaded along with the edgeR package)
which seems to do exactly what I want, and I'll experiment with it shortly. I'm mostly curious about what other methods might be available to do this, and whether sva or other packages contain functions that I should explore.
Many thanks in advance for any advice!