I realize that batch effect removal in RNA-seq has been addressed many times. I am not sure whether this particular aspect has been covered, though I may have missed it. I was looking at the DESeq2 vignette, which has grown substantially over the last few years. Specifically, it contains these two notes:
If there is unwanted variation present in the data (e.g. batch effects) it is always recommend to correct for this, which can be accommodated in DESeq2 by including in the design any known batch variables or by using functions/packages such as svaseq in sva (Leek 2014) or the RUV functions in RUVSeq (Risso et al. 2014) to estimate variables that capture the unwanted variation. In addition, the ashr developers have a specific method for accounting for unwanted variation in combination with ashr (Gerard and Stephens 2017).
It is possible to visualize the transformed data with batch variation removed, using the removeBatchEffect function from limma. This simply removes any shifts in the log2-scale expression data that can be explained by batch.
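To make sure I am reading the two notes correctly, here is a minimal sketch of how I understand them. It assumes simulated data from `makeExampleDESeqDataSet` and a `batch` column that I add by hand purely for illustration; neither comes from a real experiment. The known batch goes into the design for testing, and `removeBatchEffect` is applied only to the transformed values used for plotting.

```r
library(DESeq2)
library(limma)

# Simulated counts: 1000 genes, 12 samples, with a two-level `condition` factor.
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Hypothetical batch labels, alternating so that batch is balanced within condition.
dds$batch <- factor(rep(c("b1", "b2"), times = 6))

# Note 1: a known batch variable is included in the design used for testing.
design(dds) <- ~ batch + condition
dds <- DESeq(dds)

# Note 2: for visualization only, transform the counts and then remove the
# batch shift from the transformed (log2-like scale) values.
# vst() is the faster variant of this transformation for larger datasets.
vsd <- varianceStabilizingTransformation(dds, blind = FALSE)

# Passing the condition design asks limma to preserve condition differences
# while removing the shift explained by batch.
mm <- model.matrix(~ condition, colData(vsd))
assay(vsd) <- limma::removeBatchEffect(assay(vsd), batch = vsd$batch, design = mm)

plotPCA(vsd, intgroup = c("condition", "batch"))
```

As far as I can tell, the batch-removed matrix is meant for plots such as PCA or heatmaps, while the differential expression test still runs on the original counts with `batch` in the design.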
I suppose one section addresses known and the other unknown variables. I am curious whether there is a reason why multiple alternatives were mentioned in one but not in the other (ComBat-seq, for example). Is it just for simplicity, or is removeBatchEffect the optimal approach?