Should I perform batch correction of normalized RNA-seq counts prior to GSVA?
Lucy
United Kingdom


I am trying to determine whether batch correction is necessary before performing GSVA.

I have a bulk RNA-seq dataset that includes two batches due to library preparation performed on separate occasions. My conditions of interest are present in both batches. For PCA, I applied batch correction using the removeBatchEffect function from limma. Should I use this log-transformed, normalized, batch-corrected expression matrix as input for GSVA? Alternatively, is it possible to run GSVA on the log-transformed normalized expression matrix before batch correction and then include the batch variable in the differential expression model? Or does the ranking system used by GSVA internally account for batch effects?

Additionally, could you provide any recommendations on which differential expression analysis tool to use (e.g. limma, DESeq2, edgeR)? Do these tools perform similarly in this context?

Many thanks for your advice.

Best wishes,

Tags: GSVA, GSVAdata, RNAseq
Robert Castelo
Barcelona/Universitat Pompeu Fabra

Hi Lucy,

Sorry for the delay in getting back to you. GSVA does nothing specific to handle batch effects, so they may carry through into its output. What I would recommend is to input normalized, log-transformed expression values to GSVA. Then use PCA or MDS plots to explore the extent to which the batch effect you observed at the gene level also affects the GSVA enrichment scores at the pathway level.
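As a rough illustration of that check, here is a minimal sketch in Python/NumPy. It is an assumption for illustration only. In R you would more typically call `prcomp()` or limma's `plotMDS()` on the GSVA output and colour the samples by batch. The sketch does two things:

- It computes sample coordinates on the top principal components of an enrichment-score matrix.
- It reports how much of PC1's variance is explained by batch membership.

The function names `sample_pcs` and `batch_r2` are hypothetical, and the data are synthetic, not from this thread.

```python
import numpy as np


def sample_pcs(scores, n_components=2):
    """Sample coordinates on the top principal components.

    scores: array of shape (n_pathways, n_samples), e.g. GSVA enrichment scores.
    Returns (coords, var_explained), coords having shape (n_samples, n_components).
    """
    centered = scores - scores.mean(axis=1, keepdims=True)  # center each pathway
    # SVD of the samples-by-pathways matrix; U * S gives the sample PC coordinates
    u, s, _ = np.linalg.svd(centered.T, full_matrices=False)
    coords = u[:, :n_components] * s[:n_components]
    var_explained = s ** 2 / np.sum(s ** 2)
    return coords, var_explained[:n_components]


def batch_r2(pc, batch):
    """Share of one PC's variance explained by batch (one-way ANOVA R^2)."""
    pc = np.asarray(pc, dtype=float)
    batch = np.asarray(batch)
    total = np.sum((pc - pc.mean()) ** 2)
    within = sum(np.sum((pc[batch == b] - pc[batch == b].mean()) ** 2)
                 for b in np.unique(batch))
    return 1.0 - within / total


# Synthetic example: 50 pathways x 12 samples in two batches of 6
rng = np.random.default_rng(0)
batch = np.array(["A"] * 6 + ["B"] * 6)
scores = rng.normal(0.0, 0.1, size=(50, 12))
scores[:, batch == "B"] += 0.5  # simulated batch shift in every pathway

coords, var_explained = sample_pcs(scores)
print("PC1 variance explained:", round(float(var_explained[0]), 2))
print("PC1 R^2 on batch:", round(float(batch_r2(coords[:, 0], batch)), 2))
```

A batch R^2 close to 1 on a leading component suggests the batch effect persists at the pathway level. A value near 0 suggests it does not dominate that component.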

What to do next depends on two things: whether the batch effect is present at the pathway level, and what you want to do with the GSVA enrichment scores. For instance, suppose you want to do a differential expression analysis at the pathway level (see section 6.2 of the GSVA vignette) and the batch effect is present. In that case you could use limma and include the batch indicator variable in your design matrix to adjust for it. As illustrated in section 6.2 of the GSVA vignette, one advantage of limma is that you can use limma-trend to exploit the fact that GSVA enrichment scores are more precise for larger gene sets.
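To make "include the batch indicator variable in your design matrix" concrete, here is a minimal sketch in Python/NumPy, written for illustration only. It fits `score ~ condition + batch` by ordinary least squares for every pathway at once. It then reports the condition coefficient and an ordinary t-statistic.

The sketch has these assumptions and limits:

- It assumes exactly two conditions and two batches.
- The function name `fit_with_batch` is hypothetical, and the data are synthetic.
- It does not reproduce limma's empirical Bayes moderation or limma-trend.

In R, the corresponding design would come from `model.matrix(~ condition + batch)`, passed to `lmFit()` and then `eBayes()`.

```python
import numpy as np


def fit_with_batch(scores, condition, batch, ref_condition, ref_batch):
    """OLS fit of score ~ condition + batch for every pathway (row) at once.

    Assumes exactly two conditions and two batches. Returns the condition
    coefficient and its ordinary (unmoderated) t-statistic per pathway.
    """
    condition = np.asarray(condition)
    batch = np.asarray(batch)
    n = scores.shape[1]
    X = np.column_stack([
        np.ones(n),                                  # intercept
        (condition != ref_condition).astype(float),  # condition of interest
        (batch != ref_batch).astype(float),          # batch indicator (nuisance term)
    ])
    beta, _, _, _ = np.linalg.lstsq(X, scores.T, rcond=None)  # shape (3, n_pathways)
    resid = scores.T - X @ beta
    df = n - X.shape[1]
    sigma2 = np.sum(resid ** 2, axis=0) / df             # residual variance per pathway
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])  # SE of the condition coefficient
    return beta[1], beta[1] / se


# Synthetic example: batch shift in all pathways, true condition effect in the first 5
rng = np.random.default_rng(1)
batch = np.array(["A"] * 6 + ["B"] * 6)
condition = np.array(["ctrl", "trt"] * 6)  # balanced: 3 ctrl and 3 trt per batch
scores = rng.normal(0.0, 0.1, size=(50, 12))
scores[:, batch == "B"] += 0.5
scores[:5, condition == "trt"] += 0.4

effect, t_stat = fit_with_batch(scores, condition, batch, "ctrl", "A")
print("mean effect, first 5 pathways:", round(float(effect[:5].mean()), 2))
print("mean effect, other pathways:", round(float(effect[5:].mean()), 2))
```

The batch term absorbs the shared shift between batches, so the condition coefficient is estimated within batches rather than being inflated or masked by it.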




Great, thank you. Just to clarify: do you recommend against using batch-corrected counts as input for GSVA, or is this a viable alternative?

