I'm looking for guidance on RNA-seq data pre-processing for GSVA.
The GSVA package implements several methods for computing sample-wise gene set enrichment scores:
GSVA, ssGSEA, z-score, and PLAGE.
From reading about these methods, it's apparent to me that GSVA, z-score, and PLAGE require library size normalization per sample, whereas ssGSEA does not. Is this correct?
On the other hand, ssGSEA requires read counts per gene to be adjusted for gene length and GC content, whereas the other methods do not. Is this correct?
GSVA standardizes gene counts by mapping them to KCDF values estimated from the data. Two different kernels are offered by GSVA for gene-wise KCDF estimation: Poisson and Gaussian.
To use the Gaussian kernel, count data must be log-transformed (for example, log2 counts per million or log2 reads per kilobase per million). For the Poisson kernel, log transformation should not be performed, but counts per million may be multiplied by 1x10^6 and rounded to the nearest integer before being passed to GSVA. Is this correct?
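
To make the two candidate inputs concrete, here is a minimal sketch in plain Python of the transformations described above, applied to a single sample. It is only an illustration of the question, not a recommended pipeline: the function names, the toy counts, and the pseudocount of 1 are all assumptions, and the integer variant shows counts per million rounded to the nearest integer, which is just one reading of the rounding step.

```python
import math

def counts_per_million(counts):
    """Scale one sample's raw gene counts so that they sum to one million."""
    library_size = sum(counts)
    return [c * 1e6 / library_size for c in counts]

def log2_cpm(counts, pseudocount=1.0):
    """log2(CPM + pseudocount); the pseudocount (an assumption) avoids log2(0)."""
    return [math.log2(x + pseudocount) for x in counts_per_million(counts)]

def rounded_cpm(counts):
    """CPM rounded to the nearest integer, with no log transformation."""
    return [int(round(x)) for x in counts_per_million(counts)]

# Toy sample with four genes (made-up counts, for illustration only).
sample = [10, 0, 250, 740]

gaussian_input = log2_cpm(sample)    # continuous, log-scale values
poisson_input = rounded_cpm(sample)  # non-negative integers
```

The first vector is the kind of continuous, log-scale input I would pair with the Gaussian kernel, and the second is the kind of integer input I would pair with the Poisson kernel.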
Thank you for your assistance!