Method and kcdf arguments in gsva package
tkapell ▴ 10
@tkapell-14647
University of Bonn, Germany

Hi,

I want to run a GSVA analysis on my RNA-seq experiment. I have normalized the raw counts with DESeq2 using the DESeq() function, and I used the output of counts(dds, normalized=T) as my input. I now wonder which method I should choose, and how I should set the kcdf argument for GSVA. I assume that, since my normalized data are continuous, I should use kcdf='Gaussian'. But what about method=c('gsva','ssgsea','zscore','plage')? There it is less straightforward which method is more appropriate.

Robert Castelo ★ 2.7k
@rcastelo
Barcelona/Universitat Pompeu Fabra

hi,

If your expression data are continuous, then you should be fine with the default settings of the gsva() function. If you want to understand the options a bit better, then you should at least check the manual page for gsva(), which says the following about the kcdf argument:

    kcdf: Character string denoting the kernel to use during the
          non-parametric estimation of the cumulative distribution
          function of expression levels across samples when
          ‘method="gsva"’.  By default, ‘kcdf="Gaussian"’ which is
          suitable when input expression values are continuous, such as
          microarray fluorescent units in logarithmic scale, RNA-seq
          log-CPMs, log-RPKMs or log-TPMs.  When input expression
          values are integer counts, such as those derived from RNA-seq
          experiments, then this argument should be set to
          ‘kcdf="Poisson"’.


which means that kcdf is only relevant when method="gsva" (the default value of the method argument), while the manual page says the following about the method argument:

    method: Method to employ in the estimation of gene-set enrichment
            scores per sample. By default this is set to ‘gsva’
            (Hänzelmann et al, 2013) and other options are ‘ssgsea’
            (Barbie et al, 2009), ‘zscore’ (Lee et al, 2008) or ‘plage’
            (Tomfohr et al, 2005). The latter two standardize first [...]


and at the end of the help page you can find the cited references. You can check those references to decide which method you think is more appropriate for your data. The GSVA paper contains a comparison between them and, unsurprisingly, the paper recommends method="gsva". However, you can try each of them on your data and decide for yourself which one gives you more sensible results. In general terms, PLAGE and z-score are parametric and should perform well with close-to-Gaussian expression profiles, while ssGSEA and GSVA are non-parametric and more robust to departures from Gaussianity in gene expression data.
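
To make the "try each of them" suggestion concrete, here is a minimal sketch on simulated data. It assumes a GSVA release that still accepts the classic gsva(expr, gset.idx.list, method=, kcdf=) call described in the manual page quoted above; as far as I know, newer releases moved to parameter objects such as gsvaParam(), so check the help page of your installed version. The object names (raw_counts, norm_counts, log_expr, gene_sets) are hypothetical, and the log2(x + 1) step is my own assumption, chosen because the manual page lists log-scale units as the typical input for kcdf="Gaussian".

```r
# Minimal sketch with simulated data; all object names are hypothetical.
# Assumes the classic gsva(expr, gset.idx.list, method=, kcdf=) interface.
library(GSVA)

set.seed(123)

# Simulated raw counts: 200 genes x 10 samples.
raw_counts <- matrix(rnbinom(200 * 10, mu = 100, size = 5),
                     nrow = 200,
                     dimnames = list(paste0("gene", 1:200),
                                     paste0("sample", 1:10)))

# Stand-in for counts(dds, normalized=TRUE): divide each sample (column)
# by a made-up size factor, which yields continuous, non-integer values.
size_factors <- runif(10, min = 0.8, max = 1.2)
norm_counts <- sweep(raw_counts, 2, size_factors, "/")

# Assumption: move to a log scale before using the Gaussian kernel.
log_expr <- log2(norm_counts + 1)

# Two made-up gene sets, given as a named list of gene identifiers.
gene_sets <- list(setA = paste0("gene", 1:20),
                  setB = paste0("gene", 50:80))

# Run the four methods on the same input. kcdf is only used by
# method="gsva"; it is passed to every call here just for uniformity.
methods <- c("gsva", "ssgsea", "zscore", "plage")
scores <- lapply(methods, function(m) {
  gsva(log_expr, gene_sets, method = m, kcdf = "Gaussian")
})
names(scores) <- methods

# Each element is a gene-set by sample matrix of enrichment scores.
sapply(scores, dim)
```

With real data you would replace the simulated matrix with your own expression matrix and the toy list with your gene-set collection, and then compare the resulting score matrices (for example, by correlating the scores of the same gene set across methods) to judge which one behaves most sensibly.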