Question

Method and kcdf arguments in gsva package

thkapell ▴ 10

@tkapell-14647

Helmholtz Center Munich, Germany

Hi,

I want to run a GSVA analysis on my RNA-seq experiment. I normalized the raw counts with DESeq2 (the DESeq() function) and used the output of counts(dds, normalized=T) as my input. I now wonder which method I should choose and how I should set the kcdf argument for GSVA. Since my normalized data are continuous, I assume I should use kcdf='Gaussian'. But what about method=c('gsva','ssgsea','zscore','plage')? There it is less obvious which method is most appropriate.

Thanks in advance!

GSVA • 2.2k views

updated 4.7 years ago by Robert Castelo ★ 3.4k • written 4.7 years ago by thkapell ▴ 10

score 1 · Answer 1 · 2020-04-27

Hi,

If your expression data are continuous, then you should be fine with the default settings of the gsva() function. If you want to understand the options a bit better, you should at least check the manual page for gsva(), which says the following about the kcdf argument:

    kcdf: Character string denoting the kernel to use during the
          non-parametric estimation of the cumulative distribution
          function of expression levels across samples when
          ‘method="gsva"’.  By default, ‘kcdf="Gaussian"’ which is
          suitable when input expression values are continuous, such as
          microarray fluorescent units in logarithmic scale, RNA-seq
          log-CPMs, log-RPKMs or log-TPMs.  When input expression
          values are integer counts, such as those derived from RNA-seq
          experiments, then this argument should be set to
          ‘kcdf="Poisson"’.

which means that kcdf is only relevant when method="gsva" (the default value of method). The same page says the following about the method argument:

  method: Method to employ in the estimation of gene-set enrichment
          scores per sample. By default this is set to ‘gsva’
          (Hänzelmann et al, 2013) and other options are ‘ssgsea’
          (Barbie et al, 2009), ‘zscore’ (Lee et al, 2008) or ‘plage’
          (Tomfohr et al, 2005). The latter two standardize first [...]

At the end of the help page you can find the cited references, which you can check to decide which method you think is most appropriate for your data. The GSVA paper contains a comparison between them, and its recommendation is, unsurprisingly, to use method="gsva". However, you can try each of them on your data and decide for yourself which one gives you the most sensible results. In general terms, PLAGE and z-score are parametric and should perform well with close-to-Gaussian expression profiles, while ssGSEA and GSVA are non-parametric and more robust to departures from Gaussianity in gene expression data.
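
To make "try each of them" concrete, here is a minimal sketch, not a definitive recipe. The expression matrix and the gene sets are simulated stand-ins (all object names are made up for illustration). The log2(x + 1) step is an assumption on my side: the help-page examples of continuous input are all on a log scale, so the sketch puts the normalized counts on one as well. The calls in the first branch are the gsva() interface discussed in this thread. As far as I know, GSVA releases from 1.50 onwards replaced the method argument with parameter objects such as gsvaParam(), so the sketch branches on the installed version.

```r
# Simulated stand-in for counts(dds, normalized = TRUE): 200 genes x 10 samples.
# In a real analysis this matrix would come from your own DESeq2 object.
set.seed(42)
n_genes <- 200
n_samples <- 10
norm_counts <- matrix(
  rnbinom(n_genes * n_samples, mu = 100, size = 2),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("sample", seq_len(n_samples)))
)

# Assumption: move the values to a log scale before using kcdf = "Gaussian".
log_expr <- log2(norm_counts + 1)

# Hypothetical gene sets; real ones would come from e.g. a GMT file.
gene_sets <- list(
  setA = paste0("gene", 1:20),
  setB = paste0("gene", 101:130)
)

if (requireNamespace("GSVA", quietly = TRUE)) {
  methods <- c("gsva", "ssgsea", "zscore", "plage")

  if (packageVersion("GSVA") < "1.50.0") {
    # Interface described in this thread: one function, method as an argument.
    # kcdf only affects method = "gsva".
    scores <- lapply(setNames(methods, methods), function(m) {
      GSVA::gsva(log_expr, gene_sets, method = m, kcdf = "Gaussian")
    })
  } else {
    # Newer releases: one parameter object per method, then gsva(param).
    params <- list(
      gsva   = GSVA::gsvaParam(log_expr, gene_sets, kcdf = "Gaussian"),
      ssgsea = GSVA::ssgseaParam(log_expr, gene_sets),
      zscore = GSVA::zscoreParam(log_expr, gene_sets),
      plage  = GSVA::plageParam(log_expr, gene_sets)
    )
    scores <- lapply(params, function(p) GSVA::gsva(p))
  }

  # Each element is a gene-set x sample matrix. Compare how the four methods
  # rank the samples for one gene set via Spearman correlation.
  setA_scores <- sapply(scores, function(s) s["setA", ])
  print(round(cor(setA_scores, method = "spearman"), 2))
}
```

With random data like this the correlations carry no meaning. On real data, strong disagreement between methods for a gene set you care about is a hint to look at the expression distribution of its genes before settling on one method. For raw integer counts, the help page quoted above points to kcdf="Poisson" instead.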