GSVA package: normalization by gene length?
xanthexu • 0
Hong Kong

Hi, I am trying to understand how GSVA output differs between normalization methods. For example, both logCPM and logTPM are accepted with kcdf="Gaussian". However, TPM is normalized by gene length while CPM is not. I am not sure whether this makes any difference to the output. I would really appreciate it if someone could help with this. Thanks in advance.

Robert Castelo ★ 2.8k
Barcelona/Universitat Pompeu Fabra

Hi, logCPM and logTPM units of expression should be processed with the default argument kcdf="Gaussian" because they are _continuous_ units of expression, as opposed to the discrete nature of integer counts. The default argument method="gsva" and the alternative method="ssgsea" are non-parametric, which in practice means that the results will not change much due to small fluctuations in the input values, such as those that may arise from using different normalization methods. Whether using logCPM or logTPM leads to different results depends on the data you are analyzing and on what you are doing downstream of GSVA. Just try the alternatives that you'd like to consider and check whether the results really change in your analyses.
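
To make the gene-length part of the question concrete, here is a minimal sketch in Python/NumPy. It is not GSVA itself, it runs on simulated toy counts with made-up gene lengths, and all variable names are illustrative. It assumes logCPM and logTPM are computed without a pseudocount; under that assumption the difference between the two splits exactly into a per-gene term (the gene length, constant across samples) and a per-sample term (the different scaling factors). Since GSVA with kcdf="Gaussian" starts by estimating, for each gene, a cumulative distribution across samples, the per-gene term should not affect that step, while the per-sample term can.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumption: simulated, not from any real experiment).
n_genes, n_samples = 200, 6
counts = rng.poisson(lam=50, size=(n_genes, n_samples)) + 1  # strictly positive counts
lengths_kb = rng.uniform(0.5, 10.0, size=n_genes)            # made-up gene lengths in kb

# CPM: scale each sample (column) by its library size.
lib_size = counts.sum(axis=0)
cpm = counts / lib_size * 1e6

# TPM: divide by gene length first, then scale each sample to sum to 1e6.
rate = counts / lengths_kb[:, None]
rate_sum = rate.sum(axis=0)
tpm = rate / rate_sum * 1e6

# No pseudocount here (assumption), so the algebra below is exact.
log_cpm = np.log2(cpm)
log_tpm = np.log2(tpm)

# logTPM - logCPM = (per-gene term) + (per-sample term)
gene_term = -np.log2(lengths_kb)[:, None]        # constant across samples
sample_term = np.log2(lib_size / rate_sum)[None, :]  # constant across genes
decomposition_holds = np.allclose(log_tpm - log_cpm, gene_term + sample_term)
print("difference = gene term + sample term:", decomposition_holds)

# For each gene, rank its values across samples under both normalizations,
# then report the fraction of genes whose across-sample ordering is unchanged.
ranks_cpm = log_cpm.argsort(axis=1, kind="stable").argsort(axis=1, kind="stable")
ranks_tpm = log_tpm.argsort(axis=1, kind="stable").argsort(axis=1, kind="stable")
same_order_fraction = (ranks_cpm == ranks_tpm).all(axis=1).mean()
print("fraction of genes with identical across-sample ranks:", same_order_fraction)
```

On real data, with a pseudocount and your own gene sets, this decomposition is only approximate, so the practical check is still to run GSVA on both inputs and compare the resulting enrichment scores.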


