Are singscore scores comparable between gene sets?
Pietro ▴ 30
@pietro-13029
Last seen 5 months ago
Italy

Hello everyone

Regarding the singscore package, and in particular the simpleScore function, which scores a gene expression dataset against one or two gene sets: I was wondering whether the signature scores from different gene sets are comparable.

Say I have 4 gene sets that I use to classify tumor samples into molecular subtypes. My idea is to score the gene expression dataset with each of the 4 gene sets separately, and then compare the signature scores across the 4 gene sets for each sample.

I would like to know if the scores are comparable in absolute terms.

Here is an example output (note that I ran each gene set separately and then merged the results):

id          Gene_set_1   Gene_set_2   Gene_set_3   Gene_set_4
sample_1    -0.0625      -0.194       0.298        0.182
sample_2    -0.0706      -0.211       0.273        0.218
sample_3    0.0366       -0.204       0.183        0.263
sample_4    -0.0219      -0.221       0.325        0.215
sample_5    -0.0215      -0.232       0.267        0.2
sample_6    -0.00629     -0.186       0.205        0.255
sample_7    -0.0425      -0.202       0.177        0.217
sample_8    -0.0985      -0.219       0.252        0.191
sample_9    -0.0726      -0.194       0.272        0.154
sample_10   -0.0513      -0.226       0.245        0.161


Can I say, for example, that for sample_1 the gene sets rank by score as Gene_set_3 > Gene_set_4 > Gene_set_1 > Gene_set_2?

Thanks

PS: cross-posted to biostars

Tags: singscore, GSVA, GSEA, subtypes, geneset

Hi Pietro,

Any luck finding the answer?


Hi,

I apologise for not answering this question sooner. I am the current maintainer of singscore and have only just started receiving notifications for questions about the package. To answer the question above: singscores can be compared between gene sets, but whether that is appropriate depends on the context.

The comparison Pietro describes would require standardising the scores across samples, so that the scores for each gene set have the same dynamic range. The same holds for any transcriptomic analysis: genes have different dynamic ranges, so when you compare two genes for the purpose of subtyping, it may be useful to make their dynamic ranges comparable. You can either assume that the dynamic ranges are equivalent or normalise the expression values (e.g. using a z-transformation).
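As a rough illustration of the z-transformation idea applied to scores rather than expression values, here is a minimal Python sketch using the table from the question. The helper names `zscore_columns` and `rank_sets` are made up for this example and are not part of singscore; the sketch also assumes that the 10 samples are representative enough to estimate a mean and standard deviation per gene set, which may not hold for such a small cohort.

```python
from statistics import mean, stdev

gene_sets = ["Gene_set_1", "Gene_set_2", "Gene_set_3", "Gene_set_4"]

# Scores copied from the table in the question (one row per sample).
scores = {
    "sample_1": [-0.0625, -0.194, 0.298, 0.182],
    "sample_2": [-0.0706, -0.211, 0.273, 0.218],
    "sample_3": [0.0366, -0.204, 0.183, 0.263],
    "sample_4": [-0.0219, -0.221, 0.325, 0.215],
    "sample_5": [-0.0215, -0.232, 0.267, 0.2],
    "sample_6": [-0.00629, -0.186, 0.205, 0.255],
    "sample_7": [-0.0425, -0.202, 0.177, 0.217],
    "sample_8": [-0.0985, -0.219, 0.252, 0.191],
    "sample_9": [-0.0726, -0.194, 0.272, 0.154],
    "sample_10": [-0.0513, -0.226, 0.245, 0.161],
}


def zscore_columns(table):
    """Standardise each gene set (column) across samples to mean 0, sd 1."""
    samples = list(table)
    n_cols = len(table[samples[0]])
    out = {s: [] for s in samples}
    for j in range(n_cols):
        col = [table[s][j] for s in samples]
        mu, sd = mean(col), stdev(col)  # sample standard deviation (n - 1)
        for s in samples:
            out[s].append((table[s][j] - mu) / sd)
    return out


def rank_sets(row):
    """Return gene set names ordered from highest to lowest score."""
    order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
    return [gene_sets[j] for j in order]


z = zscore_columns(scores)

# Ranking on raw scores versus ranking on standardised scores for one sample.
print("raw:         ", rank_sets(scores["sample_1"]))
print("standardised:", rank_sets(z["sample_1"]))
```

The two rankings answer different questions: the raw ranking compares absolute scores, while the standardised ranking asks which gene set is unusually high in this sample relative to the other samples.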

Likewise, singscores quantify the absolute expression of the genes in a gene set relative to the other genes in a given sample. A perfect positive score indicates that the genes in the gene set all have the highest expression within the sample. To compare scores between gene sets for the purpose of subtyping, you would need to ensure that the dynamic range of the scores is equivalent, so you would need to normalise the scores. If you are interested in comparing absolute scores, you can use the scores as they are and that should be fine.
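To make the "relative to other genes within the sample" idea concrete, here is a simplified rank-based score in Python. It is only a sketch in the spirit of the description above and not singscore's actual implementation: the function name `rank_score`, the toy expression values, and the scaling (mean rank rescaled between its smallest and largest possible values, then centred on zero) are all assumptions made for illustration, and ties are not handled.

```python
def rank_score(expression, gene_set):
    """Centred, rescaled mean rank of gene_set within a single sample.

    expression: dict mapping gene name -> expression value for one sample.
    gene_set:   list of gene names, all present in expression, and
                fewer than the total number of genes.
    """
    # Rank genes by increasing expression: rank 1 = lowest, rank n = highest.
    ordered = sorted(expression, key=expression.get)
    rank = {gene: i + 1 for i, gene in enumerate(ordered)}

    n, k = len(ordered), len(gene_set)
    mean_rank = sum(rank[g] for g in gene_set) / k

    lowest = (k + 1) / 2            # mean rank if the set fills the bottom k ranks
    highest = (2 * n - k + 1) / 2   # mean rank if the set fills the top k ranks

    # Rescale to [0, 1], then centre so the score lies in [-0.5, 0.5].
    return (mean_rank - lowest) / (highest - lowest) - 0.5


# Toy sample with six hypothetical genes.
sample = {"A": 9.1, "B": 7.4, "C": 1.2, "D": 0.3, "E": 5.0, "F": 2.2}

print(rank_score(sample, ["A", "B"]))  # the two most highly expressed genes
print(rank_score(sample, ["C", "D"]))  # the two least expressed genes
print(rank_score(sample, ["A", "D"]))  # one from each extreme
```

In this sketch a gene set made up of the most highly expressed genes reaches the maximum of 0.5, which corresponds to the "perfect positive score" described above. Because the score depends only on ranks within one sample, it says nothing by itself about how much a gene set's score varies across samples, which is why standardisation matters for subtyping.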

I hope this helps you with your analysis and please do not hesitate to ask for further help.

Cheers, Dharmesh

Robert Castelo ★ 2.8k
@rcastelo
Last seen 22 days ago
Barcelona/Universitat Pompeu Fabra

Hi,

I can't comment on singscore, but the default "gsva" method implemented in the GSVA package tries to make scores comparable across gene sets by first bringing gene expression profiles to a common scale, before summarizing the expression at the gene set level. You can learn about the full method in the GSVA paper. However, as explained in the discussion of the paper, you should have at least 10 samples in your data set for this step to work well.

cheers,

robert.


Hi Robert

Thanks for the answer. I know the GSVA package very well and use it every day for the same purpose. My goal was to do the same with the singscore package and compare the results to see how the latter performs.

Thanks anyway

Pietro