GSVA on gene sets with 2 genes
@endre-sebestyen-2707
Hungary, Budapest

I'm using GSVA to get an overall activity/expression pattern of various gene sets using RNA-seq data. This is generally OK, but in my case, about half of the gene sets have only two genes, and I definitely don't want to discard them. As the GSVA help documentation recommends a minimum of 5 genes in the set, I was wondering if I'm getting useful/meaningful results at all for these very small gene sets. What do you think? If GSVA is not good in this case, are there any alternative methods that might be useful or should I just go with a summary of normalized TPM values for example?

Thanks for any suggestion!

GSVA gene sets RNA-seq
Robert Castelo
@rcastelo
Barcelona/Universitat Pompeu Fabra

hi Endre!!! :)

the problem with two genes is that the summary measure of expression is not going to be very robust with respect to the function you are representing with these two genes. so, you can go on and do the calculations but be cautious when interpreting the results. also try to do some graphical exploration of the gsva scores to check whether they are sensible with respect to the genes inside the gene set.
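
A numeric check can complement the graphical exploration suggested above. The following is only a minimal sketch in plain Python, not part of the GSVA package: it assumes the per-sample scores of one gene set and the expression values of its member genes have already been exported, and it correlates the two. The function names (`pearson`, `check_set_scores`) and all numbers are hypothetical and chosen purely for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def check_set_scores(set_scores, member_expr):
    """Correlate per-sample set scores with each member gene's expression.

    set_scores:  one score per sample (e.g. exported from a GSVA run).
    member_expr: dict mapping gene name -> per-sample expression values.
    """
    return {gene: pearson(set_scores, values)
            for gene, values in member_expr.items()}

# Made-up numbers for six samples (hypothetical, for illustration only).
scores = [-0.9, -0.8, -0.7, 0.6, 0.8, 0.9]
genes = {
    "geneA": [1.0, 1.2, 1.1, 3.9, 4.2, 4.0],  # follows the set score
    "geneB": [2.5, 2.4, 2.6, 2.5, 2.4, 2.6],  # nearly flat across samples
}
correlations = check_set_scores(scores, genes)
for gene, r in correlations.items():
    print(f"{gene}: r = {r:.2f}")
```

If one of the two genes barely correlates with the set score, the score is effectively driven by the other gene alone, which is one concrete way a two-gene summary can fail to represent the intended function.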

cheers,

robert.


Hi Robert!

Yes, they do look a bit problematic. Just checking the distribution of GSVA scores for sets with two genes shows that most of them have a value close to -1 or 1, with nothing in between.

I've seen that you had some other suggestions for summarizing expression values in this post:

mean, z-score or first right-singular vector from SVD

Do you know any review paper or benchmarks on this topic? I think we are going to try a few, besides various options in GSVA.
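
For a two-gene set, the three summaries listed above (mean, z-score, first right-singular vector) can be sketched as follows. This is plain Python for illustration, not the GSVA package API: the function names and the toy expression values are hypothetical, the z-score variant shown is the sum of gene-wise z-scores divided by the square root of the set size, and the singular vector is obtained by power iteration on the standardized genes. The sign of a singular vector is arbitrary, so only its pattern across samples is meaningful.

```python
import math

def standardize(values):
    """Center and scale one gene's values across samples (sample SD)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def mean_score(expr):
    """Per-sample mean of the member genes' expression."""
    k = len(expr)
    return [sum(col) / k for col in zip(*expr)]

def combined_zscore(expr):
    """Per-sample sum of gene-wise z-scores, divided by sqrt(set size)."""
    z = [standardize(row) for row in expr]
    k = len(z)
    return [sum(col) / math.sqrt(k) for col in zip(*z)]

def first_right_singular_vector(expr, iterations=200):
    """First right-singular vector of the standardized genes-by-samples
    matrix Z, found by power iteration on Z^T Z."""
    z = [standardize(row) for row in expr]
    n = len(z[0])
    # Start inside the row space of Z; a constant vector would be mapped
    # to zero because every standardized row sums to zero.
    v = z[0][:]
    for _ in range(iterations):
        u = [sum(a * b for a, b in zip(row, v)) for row in z]               # Z v
        w = [sum(u[i] * z[i][j] for i in range(len(z))) for j in range(n)]  # Z^T u
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Toy two-gene set measured in five samples (made-up values).
expr = [
    [1.0, 2.0, 3.0, 4.0, 5.0],  # gene 1
    [2.1, 2.9, 4.2, 4.8, 6.0],  # gene 2
]
means = mean_score(expr)
zscores = combined_zscore(expr)
rsv = first_right_singular_vector(expr)
print("mean:   ", [round(x, 3) for x in means])
print("z-score:", [round(x, 3) for x in zscores])
print("svd:    ", [round(x, 3) for x in rsv])
```

With only two positively correlated genes, the singular-vector summary is just a rescaled version of the z-score summary, so for such sets these two options carry essentially the same information and differ mainly from the plain mean in that each gene is standardized first.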


We compared z-scores, first right-singular vector and ssGSEA in the GSVA paper and you will find them implemented in the GSVA package via the method argument. I've seen quite a few papers on benchmarking GSEA methods but I don't think they benchmark against having two genes per gene set, so I'm not sure how useful those papers may be for you.