Question

GSVA on gene sets with 2 genes


Endre Sebestyén ▴ 70

@endre-sebestyen-2707

Last seen 14 months ago

Hungary, Budapest

I'm using GSVA to get an overall activity/expression pattern of various gene sets using RNA-seq data. This is generally OK, but in my case, about half of the gene sets have only two genes, and I definitely don't want to discard them. As the GSVA help documentation recommends a minimum of 5 genes in the set, I was wondering if I'm getting useful/meaningful results at all for these very small gene sets. What do you think? If GSVA is not good in this case, are there any alternative methods that might be useful or should I just go with a summary of normalized TPM values for example?

Thanks for any suggestion!

GSVA gene sets RNA-seq • 1.9k views

updated 4.3 years ago by Robert Castelo ★ 3.4k • written 4.3 years ago by Endre Sebestyén ▴ 70

score 0 · Answer 1 · 2020-10-15


Robert Castelo ★ 3.4k

@rcastelo

Last seen 2 days ago

Barcelona/Universitat Pompeu Fabra

hi Endre!!! :)

the problem with two genes is that the summary measure of expression is not going to be very robust with respect to the function you are representing with these two genes. so, you can go on and do the calculations, but be cautious when interpreting the results. also try some graphical exploration of the gsva scores to check whether they are sensible with respect to the expression of the genes inside the gene set.
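One possible way to do such a check numerically, before plotting anything, is to correlate the per-sample score of a set with the expression of each gene in it. The sketch below is only an illustration in Python with numpy: the function name `member_correlations`, the simulated data, and the use of a plain mean as a stand-in for the real GSVA scores are all assumptions, not part of the GSVA package.

```python
import numpy as np

def member_correlations(scores, expr_set):
    """Pearson correlation between a per-sample set score and each member gene.

    scores   : 1-D array, one score per sample (e.g. exported GSVA scores)
    expr_set : 2-D array, genes x samples, log-scale expression of the set
    """
    return np.array([np.corrcoef(scores, gene)[0, 1] for gene in expr_set])

# Simulated two-gene set (hypothetical data): gene_b closely follows gene_a.
rng = np.random.default_rng(1)
gene_a = rng.normal(size=20)
gene_b = gene_a + rng.normal(scale=0.1, size=20)
expr_set = np.vstack([gene_a, gene_b])

# Stand-in for the real scores: here simply the per-sample mean of the set.
toy_scores = expr_set.mean(axis=0)

cors = member_correlations(toy_scores, expr_set)
print(cors)
```

If the real scores correlate weakly, or with opposite signs, with the two genes, that would suggest the score is not a sensible summary for that set.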

cheers,

robert.

4.3 years ago Robert Castelo ★ 3.4k


Hi Robert!

Yes, they do look a bit problematic. Just checking the distribution of GSVA scores for the two-gene sets shows that most of them have a value close to -1 or 1, with nothing in between.

I've seen that you had some other suggestions for summarizing expression values in this post:

mean, z-score or first right-singular vector from SVD
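To make the three options concrete, here is a rough sketch of how they could be computed for a single gene set. It is written in Python with numpy purely for illustration and is not the GSVA package code; the function name `summarize_gene_set`, the simulated matrix, and the chosen row indices are assumptions. The z-score variant assumes the usual combined z-score (sum of per-gene z-scores divided by the square root of the set size), and the SVD variant takes the first right-singular vector of the standardized set matrix, whose sign is arbitrary.

```python
import numpy as np

def summarize_gene_set(expr, gene_idx):
    """Return (mean, combined z-score, first right-singular vector) per sample.

    expr     : genes x samples matrix of log-scale expression values
    gene_idx : row indices of the genes in the set
    Assumes every gene in the set has non-zero variance across samples.
    """
    sub = expr[gene_idx, :]

    # 1) plain mean of the member genes in each sample
    mean_score = sub.mean(axis=0)

    # 2) standardize each gene across samples, then combine: sum(z) / sqrt(k)
    mu = sub.mean(axis=1, keepdims=True)
    sd = sub.std(axis=1, ddof=1, keepdims=True)
    z = (sub - mu) / sd
    z_score = z.sum(axis=0) / np.sqrt(len(gene_idx))

    # 3) first right-singular vector of the standardized matrix
    #    (one value per sample; the sign is arbitrary)
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    svd_score = vt[0]

    return mean_score, z_score, svd_score

# Simulated data (hypothetical): 50 genes, 12 samples, a two-gene set.
rng = np.random.default_rng(0)
expr = rng.normal(size=(50, 12))
mean_score, z_score, svd_score = summarize_gene_set(expr, [3, 7])
print(mean_score.shape, z_score.shape, svd_score.shape)
```

With only two genes, all three summaries are driven by the same two rows, so comparing them mostly shows how sensitive the result is to scaling rather than providing independent evidence.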

Do you know any review paper or benchmarks on this topic? I think we are going to try a few, besides various options in GSVA.

4.3 years ago Endre Sebestyén ▴ 70


We compared z-scores, the first right-singular vector and ssGSEA in the GSVA paper, and you will find them implemented in the GSVA package via the method argument. I've seen quite a few papers benchmarking GSEA methods, but I don't think they benchmark the case of two genes per gene set, so I'm not sure how useful those papers would be for you.

4.3 years ago Robert Castelo ★ 3.4k