I'm re-analyzing data from Yuan et al. 2018 (https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-018-0567-9), which contains 8 high-grade glioma samples. For one sample, I log-normalize the data using the method of Lun et al. (2016) and run GSVA with several gene signatures on a subset of cells (putative cancer cells). For one of the signatures, the scores appear to be consistently positive. Here's the plot:
As you can see, for the gene set 'RNA.GSC.c2' (about 1,200 of the 4,800 genes used in this analysis), very few samples (cells) score below 0. Since GSVA's rank-based score works with genes ranked by their relative expression within the dataset, I was a bit surprised by this result. Could this be caused by outliers with extremely low log counts?
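To reason about where a one-sided score distribution could come from, it may help to write out the three GSVA steps for continuous data: (1) a per-gene kernel CDF estimated across cells, turned into a log-odds statistic; (2) per cell, genes ranked by that statistic and weighted by distance from the middle rank; (3) a KS-like random walk over the ranked list. The sketch below is a simplified NumPy/SciPy re-implementation written only for illustration, not the GSVA Bioconductor package's code. The sd/4 kernel bandwidth, `tau=1`, and the "max positive plus max negative deviation" statistic are assumptions meant to mirror GSVA's defaults; the function names are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def kcdf_zscores(expr):
    """Step 1: for each gene (row), estimate its CDF across cells (columns)
    with a Gaussian kernel and convert it to log-odds."""
    z = np.zeros(expr.shape, dtype=float)
    for g in range(expr.shape[0]):
        x = expr[g].astype(float)
        bw = x.std(ddof=1) / 4.0  # assumed bandwidth: sd / 4
        if bw == 0:
            continue  # constant gene: leave its z at 0
        # F[i] = mean over cells j of Phi((x_i - x_j) / bw)
        F = norm.cdf((x[:, None] - x[None, :]) / bw).mean(axis=1)
        F = np.clip(F, 1e-12, 1 - 1e-12)
        z[g] = np.log(F / (1.0 - F))
    return z


def walk_score(z_cell, in_set, tau=1.0):
    """Steps 2-3 for one cell: rank genes by z (highest first), weight each
    rank by its distance from the middle, and run a KS-like random walk."""
    p = z_cell.size
    order = np.argsort(-z_cell, kind="stable")
    rank_score = np.abs(np.arange(p, 0, -1) - p / 2.0)  # top and bottom ranks weigh most
    hits = in_set[order]
    w_in = np.where(hits, rank_score ** tau, 0.0)
    if w_in.sum() == 0 or hits.all():
        return 0.0  # degenerate set: no walk can be formed
    step = w_in / w_in.sum() - np.where(hits, 0.0, 1.0 / (p - hits.sum()))
    walk = np.cumsum(step)  # starts and ends at 0
    # largest positive deviation plus largest (negative) negative deviation
    return walk.max(initial=0.0) + walk.min(initial=0.0)


def gsva_like(expr, gene_set_idx, tau=1.0):
    """Return one enrichment score per cell (column) for the given gene set."""
    z = kcdf_zscores(expr)
    in_set = np.zeros(expr.shape[0], dtype=bool)
    in_set[list(gene_set_idx)] = True
    return np.array([walk_score(z[:, c], in_set, tau) for c in range(expr.shape[1])])


# Toy data: 200 genes x 50 cells; the first 50 genes are raised in the first 25 cells.
rng = np.random.default_rng(0)
expr = rng.normal(size=(200, 50))
expr[:50, :25] += 2.0
scores = gsva_like(expr, range(50))
print(scores[:25].mean(), scores[25:].mean())
```

Because step 1 is computed per gene across the cells you pass in, the scores are relative to that subset of cells. With a sketch like this, one could for example drop suspected low-count outlier cells, or vary the set size, and see whether the score distribution shifts.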
Here are the sample means for the gene set