I would like to know which genes should go into the expression matrix given to GSVA (the expression matrix, not the gene sets). I am working with single-cell data and have marker genes for two groups, and I would like to find the pathways that differ between these two groups. Since I have already identified marker genes (about 100 for each group), my first idea was to take the union of the two marker lists, build the expression matrix from those genes only, and use that as the GSVA input. After reading more about how GSVA and GSEA work, I now think the expression matrix should contain the entire set of genes (about 14000), so that the enrichment scores are more reliable and are computed against a more appropriate background distribution.
Is my understanding correct?
It would be great if somebody could explain which genes should go into the GSVA expression matrix. Is it better to give a restricted list or the entire list? Does "the more the merrier" apply here?
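To make the intuition about the background concrete, here is a minimal Python sketch. It is not GSVA itself: it uses a simplified mean-rank score on one simulated cell, and the expression values, the shift of 2.0 for marker genes, the 50-gene pathway, and the function name `mean_rank_percentile` are all assumptions made up for illustration. It only shows how a rank-based score for the same gene set changes when the matrix is restricted to the marker genes.

```python
import numpy as np

rng = np.random.default_rng(0)

n_genes = 14000    # full gene universe, as in the question
n_markers = 200    # union of the two marker lists (hypothetical indices 0..199)
n_pathway = 50     # size of a hypothetical pathway gene set

# Toy expression profile for one cell: noise plus an assumed shift for marker genes.
expr = rng.normal(0.0, 1.0, n_genes)
marker_idx = np.arange(n_markers)
expr[marker_idx] += 2.0

# Hypothetical pathway whose genes all happen to be marker genes.
pathway_mask = np.zeros(n_genes, dtype=bool)
pathway_mask[marker_idx[:n_pathway]] = True


def mean_rank_percentile(values, set_mask):
    """Mean rank percentile (0 = lowest, 1 = highest) of the set's genes
    among all genes supplied in `values`."""
    ranks = values.argsort().argsort()  # rank 0 = lowest expression
    return ranks[set_mask].mean() / (len(values) - 1)


# Score against the full 14000-gene background.
score_full = mean_rank_percentile(expr, pathway_mask)

# Score when the matrix contains only the marker genes.
score_restricted = mean_rank_percentile(expr[marker_idx], pathway_mask[marker_idx])

print(f"full background:       {score_full:.3f}")
print(f"marker-only background: {score_restricted:.3f}")
```

In this toy setup the pathway genes rank near the top of the full gene list, but look unremarkable among the markers alone, because every gene left in the restricted matrix carries the same shift. Whether real GSVA scores behave the same way is exactly what I am asking.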
Thanks in advance for any responses!