The expression matrix and the gene sets used for pathway analysis usually come from different sources. For GSVA/ssGSEA, is it reasonable to restrict each gene set to the genes that are present in the expression matrix? If certain genes (or gene symbols) are not in your reference or are not detected for technical reasons, it makes sense to remove them. This appears to be what the original publication does:
We further filtered genes with low expression by discarding those with a mean of less than 0.5 counts per million calculated in log2 scale ... After mapping genes from an experiment to the gene set database, we ignore all gene sets with fewer than 10 genes or more than 500 genes.
The code also appears to apply this filtering automatically.
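To make the mapping step concrete, here is a minimal Python sketch of what the quoted passage describes: restrict each gene set to the genes in the expression matrix, then drop sets that end up with fewer than 10 or more than 500 genes. The function name `filter_gene_sets`, its parameters, and the toy gene names are hypothetical and only for illustration. This is not the GSVA package's own code.

```python
def filter_gene_sets(gene_sets, measured_genes, min_size=10, max_size=500):
    """Keep only measured genes in each set, then apply the size limits."""
    measured = set(measured_genes)
    filtered = {}
    for name, genes in gene_sets.items():
        # dict.fromkeys drops duplicate symbols while preserving order
        kept = [g for g in dict.fromkeys(genes) if g in measured]
        if min_size <= len(kept) <= max_size:
            filtered[name] = kept
    return filtered


# Hypothetical toy data: 1000 measured genes named G1..G1000
measured_genes = [f"G{i}" for i in range(1, 1001)]
gene_sets = {
    "SET_A": [f"G{i}" for i in range(1, 21)],      # 20 genes, all measured
    "SET_B": [f"G{i}" for i in range(995, 1010)],  # 15 genes, only 6 measured
}

filtered = filter_gene_sets(gene_sets, measured_genes)
print({name: len(genes) for name, genes in filtered.items()})
```

In this toy input, SET_B falls below the minimum size after mapping and is dropped, while SET_A is kept unchanged.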
On the other hand, if you remove all the non-expressed genes, wouldn't that automatically make the gene set look more enriched? For example, suppose you get a certain score when 10 of the 100 genes in a gene set are highly expressed. If you remove the other 90, every remaining gene in that set is highly expressed, which should increase the score. Is this just accepting more false positives in order to avoid false negatives?
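The 10-of-100 example can be put into numbers with a toy calculation. The sketch below uses an unweighted, GSEA-style running-sum statistic on a single ranked gene list as a simplified stand-in. It is not the actual GSVA or ssGSEA scoring, and the gene names, the 1000-gene ranking, and the placement of the 90 non-expressed genes at the bottom of the ranking are all assumptions made for illustration.

```python
def ks_enrichment(ranked_genes, gene_set):
    """Largest deviation of an unweighted running sum over the ranked list."""
    members = set(gene_set) & set(ranked_genes)
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        # Step up on set members, step down on all other genes
        running += 1.0 / n_hit if gene in members else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical sample: 1000 genes ranked from highest to lowest expression
ranked = [f"G{i}" for i in range(1, 1001)]
high = ranked[:10]    # the 10 highly expressed members of the set
low = ranked[-90:]    # the 90 non-expressed members, assumed to rank last

score_full = ks_enrichment(ranked, high + low)   # all 100 genes in the set
score_trimmed = ks_enrichment(ranked, high)      # the 90 genes removed

print(round(score_full, 3), round(score_trimmed, 3))
```

Under these assumptions the full set scores about -0.9 and the trimmed set about 1.0, so removing measured but low-ranked genes from the set does change the score substantially. Note that the genes stay in the ranked list as background in both cases. Only their membership in the set changes.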