Hi, I guess that when you say you are using "RNA-Seq Row Count data" you mean "RNA-Seq raw count data", in the sense that the counts are not normalized. Our recommendation is to give normalized data to GSVA. Your code seems to use the edgeR pipeline; within that pipeline, you can obtain normalized logCPM units of expression with these steps:
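```r
## assuming there is an object called 'count' that contains your matrix of "raw" counts
## and a list of gene sets called 'hallmark'
library(edgeR)
library(GSVA)
dge <- DGEList(counts = count)
dge.norm <- calcNormFactors(dge)   # TMM normalization factors
normLogCPMs <- cpm(dge.norm, log = TRUE, normalized.lib.sizes = TRUE)
gsva_es <- gsva(normLogCPMs, gset.idx.list = hallmark)
## in recent GSVA releases (>= 1.50) the equivalent call is gsva(gsvaParam(normLogCPMs, hallmark))
```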
With respect to whether you can get negative GSVA scores, this has been asked a number of times in different contexts in this forum; you can check these posts (1, 2, 3, 4). In essence, positive scores are produced by gene sets where most genes have higher expression in the corresponding sample than in the other samples, and negative scores by gene sets where most genes have lower expression. If you have doubts about why a particular gene set gets a positive or negative score, plot the expression of the genes forming that gene set for that sample. The GSVA package has a shiny app, which you can launch with the function igsva(), that can produce such plots and help you understand this.

With regard to 'ssgsea', this is a different method, so it is not surprising that it may give different results. Again, for a given gene set and its enrichment score, you can check whether the score reflects what you see in the expression of the genes.
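As a rough illustration of that kind of check, here is a base R sketch that centers each gene's logCPM values on its mean across samples and reports the gene-set genes for one sample, sorted. Mostly positive values would be consistent with a positive score and mostly negative values with a negative one. This is only a diagnostic under that assumption, not what GSVA computes internally (GSVA estimates each gene's expression distribution across samples and then uses rank-based statistics). `geneSetProfile()` and `plotGeneSetProfile()` are hypothetical helpers, not functions of GSVA or edgeR, and the toy matrix is made up.

```r
## Hypothetical helper (not part of GSVA): for one sample, return each gene of a
## gene set as its logCPM minus that gene's mean logCPM across all samples.
geneSetProfile <- function(expr, geneSet, sample) {
  genes <- intersect(geneSet, rownames(expr))   # keep only genes present in the matrix
  if (length(genes) == 0) stop("no genes of the gene set found in 'expr'")
  sub <- expr[genes, , drop = FALSE]
  centered <- sub - rowMeans(sub)               # row-wise centering (vector recycles down columns)
  sort(centered[, sample], decreasing = TRUE)
}

## Hypothetical helper: bar plot of the profile returned above.
plotGeneSetProfile <- function(expr, geneSet, sample) {
  prof <- geneSetProfile(expr, geneSet, sample)
  barplot(prof, las = 2, ylab = "logCPM minus gene mean across samples",
          main = paste("Gene set genes in sample", sample))
  invisible(prof)
}

## Toy logCPM matrix (made up): genes g1-g3 are higher in sample s1.
toy <- matrix(c(8, 7, 9, 5, 5, 5,   # s1
                5, 5, 5, 5, 5, 5,   # s2
                5, 5, 5, 5, 5, 5,   # s3
                6, 5, 4, 5, 5, 5),  # s4
              nrow = 6,
              dimnames = list(paste0("g", 1:6), paste0("s", 1:4)))
prof <- geneSetProfile(toy, c("g1", "g2", "g3"), "s1")
print(prof)
```

With the objects from the edgeR pipeline, a call could look like `plotGeneSetProfile(normLogCPMs, hallmark[["HALLMARK_HYPOXIA"]], colnames(normLogCPMs)[1])`, where the gene set name is only an example and the gene identifiers in the gene set must match the row names of `normLogCPMs`.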