Why Negative value from GSVA
1
0
Entering edit mode
Yang Shi • 0
@ea61ff7a
Last seen 1 hour ago
Zheng Zhou

Dear Communities

GSVA was performed based on RNA-Seq Row Count data, which was normalized by edgeR::cpm(count, log = T). However, the enrichment score show many negative value, which was not observed when parameter method is 'ssgsea'. (1) Is it correct for the aboved method for row count?

(2) Is it correct for the negative enrichment score conducted by GSVA?

(3) Why 'ssgsea' do not get negative value?

Any suggestions would be appreciated!

gsva(expr = as.matrix(edgeR::cpm(count, log = T)),
     gset.idx.list = hallmark,
     method = 'gsva',
     kcdf = 'Gaussian',
     mx.diff = T) %>%
  t() %>%
  as.data.frame()

gsva(expr = as.matrix(edgeR::cpm(count, log = T)),
     gset.idx.list = hallmark,
     method = 'ssgsea',
     kcdf = 'Gaussian',
     mx.diff = T) %>%
  t() %>%
  as.data.frame()
GSVA RNASeq edgeR • 235 views
ADD COMMENT
2
Entering edit mode
Robert Castelo ★ 2.9k
@rcastelo
Last seen 16 days ago
Barcelona/Universitat Pompeu Fabra

hi, i guess when you say that you are using "RNA-Seq Row Count data" you mean "RNA-Seq raw count data" in the sense that the count data is not normalized. About that, our recommendation is that you give normalized data to GSVA. In your code you seem to be using the edgeR pipeline. In that pipeline, you should do these steps to get normalized logCPM units of expression:

library(edgeR)
## assuming there is an object called 'count' that contains your matrix of "raw" counts
dge <- DGEList(count)
dge.norm <- calcNormFactors(dge)
normLogCPMs <- cpm(dge.norm, log=TRUE, normalized.lib.sizes=TRUE)
gsva_es <- gsva(normLogCPMs, gset.idx.list=hallmark)

with respect to whether you can get negative GSVA scores, this has been asked a number of times in different contexts in this forum, you can check these posts (1, 2, 3, 4). In essence, positive scores are produced by gene sets where most genes have higher values of expression in the corresponding sample and negative with lower values. If you have doubts about why a particular gene set gets positive or negative scores, plot the expression of the genes forming that gene set for that sample. The GSVA package has a shiny app that you can call with the function igsva(), which can produce such plots and help you understand this. With regards to 'ssgsea', this is a different method, so it's not surprising that it may give different results, but you can check again for a given gene set and corresponding enrichment score, whether the score reflects what you see in the expression of the genes.

ADD COMMENT
0
Entering edit mode

Thanks for your detailed reply sir! Sorry for the typo 'Row'. Actually, the raw count data is from TCGA database (STAR - Count). The Log2(FPKM + 1) was used as the input matrix for GSVA. I read some thread which suggested to use normalized count data for GSVA, although you didn't demonstrate which kind of data type is the most suitable input (GSVA on RNAseq data).

As you suggested here, the raw count should be transform DGEList object. But if the cancer didn't have control samples, how to deal with the parameter 'group' of DGEList(counts = counts, group = group)?

Thanks very much!

ADD REPLY
1
Entering edit mode

You can use log2(FPKM+1) units of expression as input matrix for GSVA with default parameters. We have not assessed whether there is a best way to normalize RNA-seq data to input into GSVA. What you want to achieve with a normalization step is to remove systematic technical differences between the samples and this is a question that is independent of GSVA and you should read papers on normalizing RNA-seq data if you want to have a deeper understanding of this question. GSVA is a non-parametric method and, as such, it will be robust to small differences in expression units resulting from different normalization methods. Regarding the question about DGEList objects, please consult the documentation in the edgeR user's manual and the help page of the DGEList() function.

ADD REPLY
0
Entering edit mode

Got that sir! It's a great help to me!

ADD REPLY

Login before adding your answer.

Traffic: 199 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6