Dear community,
GSVA was run on RNA-Seq Row Count data normalized with edgeR::cpm(count, log = T). However, the enrichment scores include many negative values, which is not the case when the method parameter is 'ssgsea'.
(1) Is the above approach correct for this count data?
(2) Are negative enrichment scores from GSVA expected?
(3) Why does 'ssgsea' not produce negative values?
Any suggestions would be appreciated!
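library(GSVA)
library(magrittr)  # provides the %>% pipe

# log2-CPM expression matrix, shared by both calls
expr <- as.matrix(edgeR::cpm(count, log = TRUE))

# Call 1: method = 'gsva' (the scores here include negative values)
gsva(expr = expr,
     gset.idx.list = hallmark,
     method = 'gsva',
     kcdf = 'Gaussian',
     mx.diff = TRUE) %>%
  t() %>%
  as.data.frame()

# Call 2: same input, method = 'ssgsea'
gsva(expr = expr,
     gset.idx.list = hallmark,
     method = 'ssgsea',
     kcdf = 'Gaussian',
     mx.diff = TRUE) %>%
  t() %>%
  as.data.frame()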
Thanks for your detailed reply, sir! Sorry for the typo 'Row'. The raw count data actually comes from the TCGA database (STAR - Count). Log2(FPKM + 1) was used as the input matrix for GSVA. I read some threads suggesting that normalized count data should be used for GSVA, although you did not say which data type is the most suitable input (GSVA on RNAseq data).
As you suggested here, the raw counts should be converted to a DGEList object. But if the cancer dataset has no control samples, how should I set the 'group' parameter of DGEList(counts = counts, group = group)?
Thanks very much!
You can use log2(FPKM+1) units of expression as input matrix for GSVA with default parameters. We have not assessed whether there is a best way to normalize RNA-seq data to input into GSVA. What you want to achieve with a normalization step is to remove systematic technical differences between the samples and this is a question that is independent of GSVA and you should read papers on normalizing RNA-seq data if you want to have a deeper understanding of this question. GSVA is a non-parametric method and, as such, it will be robust to small differences in expression units resulting from different normalization methods. Regarding the question about DGEList objects, please consult the documentation in the edgeR user's manual and the help page of the DGEList() function.
Got that sir! It's a great help to me!
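For anyone landing here with the same DGEList question, here is a minimal sketch of building the object when there are no control samples. It relies on the group argument of DGEList() being optional (when it is omitted, edgeR places all samples in a single group). The toy matrix and its gene and sample names are made up for illustration, and the calcNormFactors() step is an assumption on my part rather than something recommended in this thread.

```r
library(edgeR)

# Hypothetical toy data: 100 genes x 6 tumor samples, no controls
set.seed(1)
counts <- matrix(rnbinom(600, mu = 50, size = 2), nrow = 100,
                 dimnames = list(paste0("gene", 1:100),
                                 paste0("tumor", 1:6)))

# group is left out, so every sample is assigned to the same single group
dge <- DGEList(counts = counts)

# Optional (assumption, not from the thread): TMM normalization factors
dge <- calcNormFactors(dge)

# log2-CPM matrix with genes in rows and samples in columns
logcpm <- cpm(dge, log = TRUE)
```

The resulting logcpm matrix has the same shape as the one produced by edgeR::cpm(count, log = T) in the original question, so it can be passed to gsva() as the expression matrix in the same way.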