Question

Differential expression on GSVA scores using LIMMA

rocanja ▴ 60

I am using limma on GSVA scores to assess differential expression of gene sets (in microarray and RNA-seq data). Since GSVA scores can be negative, I am wondering how limma calculates the fold change between a negative and a positive GSVA score, and how meaningful a fold-change cutoff would be for defining differential expression of gene sets (in addition to a p-value cutoff).

I am familiar with using limma on gene-level data and with how fold changes are derived from log2 intensities in the case of microarray data. However, applying that formula to negative and positive GSVA scores doesn't seem to give a meaningful result, since they are not log-transformed values to start with. I am wondering whether GSVA scores require some kind of pre-processing before being passed to limma, although in the GSVA vignette the scores appear to be passed to limma without any pre-processing, as far as I can see. In the vignette, however, no fold-change cutoff is applied to define differential expression of gene sets, while one is applied to gene-level data.

limma gsva • 4.2k views

updated 6.0 years ago by Robert Castelo ★ 3.4k • written 6.0 years ago by rocanja ▴ 60

score 4 · Accepted Answer · 2019-04-03

hi,

Negative values are not problematic for limma, just as they are not when they come from log-CPM units in RNA-seq data analysed with the limma-trend pipeline, or from some microarray platforms (see this post about it).
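To make this concrete: for a simple two-group design with no other covariates, the coefficient a linear model reports (the logFC column in limma's output) is just the difference between the group means of whatever values were supplied, so negative inputs need no special handling. The following is a minimal Python sketch of that arithmetic only, not limma itself; the score values and the helper name `score_difference` are made up for illustration.

```python
from statistics import mean

# Hypothetical GSVA scores for one gene set (made-up values, illustration only).
control = [-0.42, -0.31, -0.25, -0.38]
treated = [0.12, 0.27, 0.05, 0.20]


def score_difference(group_a, group_b):
    """Return the difference in mean score, group_b minus group_a.

    For a two-group design without covariates this is what a least-squares
    fit estimates for the group coefficient.
    """
    return mean(group_b) - mean(group_a)


diff = score_difference(control, treated)
print(f"difference in mean GSVA score: {diff:.3f}")
```

The result is a difference in GSVA score units, not a log2 ratio, so a cutoff on it means "at least this much shift in score" rather than "at least this many fold".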

Regarding the interpretation of GSVA units of expression: these are scores derived from a random walk through a ranking (see Eqs. (4) and (5) of the GSVA article). They therefore cannot be interpreted beyond indicating that the genes of a gene set lie towards the top of the ranking (positive values), towards the bottom (negative values), or are spread uniformly across it (values around 0). There are many ways to summarize the expression of a set of genes in a sample, such as calculating their mean, their z-score or the first right-singular vector from an SVD, to name a few, and each has its own advantages and caveats. Interpreting a difference in gene-set-level summaries of expression as a fold change is tricky: to say that a gene set or pathway is two-fold over-expressed, you first have to define the units of expression at the gene-set or pathway level. As far as I know, there is no independent biochemical assay that measures the expression of a gene set or a pathway and so yields a fold change against which you could compare or calibrate the difference or fold change estimated from a high-throughput profiling assay.
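The sign behaviour of a random-walk score can be illustrated with a rough sketch. This is a simplified, unweighted Kolmogorov-Smirnov-like walk written in Python, not the GSVA algorithm itself (GSVA works on kernel-estimated, rank-weighted statistics); the function name `random_walk_scores` and the gene labels are hypothetical.

```python
def random_walk_scores(ranked_genes, gene_set):
    """Unweighted KS-like random walk over a ranking (top-ranked gene first).

    Returns (max_deviation, difference), where max_deviation is the largest
    deviation of the walk from zero (keeping its sign) and difference is the
    size of the largest positive deviation minus the size of the largest
    negative deviation.
    """
    members = set(gene_set) & set(ranked_genes)
    n_in = len(members)
    n_out = len(ranked_genes) - n_in
    if n_in == 0 or n_out == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    running = 0.0
    max_pos = 0.0  # largest excursion above zero
    max_neg = 0.0  # largest excursion below zero (stored as a value <= 0)
    for gene in ranked_genes:
        # Step up on a gene-set member, step down otherwise.
        running += 1.0 / n_in if gene in members else -1.0 / n_out
        max_pos = max(max_pos, running)
        max_neg = min(max_neg, running)

    max_deviation = max_pos if max_pos >= -max_neg else max_neg
    return max_deviation, max_pos + max_neg


ranking = [f"g{i}" for i in range(1, 11)]  # g1 is ranked highest

_, top_diff = random_walk_scores(ranking, ["g1", "g2", "g3"])
_, bottom_diff = random_walk_scores(ranking, ["g8", "g9", "g10"])
_, spread_diff = random_walk_scores(ranking, ["g1", "g5", "g10"])
print(top_diff, bottom_diff, spread_diff)
```

In this toy ranking, a set concentrated at the top gives a score near +1, a set at the bottom gives a score near -1, and a set spread evenly across the ranking gives a score near 0, which is all the sign and magnitude convey.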

cheers,

robert.