Question

Gene set analysis: GSVA, Z-Score, and fold-changes

Jonathan ▴ 10

United States

I have a relatively simple experiment with bulk RNA-seq samples from two time points ("before" and "after") and two treatments ("drug" or "vehicle"). Recently, I tried pathway analysis with Bioconductor's GSVA package, mainly using the 'gsva' and 'zscore' methods.

Several questions occurred to me:

1. Are the DGEList object's counts (DGEList$counts) the proper input data for the 'gsva' method? Should I use voom-transformed values instead, or something else?
2. Can the GSVA values of two groups be compared meaningfully? That is, can I compute the distance/ratio/fold-change between GSVA "after" and "before" values, use emmeans, etc.?
3. The before-vs-after difference in GSVA values can be tested with a t-test or Mann-Whitney test. Is there a way to test a "difference of differences", i.e., whether the change from "drug before" to "drug after" is larger or smaller than the change from "vehicle before" to "vehicle after"?
4. When using the 'zscore' method, can I convert the Z-score values to a uniform distribution (via pnorm) and then calculate distance/ratio/fold-changes?
5. In their paper describing the z-score method, Lee et al. (2008) suggest that, after calculating the Z-score per sample, the gene set be filtered to keep only "key genes which yield most discriminative activities". As far as I can tell, this step is not implemented in the 'zscore' method of the gsva() function. Should it be implemented? How important is this step? The paper describes it as follows:

   For a given pathway, a greedy search was performed to identify a subset of member genes in the pathway for which S(G) was locally maximal. We refer to this subset as the set of "condition-responsive genes" (CORGs) representing the majority of the pathway activation under the relevant conditions. To identify the CORG set, member genes were first ranked by their t-test scores, in ascending order if the average t-score among all member genes was negative, and in descending order otherwise. The CORG set G was initialized to contain only the top member gene and iteratively expanded. At each iteration, addition of the gene with the next best t-test score was considered, and the search was terminated when no addition increased the discriminative score S(G). The activity vector a of the final CORG set was regarded as the pathway activity across the samples.

6. What is the mainstream approach for gene set/pathway analysis nowadays? Is GSVA still common, or has the community moved toward newer approaches?
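
The greedy CORG search quoted above can be sketched in a few lines. This is only a rough illustration in plain Python, not the GSVA or Lee et al. implementation. It assumes that `z` holds gene-wise z-scores per sample, that `group` holds 0/1 class labels, and that the discriminative score S(G) is the absolute Welch t statistic of the combined activity; all names and the toy values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def t_stat(x, y):
    """Welch two-sample t statistic of x versus y."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


def split(values, group):
    """Split per-sample values into (class 1, class 0) lists."""
    case = [v for v, g in zip(values, group) if g == 1]
    ctrl = [v for v, g in zip(values, group) if g == 0]
    return case, ctrl


def activity(z, genes):
    """Combined z-score per sample: sum of member z-scores / sqrt(k)."""
    n_samples = len(z[genes[0]])
    k = len(genes)
    return [sum(z[g][j] for g in genes) / sqrt(k) for j in range(n_samples)]


def score(z, genes, group):
    """Discriminative score S(G); ASSUMED here to be |t| of the activity."""
    return abs(t_stat(*split(activity(z, genes), group)))


def corg_search(z, group):
    """Greedy search for the condition-responsive gene (CORG) subset."""
    t = {gene: t_stat(*split(vals, group)) for gene, vals in z.items()}
    # Descending order unless the average t-score is negative.
    ranked = sorted(t, key=t.get, reverse=mean(t.values()) >= 0)
    corg = [ranked[0]]
    best = score(z, corg, group)
    for gene in ranked[1:]:
        s = score(z, corg + [gene], group)
        if s <= best:  # stop when adding the next gene does not increase S(G)
            break
        corg.append(gene)
        best = s
    return corg, activity(z, corg)


# Hypothetical toy data: three genes, three "class 0" then three "class 1" samples.
group = [0, 0, 0, 1, 1, 1]
z = {
    "g1": [-1.0, -0.8, -1.2, 1.1, 0.9, 1.0],
    "g2": [-0.9, -1.0, -0.7, 0.8, 1.2, 0.7],
    "g3": [0.5, -0.6, 0.3, -0.4, 0.6, -0.4],
}
corg, pathway_activity = corg_search(z, group)
```

Because the search uses the class labels to pick genes, the resulting activity is supervised, which is one reason it is a different kind of step from the unsupervised per-sample scoring that gsva() performs.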

Thank you all!

GSVA RNASeq limma • 3.2k views

updated 2.3 years ago by Gordon Smyth 53k • written 2.4 years ago by Jonathan ▴ 10

Gordon Smyth 53k

WEHI, Melbourne, Australia

The limma RNA-seq workflow "RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR" demonstrates gene set testing / pathway analysis using camera. camera can conduct analyses for any linear model contrast, including differences of differences.
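
To make the "difference of differences" concrete: it is an interaction contrast on the four group means. The sketch below is plain Python rather than limma or camera, and it only illustrates the arithmetic of such a contrast for a single score vector under a cell-means model with a pooled variance. The scores are made-up numbers, the function name is hypothetical, and the sketch treats all samples as independent (it ignores any before/after pairing) and applies no empirical Bayes moderation.

```python
from math import sqrt
from statistics import mean

# Hypothetical per-sample scores for one gene set, by group.
scores = {
    "drug_after": [0.42, 0.38, 0.47],
    "drug_before": [0.10, 0.05, 0.12],
    "vehicle_after": [0.15, 0.11, 0.18],
    "vehicle_before": [0.09, 0.06, 0.12],
}

# (drug_after - drug_before) - (vehicle_after - vehicle_before)
contrast = {
    "drug_after": 1,
    "drug_before": -1,
    "vehicle_after": -1,
    "vehicle_before": 1,
}


def interaction_test(scores, contrast):
    """Estimate a contrast of group means and its ordinary t statistic."""
    means = {g: mean(v) for g, v in scores.items()}
    estimate = sum(contrast[g] * means[g] for g in scores)
    # Pooled residual variance across the four groups.
    rss = sum((x - means[g]) ** 2 for g, v in scores.items() for x in v)
    df = sum(len(v) for v in scores.values()) - len(scores)
    s2 = rss / df
    se = sqrt(s2 * sum(contrast[g] ** 2 / len(scores[g]) for g in scores))
    return estimate, estimate / se, df


estimate, t_value, df = interaction_test(scores, contrast)
print(f"estimate={estimate:.4f}, t={t_value:.2f}, df={df}")
```

In a real analysis the same contrast would be specified in the design of the linear model, so that it is tested for every gene or gene set at once.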

2.3 years ago by Gordon Smyth 53k

score 3 · Accepted Answer · 2023-07-07

1. The recommended input for the gsva method is normalized expression on a continuous scale, such as log-CPM values. You can obtain these with edgeR by computing TMM normalization factors with calcNormFactors() and then calling cpm() with the arguments normalized.lib.sizes=TRUE and log=TRUE. Alternatively, you may use normalized, voom-transformed values.
2. The output of gsva() with the default arguments is amenable to analysis with linear models. I'm not familiar with the emmeans package, but the GSVA vignette includes an example using the limma package.
3. You may use a t-test or Mann-Whitney test, but for high-dimensional data they are suboptimal compared with packages such as limma or DESeq2. You can find examples of factorial designs, interaction models, etc., in chapter 9 of the limma user's guide.
4. I do not see how one can convert z-score values to a uniform distribution using the pnorm() function.
5. This is not implemented in GSVA. If you are interested, please file a feature request by opening an issue at the GSVA GitHub repo, giving details on your use case.
6. There are plenty of methods for pathway analysis, and the one that suits you best depends on the question you want to address. You can look through the thousands of articles citing the GSVA package to see whether any of them uses GSVA to address a question similar to the one you want to tackle.
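
To illustrate what "log-CPM" means in point 1, here is a minimal sketch in plain Python. It is not the edgeR code: it uses the log2 counts-per-million formula with offsets of 0.5 and 1 (the form used by voom), and it leaves the normalization factors at 1 unless they are supplied, so no TMM scaling is computed. The counts and the function name are hypothetical.

```python
from math import log2

# Hypothetical raw counts: gene -> counts across three samples.
counts = {
    "geneA": [10, 20, 15],
    "geneB": [0, 5, 2],
    "geneC": [990, 1975, 1483],
}


def log_cpm(counts, norm_factors=None):
    """log2 counts per million with small offsets to avoid log(0)."""
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(v[j] for v in counts.values()) for j in range(n_samples)]
    if norm_factors is None:
        norm_factors = [1.0] * n_samples  # ASSUMPTION: no TMM scaling here
    # Effective library size = library size * normalization factor.
    eff = [size * f for size, f in zip(lib_sizes, norm_factors)]
    return {
        gene: [log2((v[j] + 0.5) / (eff[j] + 1.0) * 1e6) for j in range(n_samples)]
        for gene, v in counts.items()
    }


logcpm = log_cpm(counts)
for gene, values in logcpm.items():
    print(gene, [round(x, 2) for x in values])
```

The point of the transformation is that the values are continuous, comparable across samples with different sequencing depths, and defined even for zero counts, which is what a per-sample scoring method expects as input.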