I have a relatively simple experiment with bulk RNA-seq samples from two time points ("before" and "after") and two treatments ("drug" or "vehicle"). Recently, I tried pathway analysis with Bioconductor's GSVA package, mainly using the 'gsva' and 'zscore' methods.
Several questions occurred to me:
- Are the DGEList object's counts (DGEList$counts) the proper input data for the gsva method? Should I use voom-transformed values instead, or something else?
- Can GSVA values of two groups be compared meaningfully? I mean, can I compare the distance/ratio/fold-change between GSVA "after" and "before", use emmeans, etc.?
- The before-vs-after difference in GSVA values can be tested with a t-test or a Mann-Whitney test. Is there a way to test a "difference of differences", i.e., whether the change from "drug before" to "drug after" is larger or smaller than the change from "vehicle before" to "vehicle after"?
- When using the ZSCORE method, can I convert the Z-score values to a uniform distribution (via pnorm) and then calculate distance/ratio/fold-changes?
- In their paper describing the ZSCORE method, Lee et al. (2008) suggest, after calculating the Z-score per sample, filtering the gene set to keep only "key genes which yield most discriminative activities" (the relevant passage from the paper is quoted below). As far as I can tell, this step is not implemented in the 'zscore' method of the gsva() function. Should it be implemented? How important is this step?
For a given pathway, a greedy search was performed to identify a subset of member genes in the pathway for which S(G) was locally maximal. We refer to this subset as the set of "condition-responsive genes" (CORGs) representing the majority of the pathway activation under the relevant conditions. To identify the CORG set, member genes were first ranked by their t-test scores, in ascending order if the average t-score among all member genes was negative, and in descending order otherwise. The CORG set G was initialized to contain only the top member gene and iteratively expanded. At each iteration, addition of the gene with the next best t-test score was considered, and the search was terminated when no addition increased the discriminative score S(G). The activity vector a of the final CORG set was regarded as the pathway activity across the samples.
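To make sure I am reading the quoted procedure correctly, here is a rough Python sketch of it (standard library only). This is my own illustration, not code from GSVA or from the paper. The function names (`t_score`, `activity`, `corg_search`) are hypothetical, and I am assuming that S(G) is the two-group t statistic of the pathway activity, that activity is the sum of the member genes' z-scores divided by the square root of the number of genes, and that the sign of S(G) is flipped when the genes are ranked in ascending order.

```python
import math
from statistics import mean, variance


def t_score(values, labels):
    """Welch t statistic of `values` between samples labelled 1 and 0.

    Assumes each group has at least two samples and non-zero pooled variance.
    """
    a = [v for v, lab in zip(values, labels) if lab == 1]
    b = [v for v, lab in zip(values, labels) if lab == 0]
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def activity(z_rows):
    """Combined z-score per sample: sum over genes divided by sqrt(k)."""
    k = len(z_rows)
    return [sum(col) / math.sqrt(k) for col in zip(*z_rows)]


def corg_search(z, labels):
    """Greedy search for the condition-responsive genes (CORGs).

    z: dict mapping gene name -> list of per-sample z-scores.
    labels: list of 0/1 class labels, one per sample.
    Returns (selected genes, activity vector of the selected set).
    """
    t = {g: t_score(v, labels) for g, v in z.items()}
    # Rank descending if the mean t-score is non-negative, ascending otherwise.
    descending = mean(list(t.values())) >= 0
    ranked = sorted(z, key=lambda g: t[g], reverse=descending)
    sign = 1 if descending else -1  # assumption: compare S(G) in the ranking direction

    chosen = [ranked[0]]
    best = sign * t_score(activity([z[g] for g in chosen]), labels)
    for g in ranked[1:]:
        score = sign * t_score(activity([z[x] for x in chosen + [g]]), labels)
        if score <= best:
            break  # stop when the next-best gene does not improve S(G)
        chosen.append(g)
        best = score
    return chosen, activity([z[g] for g in chosen])
```

With a toy gene set in which two genes separate the classes and a third is noise, I would expect the search to keep the first two and stop, which is what makes me wonder how much the result depends on this filtering step.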
- What is the mainstream approach for gene set/pathway analysis nowadays? Is GSVA still common, or has the "community" moved toward newer approaches?
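To make the "difference of differences" question concrete, this is the quantity I have in mind for a single pathway, as a rough Python sketch (standard library only). The function name `diff_of_diffs` and the input lists are hypothetical. I am assuming the four groups are independent (no pairing of samples across time points) and using a Welch-style standard error. In R, I assume the equivalent would be an interaction term or contrast in a linear model, but I am not sure that is valid for GSVA scores, hence the question.

```python
import math
from statistics import mean, variance


def diff_of_diffs(drug_before, drug_after, veh_before, veh_after):
    """Return (estimate, t statistic) for
    (drug_after - drug_before) - (veh_after - veh_before).

    Each argument is a list of per-sample pathway scores for one group.
    Assumes independent groups, each with at least two samples.
    """
    # Contrast weights for the four group means.
    groups = [
        (drug_after, 1),
        (drug_before, -1),
        (veh_after, -1),
        (veh_before, 1),
    ]
    estimate = sum(w * mean(g) for g, w in groups)
    # Variance of a sum of independent group means (weights are +/-1).
    se = math.sqrt(sum(variance(g) / len(g) for g, _ in groups))
    return estimate, estimate / se


# Made-up scores, for illustration only.
est, t = diff_of_diffs(
    drug_before=[0.0, 0.1, -0.1],
    drug_after=[0.5, 0.6, 0.4],
    veh_before=[0.0, 0.1, -0.1],
    veh_after=[0.1, 0.2, 0.0],
)
```

If the samples are actually paired within subjects, I suppose the per-subject change (after minus before) should be computed first and then compared between treatments instead.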
Thank you all!