Question: GSEA for RNA-seq analysis
15 months ago by
imalumberjack wrote:

Hello everyone, 

I was wondering if anyone could offer some clarity on the appropriate GSEA settings to use with RNA-seq data? 

In brief, I have two groups (n = 17 in group 1 and n = 13 in group 2) that I am interested in testing for the enrichment of a signature. 

My data have been filtered on a mean absolute deviation (MAD) cutoff to exclude genes with low variance, and I've used limma (specifically voomWithQualityWeights) to fit a linear model to my data and generate differentially expressed gene lists. 

Additionally, I exported the entire dataset as a .gct file, together with a .cls phenotype file, and analysed it in GSEA with the Signal2Noise ranking metric, but I have read that using GSEApreranked might be better. Is this a more valid approach? I've also read in a few places that it might inflate my p-values and should only be used under certain circumstances (e.g. low numbers of replicates,

In which case, there appears to be little consensus on the best way to rank my genes (by p-value or by FC?), and I'd very much appreciate some guidance on that as well... 

Kind regards and many thanks, in advance, for your help!


I recommend either ROAST (from the limma package) or QuSAGE for gene set analysis.

written 15 months ago by chris86
Answer: GSEA for RNA-seq analysis
15 months ago by
Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia) wrote:

GSEA isn't a Bioconductor program. If you have questions about how to use it, then you should send them to the GSEA authors or to a GSEA forum. I will make a few comments though:

  1. I've never heard anyone claim that GSEApreranked is better than the standard GSEA methodology, so I don't know where you would have read that.
  2. I don't see how you could export the dataset from voom and input it into GSEA because voom produces precision weights and GSEA can't use precision weights.
  3. It isn't correct to filter by MAD and then do a limma analysis. You should never filter by variance before doing an empirical Bayes analysis.
  4. If you wanted to try the Bioconductor gene set enrichment functionality, then this forum would be the right place.

1. Actually, some time ago I implemented a label-permuting GSEA test in the fgsea package. It differed from Broad's version in that adjusted p-values were calculated with the BH method rather than the ad hoc NES-based method, and it turned out to be too conservative: I couldn't find a dataset with any significant results after multiple-hypothesis correction. Pre-ranked GSEA, on the other hand, works, although its results need to be interpreted with caution.

written 15 months ago by assaron

You seem to be comparing your own modified permutation method to your own modified pre-ranked method, so that doesn't seem very relevant to OP's question, which was about GSEA itself.

It's already known on theoretical grounds that BH can't work with permutation methods. Same goes for any p-value correction algorithm. It's also known that the pre-ranked method doesn't control the error rate, not even remotely, unless you adjust for inter-gene correlations as do camera or QuSAGE.
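The effect of inter-gene correlation can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions, not the camera or QuSAGE implementation: every gene in a set is given a null z-statistic, all pairs share the same correlation `rho`, and the set statistic is the scaled mean. The function names, set size and correlation value are made up for illustration. The variance of such a mean is inflated by the factor 1 + (m - 1) * rho, which is the kind of quantity a correlation-aware test has to account for.

```python
import math
import random


def variance_inflation_factor(set_size, mean_correlation):
    """VIF for the mean of set_size equally correlated statistics: 1 + (m - 1) * rho."""
    return 1.0 + (set_size - 1) * mean_correlation


def simulate_false_positive_rates(set_size=50, rho=0.1, n_sim=2000, seed=1):
    """Simulate null gene sets whose gene-level z-statistics share correlation rho.

    Returns (naive_rate, adjusted_rate): the fraction of null sets called
    significant at the two-sided 5% level when correlation is ignored, and
    when the set statistic is rescaled by sqrt(VIF).
    """
    rng = random.Random(seed)
    vif = variance_inflation_factor(set_size, rho)
    cutoff = 1.959964  # two-sided 5% point of the standard normal
    naive_hits = 0
    adjusted_hits = 0
    for _ in range(n_sim):
        shared = rng.gauss(0.0, 1.0)  # component common to every gene in the set
        z = [
            math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            for _ in range(set_size)
        ]
        # Set-level z-score that would be N(0, 1) if the genes were independent.
        naive_z = sum(z) / math.sqrt(set_size)
        naive_hits += abs(naive_z) > cutoff
        adjusted_hits += abs(naive_z) / math.sqrt(vif) > cutoff
    return naive_hits / n_sim, adjusted_hits / n_sim


naive_rate, adjusted_rate = simulate_false_positive_rates()
print(f"VIF = {variance_inflation_factor(50, 0.1):.1f}")
print(f"false positive rate ignoring correlation: {naive_rate:.3f}")
print(f"false positive rate after VIF adjustment: {adjusted_rate:.3f}")
```

Even a modest correlation of 0.1 in a 50-gene set gives a VIF of 5.9, so a test that treats the genes as independent rejects far more often than its nominal 5% level.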

modified 15 months ago • written 15 months ago by Gordon Smyth

Dear Gordon,

Without wasting too much time, could you provide an easily comprehensible reference about "p-value correction algorithm can't work with permutation methods"? I am wondering if this applies to the SAM methodology.

Sorry for being out of the scope of the OP.

written 15 months ago by SamGG

What I mean is that all the p-value adjustment methods require some of the p-values to be very small in order to survive multiple testing adjustment when the number of gene sets is large, and getting very small p-values requires a prohibitively large number of permutations.

For example, suppose you are testing the MSigDB C2 collection with about 5000 gene sets. You need the smallest p-values to be about 0.05/5000 or smaller in order to get an FDR below 0.05, and this requires about 10^5 permutations. To get a worthwhile number of DE sets, the number of permutations needs to be much larger again, which is prohibitively slow. Even then, all the gene sets with the smallest p-value will be equally ranked because permutation can't resolve small p-values. It's all quite unsatisfying.
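The arithmetic in this example can be checked with a short sketch. It assumes the permutation p-value is estimated as (b + 1) / (B + 1), where b counts the permuted statistics at least as extreme as the observed one out of B permutations, and it considers only the case where a single gene set has to pass the BH cutoff on its own. The function names are hypothetical and not part of GSEA, fgsea or limma.

```python
import math
from fractions import Fraction


def bh_cutoff_for_smallest(n_sets, fdr=Fraction(5, 100)):
    """BH cutoff that the smallest of n_sets p-values must meet: fdr * 1 / n_sets."""
    return Fraction(fdr) / n_sets


def min_permutation_pvalue(n_perm):
    """Smallest p-value attainable from n_perm permutations, using (b + 1) / (B + 1)."""
    return Fraction(1, n_perm + 1)


def permutations_needed(n_sets, fdr=Fraction(5, 100)):
    """Fewest permutations for which the minimum attainable p-value can pass BH."""
    cutoff = bh_cutoff_for_smallest(n_sets, fdr)
    # Need 1 / (B + 1) <= cutoff, i.e. B >= 1 / cutoff - 1.
    return math.ceil(1 / cutoff) - 1


n_sets = 5000
print(float(bh_cutoff_for_smallest(n_sets)))  # 1e-05
print(permutations_needed(n_sets))            # 99999
# With 1000 permutations, no gene set can reach the cutoff by itself.
print(min_permutation_pvalue(1000) <= bh_cutoff_for_smallest(n_sets))  # False
```

Exact fractions are used to avoid rounding surprises when dividing by 0.05. Every gene set whose observed statistic beats all permutations receives the same minimum p-value, which is the tie described above.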

The same considerations would apply to SAM or to any permutation method, which is why SAM instead uses an FDR estimate based on the global permutation distribution of the test statistic. SAM is often applied to very small sample sizes, so there will only be a limited number of distinct permutations anyway.
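The idea of estimating an FDR from the global permutation distribution can be sketched as follows. This is a simplified illustration, not the actual SAM procedure, which includes further refinements; the function name and the numbers are invented. For a given cutoff, the expected number of false calls is taken as the average number of permuted statistics beyond the cutoff per permutation, and that is divided by the number of observed statistics beyond the cutoff.

```python
def pooled_permutation_fdr(observed, permuted, threshold):
    """Estimate the FDR for calling every feature with |statistic| >= threshold.

    observed: list of observed statistics, one per feature.
    permuted: list of lists, one list of statistics per permutation.
    """
    n_called = sum(abs(s) >= threshold for s in observed)
    if n_called == 0:
        return 0.0
    # Expected false calls: mean count beyond the threshold per permutation,
    # pooling all features rather than building one null per feature.
    expected_false = sum(
        sum(abs(s) >= threshold for s in perm) for perm in permuted
    ) / len(permuted)
    return min(1.0, expected_false / n_called)


# Made-up statistics for 8 features and only 4 permutations.
observed = [4.1, 3.2, 2.9, 0.5, -0.3, 1.1, -2.8, 0.2]
permuted = [
    [0.4, -1.2, 2.6, 0.1, -0.7, 1.0, 0.3, -0.2],
    [1.9, 0.2, -0.5, 0.8, -2.7, 0.6, -0.1, 1.3],
    [-0.9, 0.7, 0.3, -1.5, 0.2, 3.0, -0.4, 0.5],
    [0.6, -0.3, 1.4, 0.9, -0.8, 0.1, 2.2, -1.1],
]
print(pooled_permutation_fdr(observed, permuted, threshold=2.5))  # 0.1875
```

Because the null distribution is pooled over all features, a usable estimate is available from a handful of permutations, whereas a per-feature p-value from 4 permutations could never be smaller than 1/5.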

The same sort of considerations also apply to my own mroast() rotation method, which is why we recommend fry() or camera() instead when dealing with large collections of gene sets.

modified 15 months ago • written 15 months ago by Gordon Smyth

Gordon, could you please provide one or two links about p-value correction methods being incompatible with permutation tests? I'd love to read more about it.

written 13 months ago by assaron