Question

pre-ranked GSEA within R? + Best DESeq2/limma-voom

0

Entering edit mode

Gordon Smyth 50k

@gordon-smyth

Last seen 49 minutes ago

WEHI, Melbourne, Australia

Dear Garcia and Daniel, If you must do a GSEA pre-ranked analysis, then I think the best statistic to use is a shrunken fold change, as it is the best predictor of the size of the biological effect for each gene. (Statistical significance meauses something slightly different.) The predFC() and glmFit() functions of the edgeR package have offered "predictive fold changes" for quite a few years, and they would still be my choice for this purpose. voom and DESeq2 are more recent alternatives, but I still recommend predFC() with prior.count=3 or 5 for this purpose. And predFC() will work even without biological replicates. However a GSEA pre-ranked analysis does not (and cannot) give correct p-values, regardless of the metric used, because it does not take account of inter-gene correlations. This has been shown by a number of published papers. For example, Wu et al (2012) showed that pre-ranked gene set analyses give spectactularly wrong p-values even when the inter-gene correlation is as low as 0.05. Have a look at Table 1 of: http://nar.oxfordjournals.org/content/40/17/e133 The GSEA pre-ranked analysis has similar behaviour to "geneSetTest" in this table. See the remark about GSEA in the last sentence of page 9. If your data has biological replicates, then more reliable methods are readily available. I don't know whether you want to test one gene set or the whole MSigDb collection. Either way, the mroast() or camera() methods of the limma package work seamlessly with either voom or edgeR for RNA-seq data, and either will take account inter-gene correlations. See for example the roast() and camera() tests presented in the first case study of: http://www.statsci.org/smyth/pubs/VoomPreprint.pdf (This paper will appear in Genome Biology next month.) Best wishes Gordon > Date: Fri, 17 Jan 2014 09:30:47 +0100 > From: Garcia Manteiga Jose Manuel <garciamanteiga.josemanuel at="" hsr.it=""> > To: Daniel Schmolze <bioconductor at="" schmolze.com=""> > Cc: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> > Subject: Re: [BioC] pre-ranked GSEA within R? + Best DESeq2/limma- voom > metric? > > Hi, > I wanted the same and this is what gsea people replied to me some months ago: > > > Hi, > > Thank you for your interest in GSEA. > > R GSEA does accept ranked list as an input. > > Please note that R GSEA has not been actively maintained since 2005. > > Regards, > > > I have not gone through the R GSEA again to check how pre-ranked lists > can be accepted, but the last sentence in their message made me hold a > minute and think in other possibilities. > For the moment I am still doing as you mentioned, using the rnk file > with java, then getting back again from results of gsea to R. > > By the way, I would like to known which metric should be best to be used > for that kind of analysis when using RNA-Seq data coming from DESeq2 > analysis. log2FC? p-values? Are they considered to be weighted, as GSEA > pre-ranked names them? > Thanks > Regards > > > > On 17 Jan 2014, at 00:18, Daniel Schmolze <bioconductor at="" schmolze.com<mailto:bioconductor="" at="" schmolze.com="">> wrote: > > I want to do a GSEA entirely from within R, using genes ranked by my > own metric. At the moment I'm saving my ranked genes to a .rnk file, > then calling Broad's GSEA java program, then reading the resulting > output back into R (all I care about are the p-values in > gsea_report_for_na_pos_####.xls and gsea_report_for_na_neg_####.xls). > Cumbersome to say the least. > > As far as I can tell, the Broad GSEA R script won't accept pre- ranked > genes, but maybe I'm wrong? If not, I'm interested in other options. > I'd like to specifically stick with the Broad GSEA algorithm if > possible. ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}

limma edgeR limma edgeR • 2.7k views

ADD COMMENT • link 10.3 years ago Gordon Smyth 50k