gene set enrichment
Alpesh Querer ▴ 220
@alpesh-querer-4895
United States
Hello all,

I have a list of differentially expressed genes from an RNA-seq analysis. I also have a two-column annotation file for the organism, with the columns being gene and goterm. Please guide me towards a Bioconductor package or any other tool that can take my list and annotation file as input and do gene set enrichment analysis.

Thanks,
Al
@michael-salbaum-5309
If I may chime in: GSEA does work with pre-sorted gene lists; human gene names were required last time I looked. I found ranking by fold change alone unsatisfactory for GSEA, as this ignores the statistical outcome of a differential expression test. Ranking by (-log10(padj))*(log2(ratio)) works a bit better but still lets fold-change outliers (high fold change but not significant) pass through. I ended up constructing my ranked gene list in four parts:

1. Statistically significant (padj derived from either DESeq or edgeR), up-regulated, ranked descending by fold change
2. Not significant, expression increased or no change, ranked ascending by p value
3. Not significant, expression decreased or no change, ranked descending by p value
4. Statistically significant, down-regulated, ranked descending by fold change

Not elegant, but somewhat workable; GSEA calls have to be scrutinized at the 1-2 and 3-4 boundaries.

Cheers,
michael

J. Michael Salbaum, Ph.D.
Associate Professor
Pennington Biomedical Research Center
Louisiana State University System
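
For anyone who wants to try the four-part ordering described above, here is a minimal base R sketch. The data frame `de`, its column names (`gene`, `log2fc`, `pvalue`, `padj`), the 0.05 cutoff and the helper `rank_four_part()` are all assumptions made up for illustration, not DESeq or edgeR output. Genes with a fold change of exactly zero are put in part 2, and an `NA` padj is treated as not significant; both are arbitrary choices.

```r
# Hypothetical differential expression results (not real data).
de <- data.frame(
  gene   = c("g5", "g1", "g8", "g3", "g6", "g2", "g7", "g4"),
  log2fc = c(-0.2,  3.0, -2.5,  0.4, -0.6,  1.5, -1.2,  0.1),
  pvalue = c(0.70, 1e-4, 1e-4, 0.15, 0.20, 0.002, 0.004, 0.60),
  padj   = c(0.90, 0.001, 0.001, 0.30, 0.40, 0.01, 0.02, 0.80),
  stringsAsFactors = FALSE
)

# Build the ranked gene list in four parts (hypothetical helper).
rank_four_part <- function(de, alpha = 0.05) {
  padj <- ifelse(is.na(de$padj), 1, de$padj)  # assumption: NA means not significant
  sig  <- padj < alpha
  up   <- de$log2fc >= 0                      # assumption: zero change goes with "up"

  part1 <- de[sig & up, ]     # significant, up-regulated
  part2 <- de[!sig & up, ]    # not significant, increased or no change
  part3 <- de[!sig & !up, ]   # not significant, decreased
  part4 <- de[sig & !up, ]    # significant, down-regulated

  c(
    part1$gene[order(-part1$log2fc)],  # descending fold change
    part2$gene[order(part2$pvalue)],   # ascending p value
    part3$gene[order(-part3$pvalue)],  # descending p value
    part4$gene[order(-part4$log2fc)]   # descending fold change
  )
}

ranked <- rank_four_part(de)

# The single-score alternative mentioned in the post, for comparison.
combined_score <- -log10(de$padj) * de$log2fc
```

The vector `ranked` runs from the most strongly up-regulated significant gene to the most strongly down-regulated one, which is the kind of ordering a pre-ranked analysis expects.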
Reema Singh ▴ 570
@reema-singh-4373
Hi,

If you have the Entrez IDs for your differentially expressed genes, you can try GeneAnswers and clusterProfiler. Once you are familiar with how these packages are used for GSEA, you should be able to use your GO terms along with the differentially expressed gene list for further analysis.

Regards,
Reema Singh
@gordon-smyth
WEHI, Melbourne, Australia

Dear Al,

The obvious answer is the goseq package. However, you have already received assistance with goseq:

https://stat.ethz.ch/pipermail/bioconductor/2012-February/043779.html

So if you are not trying to do a Gene Ontology analysis like goseq does, what is it that you are trying to do?

The term "gene set enrichment analysis" was coined by the Broad Institute:

   http://www.broadinstitute.org/gsea/

but you certainly can't simply give a list of genes to GSEA. It requires
complete data and is designed for microarrays rather than RNA-Seq anyway.

Best wishes
Gordon

Hi Gordon,

When an expert comments on a topic I'm interested in, it's hard for me not to press for more insight so I hope you don't mind, but also ... you know .. take your time :-)

On Sat, Dec 1, 2012 at 8:39 PM, Gordon K Smyth <smyth@wehi.edu.au> wrote:
[snip]
> The term "gene set enrichment analysis" was coined by the Broad Institute:
>
>    http://www.broadinstitute.org/gsea/
>
> but you certainly can't simply give a list of genes to GSEA. It requires
> complete data and is designed for microarrays rather than RNA-Seq anyway.

I'm curious if you say so because GSEA doesn't account for something like length bias?

The GSEA folks seem to suggest that one could do this like any other "pre-processed" GSEA analysis by simply providing a ranked list of genes (presumably by fold-change):

http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/FAQ#Can_I_use_GSEA_to_analyze_SNP.2C_SAGE.2C_ChIP-Seq_or_RNA-Seq_data.3F

Would you mind (briefly) elaborating a bit on why you disagree?

Thanks,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
Perhaps Dr. Smyth is referring to the Type I error inflation that can be introduced by correlation within gene sets, and which seems to remain uncorrected in typical gene set analyses? Di Wu wrote a nice paper on this, centered on the 'camera' function, which indicated that severe Type I error inflation could be reined in by empirically correcting for the correlation within sets.

http://nar.oxfordjournals.org/content/40/17/e133

I am not an expert but I found the paper interesting, more so in light of papers from Rick Young's lab at the Whitehead Institute which, in so many words, suggest that widespread transcription amplification by (e.g.) c-Myc may render many assumptions underlying quantile normalization invalid. It would seem that many assumptions from microarray analysis are due for re-examination if my observations are not far off base. But I am not an expert and would love to hear from those who are.

--
*A model is a lie that helps you see the truth.*
Howard Skipper <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>

Hi Steve,

Thanks for correcting me.

I said that GSEA requires full data because this is true of the published GSEA algorithm (Subramanian et al 2005). The published GSEA approach permutes arrays and therefore requires all the data. I forgot that the GSEA software provides an alternative short-cut approach (permuting genes) that can be used when there are no replicates or one just has a ranked gene list.

The GSEA ranked gene list approach is similar in principle to the geneSetTest() function in the limma package. This approach has the disadvantage that it does not correct for inter-gene correlations, as we pointed out in our recent camera paper (thanks to Tim Triche for giving the reference).

However, the same criticism (that inter-gene correlation is ignored) can be made of all GO overlap analysis software as well, including goseq. So the only clear advantage of goseq over GSEA here is the adjustment for gene length. As compensation, GSEA-ranked-list uses the rankings of the DE genes that goseq ignores.
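
To make the ranked-list idea concrete, here is a rough base R sketch of a rank-based gene set test: a Wilcoxon rank-sum test asking whether the genes in a set have higher statistics than the remaining genes. This is not the code of geneSetTest() or of GSEA, only an illustration in the same spirit; the statistics, the gene names and the helper `rank_set_test()` are invented. Like those methods, it treats genes as independent, so it shares the weakness regarding correlation between genes discussed above.

```r
# Hypothetical per-gene statistics (e.g. moderated t or signed scores),
# deliberately deterministic: gene1 has the largest value, gene100 the smallest.
stat <- setNames(as.numeric(100:1), paste0("gene", 1:100))

# Two made-up gene sets: one at the top of the ranking, one spread evenly.
top_set    <- paste0("gene", 1:10)
spread_set <- paste0("gene", seq(5, 95, by = 10))

# Rank-sum test of set members against all other genes (hypothetical helper).
rank_set_test <- function(stat, set, alternative = "greater") {
  in_set <- names(stat) %in% set
  wilcox.test(stat[in_set], stat[!in_set], alternative = alternative)$p.value
}

p_top    <- rank_set_test(stat, top_set)     # set concentrated at the top
p_spread <- rank_set_test(stat, spread_set)  # set scattered through the list
```

Only the set concentrated at the top of the ranking gives a small p-value. With real data the p-value from a test like this will tend to be too optimistic whenever the genes in a set are positively correlated.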

As you probably know, the whole area of gene set testing is a hot area of research, and the inter-relationships between the many different approaches are still imperfectly understood. Methods like geneSetTest and GSEA-ranked-list are anti-conservative. Methods like roast, camera or classic GSEA are conservative and safe. GO overlap analyses like goseq, GOStat, DAVID etc are anti-conservative in principle but, in practice, multiple testing conservatism tends to make them conservative. Different approaches test different hypotheses and emphasise different aspects of the data.

Best wishes
Gordon

WATSON Mick ▴ 50
@watson-mick-5575
United Kingdom
The function phyper() can help you with this. We also have a package called CORNA (http://corna.sourceforge.net/tutorial.html) that might help, but this needs to be updated for the latest version of R.

Mick
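
As a sketch of how phyper() could be applied to a DE gene list plus a two-column gene/goterm file, something along these lines should work in base R. The annotation table, the gene IDs and the helper `go_overrep()` are invented for illustration, and the gene universe is assumed to be every gene in the annotation file; in a real analysis it would more sensibly be the genes actually tested for differential expression. Note that a plain hypergeometric test does not adjust for gene length, which is the adjustment goseq provides.

```r
# Hypothetical two-column annotation (gene, goterm); normally read with read.table().
anno <- data.frame(
  gene   = c("g1", "g2", "g3", "g4",
             "g5", "g6", "g7", "g8", "g9", "g10", "g1",
             "g2"),
  goterm = c(rep("GO:A", 4), rep("GO:B", 7), "GO:C"),
  stringsAsFactors = FALSE
)
de_genes <- c("g1", "g2", "g3")  # hypothetical DE gene list

# One-sided hypergeometric (over-representation) test per GO term.
go_overrep <- function(de_genes, anno, universe = unique(anno$gene)) {
  anno <- unique(anno[anno$gene %in% universe, ])
  de   <- intersect(de_genes, universe)
  N    <- length(universe)  # genes in the universe
  k    <- length(de)        # DE genes in the universe

  sets <- split(anno$gene, anno$goterm)
  res <- data.frame(
    goterm   = names(sets),
    set_size = vapply(sets, length, integer(1)),
    overlap  = vapply(sets, function(g) sum(g %in% de), integer(1)),
    stringsAsFactors = FALSE
  )
  # P(X >= overlap): phyper() gives P(X > q) when lower.tail = FALSE,
  # hence q = overlap - 1.
  res$p <- phyper(res$overlap - 1, res$set_size, N - res$set_size, k,
                  lower.tail = FALSE)
  res$padj <- p.adjust(res$p, method = "BH")
  res[order(res$p), ]
}

res <- go_overrep(de_genes, anno)
```

Each row of `res` gives a GO term, its size, its overlap with the DE list, and the raw and BH-adjusted p-values.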
Alpesh Querer ▴ 220
@alpesh-querer-4895
United States
Thanks Gordon.

I was trying to install the latest version of R and goseq, but it wouldn't load anymore. Do you have any insight into why this would happen? Maybe I'm doing something wrong.

    > biocLite("goseq")
    BioC_mirror: http://bioconductor.org
    Using Bioconductor version 2.11 (BiocInstaller 1.8.3), R version 2.15.
    Installing package(s) 'goseq'
    trying URL 'http://bioconductor.org/packages/2.11/bioc/bin/windows/contrib/2.15/goseq_1.10.0.zip'
    Content type 'application/zip' length 751702 bytes (734 Kb)
    opened URL
    downloaded 734 Kb

    package ‘goseq’ successfully unpacked and MD5 sums checked

    > library(goseq)
    Loading required package: BiasedUrn
    Loading required package: geneLenDataBase
    Error in loadNamespace(i[[1L]], c(lib.loc, .libPaths())) :
      there is no package called ‘Biobase’
    Error: package ‘geneLenDataBase’ could not be loaded

    > sessionInfo()
    R version 2.15.2 (2012-10-26)
    Platform: i386-w64-mingw32/i386 (32-bit)

    locale:
    [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
    [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] BiasedUrn_1.04      BiocInstaller_1.8.3

    loaded via a namespace (and not attached):
     [1] BiocGenerics_0.4.0   Biostrings_2.26.2    bitops_1.0-5         BSgenome_1.26.1      DBI_0.2-5            GenomicRanges_1.10.5 IRanges_1.16.4
     [8] parallel_2.15.2      RCurl_1.95-3         Rsamtools_1.10.2     RSQLite_0.11.2       rtracklayer_1.18.1   stats4_2.15.2        tools_2.15.2
    [15] XML_3.95-0.1         zlibbioc_1.4.0

Thanks,
Al
On 12/03/2012 09:05 AM, Alpesh Querer wrote:
> > library(goseq)
> Loading required package: BiasedUrn
> Loading required package: geneLenDataBase
> Error in loadNamespace(i[[1L]], c(lib.loc, .libPaths())) :
>   there is no package called ‘Biobase’
> Error: package ‘geneLenDataBase’ could not be loaded

geneLenDataBase (?) seems to be missing a dependency. Try biocLite("Biobase") first.

Martin

--
Computational Biology / Fred Hutchinson Cancer Research Center
@gordon-smyth
WEHI, Melbourne, Australia

Dear Alpesh,

Please keep questions on the Bioconductor mailing list.

The error message says "there is no package called Biobase", which tells you that Biobase is required but you haven't installed it.

Best wishes
Gordon
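
For what it's worth, a quick way to see which packages in a dependency chain are not installed before calling library() might look like the sketch below. The helper `missing_packages()` is invented for illustration, and the list of package names is just the three mentioned in this thread. The install calls are left commented out: biocLite() is what this thread (Bioconductor 2.11) uses, while later Bioconductor releases replaced it with BiocManager::install().

```r
# Return the subset of 'pkgs' that cannot be found in the library paths
# (hypothetical helper; does not load or attach anything).
missing_packages <- function(pkgs) {
  found <- vapply(
    pkgs,
    function(p) length(find.package(p, quiet = TRUE)) > 0,
    logical(1)
  )
  pkgs[!found]
}

needed     <- c("Biobase", "geneLenDataBase", "goseq")
to_install <- missing_packages(needed)

# Install whatever is missing, depending on the Bioconductor release in use:
# if (length(to_install) > 0) biocLite(to_install)              # as in this thread
# if (length(to_install) > 0) BiocManager::install(to_install)  # later releases
```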
