Gene Lists and Genomes
@radhouane-aniba-4409
Last seen 7.9 years ago
Hello everyone,

I am aware of some Bioconductor packages that are useful for measuring GO or KEGG gene enrichment in a given gene list for a given genome (GOstats, GSEA, etc.).

My question: I am working with 4 different genomes and I have a gene list for each of them. For the genes in each list, for each genome, I want to:

- Extract the GO terms with their p-values (what does the p-value actually mean here? I understand it reflects enrichment, but how is it calculated?)
- Extract the KEGG pathways with their p-values as well

Thanks,
Rad
GO genomes • 1.6k views
Tim Triche ★ 4.2k
@tim-triche-3561
Last seen 3.6 years ago
United States
Hi Radhouane,

You can get more specific answers if you ask more specific questions. The mathematical formulation of the test(s), and therefore the meaning of your results, will depend directly on

1) the logic of the package you use to test for GO enrichment in a gene list or lists
2) the logic of the package you use to test for KEGG enrichment in a gene list or lists

A concise and useful description of the logical basis for hypergeometric and binomial tests:

http://great.stanford.edu/help/index.php/Statistics#What_is_the_hypergeometric_test_formally.3F

You mention GSEA simultaneously with GO/KEGG enrichment, thus perhaps it would be best if you provide examples making your question more concrete, so that others may benefit. For example, the logic behind the similarly-named GSA and GSEA procedures differs subtly. On that note, you might find the following two discussions helpful:

http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/FAQ#What_is_the_difference_between_GSEA_and_an_overlap_statistic_.28hypergeometric.29_analysis_tool.3F

http://www-stat.stanford.edu/~tibs/ftp/GSA.pdf

If you haven't already, you will want to read the original GSEA paper in PNAS for background.

Best regards,

--t
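To make the "how is it calculated" part concrete: for an over-representation test of the hypergeometric kind described in the first link, the p-value is the probability of seeing at least the observed number of annotated genes in a list if that list had been drawn at random, without replacement, from a background set of genes. The sketch below only illustrates that arithmetic in plain Python. It is not the code that any Bioconductor package runs, and the function name, argument names, and toy counts are made up for the example.

```python
from math import comb


def hypergeom_enrichment_pvalue(universe_size, annotated_in_universe,
                                list_size, annotated_in_list):
    """Upper-tail hypergeometric probability P(X >= annotated_in_list).

    X is the number of term-annotated genes obtained when `list_size` genes
    are drawn without replacement from a universe of `universe_size` genes,
    of which `annotated_in_universe` carry the term.
    """
    max_hits = min(list_size, annotated_in_universe)
    # Number of ways to draw a list containing exactly k annotated genes,
    # summed over every k at least as large as the observed count.
    favourable = sum(
        comb(annotated_in_universe, k)
        * comb(universe_size - annotated_in_universe, list_size - k)
        for k in range(annotated_in_list, max_hits + 1)
    )
    return favourable / comb(universe_size, list_size)


# Toy numbers (hypothetical): 20 background genes, 5 carry the term,
# the gene list has 4 genes, and 2 of those carry the term.
print(hypergeom_enrichment_pvalue(20, 5, 4, 2))
```

With these toy numbers the result is 1205/4845, about 0.25, so two hits out of four would not count as evidence of enrichment.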
Thanks Tim,

Actually, I have a predicted list of miRNA binding sites targeting specific genes in 4 genomes.

What I am trying to find is the characteristics of these genes {g} for a specific genome {G} in terms of GO enrichment and KEGG enrichment.

My starting point is a file with a list of Ensembl IDs, just that: no annotation and no scores, just the gene names.

I am looking for the right package to do that. topGO, for example, does not seem to accept gene names alone; annotation is needed, as well as other details. I am actually reading about all these packages that perform gene enrichment analyses.

Rad

--
Radhouane Aniba
Bioinformatics Postdoctoral Research Scientist
Institute for Advanced Computer Studies
Center for Bioinformatics and Computational Biology (CBCB)
University of Maryland, College Park MD 20742
No ranks at all for the lists? If you're using (say) MEME or HOMER or the like, it would seem like the resulting ranks would come in handy. Otherwise how do you test against a null (or a model that does not depart from known biology, if you're taking a Bayesian approach)?

This sounds like an interesting project! I will be interested to hear other people's responses.

Good luck, and thank you for making the example concrete,

--t
Hi Radhouane,

As a starting point, you might want to read the vignette for the GOstats package called "Hypergeometric Tests Using GOstats":

http://www.bioconductor.org/packages/2.8/bioc/vignettes/GOstats/inst/doc/GOstatsHyperG.pdf

If you are starting with Ensembl gene IDs, you can easily convert those to Entrez gene IDs, and then use them with GOstats to run a set of hypergeometric tests for each organism. Once you get that done, interpreting the other comparisons you suggested sounds a bit more murky, because different organisms have been annotated differently depending on how well studied they are.

Marc
Does GOstats make it possible to calculate a corrected p-value in addition to the enrichment p-value? Some Cytoscape plugins report extra p-values, generally with a Bonferroni correction. Is that possible with GOstats?

Regards,
Radhouane
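Whatever a given package reports, a Bonferroni correction can be applied afterwards to the vector of raw p-values (one per GO term or pathway tested): multiply each p-value by the number of tests and cap the result at 1. The sketch below shows that, together with the Benjamini-Hochberg adjustment as a less conservative alternative. It is a plain-Python illustration with made-up function names and p-values, not GOstats functionality. In R, the base function p.adjust provides both adjustments.

```python
def bonferroni(pvalues):
    """Multiply each raw p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that the
    # adjusted values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four tested terms.
raw = [0.01, 0.04, 0.03, 0.20]
print(bonferroni(raw))
print(benjamini_hochberg(raw))
```

The number of tests here is the number of terms actually tested, so testing more GO terms makes the Bonferroni-corrected values correspondingly larger.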
Hi,

It seems as if you can use GOstats for this purpose, but you'd first need to convert these IDs to Entrez IDs, which you can do using the appropriate org.*.eg.db package (org.Hs.eg.db if we're talking about human, for example).

In addition to this 'target list' you have, you'll need an appropriate way to define "the universe" of gene IDs to use for testing.

Given those two lists (targets and universe) you'll be able to run GOstats to get GO enrichment for your targets. Incidentally, you should also be able to use GOstats for KEGG ... see the vignettes here:

http://www.bioconductor.org/packages/release/bioc/html/GOstats.html

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
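As a rough picture of what "targets plus universe" means computationally, here is a sketch that runs one hypergeometric test per term over a toy annotation table. It is plain Python for illustration only and is not the GOstats interface; the gene IDs, term names, and function names are all hypothetical.

```python
from math import comb


def upper_tail(universe_size, annotated, list_size, hits):
    """P(X >= hits) for a hypergeometric draw of list_size genes."""
    return sum(
        comb(annotated, k) * comb(universe_size - annotated, list_size - k)
        for k in range(hits, min(list_size, annotated) + 1)
    ) / comb(universe_size, list_size)


def enrichment_by_term(targets, universe, term_to_genes):
    """Return {term: p-value} for every term with annotated genes in the universe."""
    universe = set(universe)
    # Targets that are not part of the universe cannot be tested.
    targets = set(targets) & universe
    results = {}
    for term, genes in term_to_genes.items():
        annotated = set(genes) & universe
        if not annotated:
            continue  # nothing in the universe carries this term
        hits = targets & annotated
        results[term] = upper_tail(
            len(universe), len(annotated), len(targets), len(hits)
        )
    return results


# Toy data (hypothetical IDs): 10 genes in the universe, 3 targets.
universe = [f"g{i}" for i in range(1, 11)]
targets = ["g1", "g2", "g3"]
term_to_genes = {
    "termA": ["g1", "g2", "g3", "g4"],
    "termB": ["g9", "g10", "g11"],  # g11 lies outside the universe
}
print(enrichment_by_term(targets, universe, term_to_genes))
```

Note that the annotation of each term is intersected with the universe before counting, so the choice of universe changes both the background size and the number of annotated genes.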
2011/5/5 Steve Lianoglou wrote:
> In addition to this 'target list' you have, you'll need an appropriate
> way to define "the universe" of gene IDs to use for testing.

What is the universe of gene IDs? What does it mean?

--
Radhouane Aniba
On Thu, May 5, 2011 at 1:06 PM, Radhouane Aniba wrote:
> What is the universe of gene IDs? What does it mean?

You are supplying some set of genes that have been "picked" (you hope) non-randomly. What set of genes did you pick these from? That set of all possible genes you could have "picked" from is the universe -- the (size and makeup of the) universe you pick will affect your results.

For instance ... were you looking at miRNAs that were expressed only in a particular cell type? If so, did you look for targets that were only expressed in that cell type, or did you look for targets from the set of all "known" genes? Why did you pick your universe in the way you did?

You get the idea ...

-steve
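A small numerical illustration of how the universe affects the result: the same gene list with the same number of hits can look strongly enriched against a whole-genome background and much less so against a restricted one (say, only genes expressed in the cell type of interest). The sketch is plain Python, and every count in it is invented for the example.

```python
from math import comb


def upper_tail(universe_size, annotated, list_size, hits):
    """P(X >= hits) for a hypergeometric draw of list_size genes."""
    return sum(
        comb(annotated, k) * comb(universe_size - annotated, list_size - k)
        for k in range(hits, min(list_size, annotated) + 1)
    ) / comb(universe_size, list_size)


# Hypothetical target list: 40 genes, 10 of which carry the term.
list_size, hits = 40, 10

# Universe 1: all known genes (20000), 200 of which carry the term.
p_whole_genome = upper_tail(20000, 200, list_size, hits)

# Universe 2: only genes expressed in the relevant cell type (2000),
# 150 of which carry the term.
p_expressed_only = upper_tail(2000, 150, list_size, hits)

print(p_whole_genome, p_expressed_only)
```

Against the whole genome about 0.4 annotated genes are expected in a list of 40, against the restricted universe about 3, so the same 10 hits are far less surprising in the second case and the p-value is correspondingly larger.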