GenomicFeatures: makeGeneDbFromBiomart()
Guido Hooiveld ★ 3.9k
@guido-hooiveld-2020
Last seen 13 hours ago
Wageningen University, Wageningen, the …
I noticed that the GenomicFeatures package provides a set of very powerful functions for creating databases of transcript-centered annotations from, e.g., the BioMart database (makeTranscriptDbFromBiomart).

I was wondering whether a function could be added that would allow building a gene-centered annotation database, e.g. 'makeGeneDbFromBiomart()' and/or 'makeGeneDb'.

I am asking because I would like to easily retrieve AND store the annotation info of all Ensembl mouse genes. I already had a look at the source code to see whether I could modify parts of it myself to create such a function, but the code is too complicated for me to feel comfortable adapting it. I have the *feeling* that this would be rather straightforward for the more knowledgeable R gurus, hence my question.

Thanks in advance for considering,
Guido
Annotation biomaRt GenomicFeatures • 1.1k views
Marc Carlson ★ 7.2k
@marc-carlson-2264
Last seen 7.7 years ago
United States
Hi Guido,

If you just want gene information, then the makeTranscriptDbFromBiomart() function should already give you the gene IDs associated with the transcripts, along with grouping information in convenient GRangesList objects. Yes, this database is focused on the transcripts and their components, but it is not meant to be isolated from proper gene IDs.

And if you then want to link that information to more classic gene-centric annotations, you might want to look at something like the org.Hs.eg.db package (which includes mappings to Ensembl IDs).

Using these two resources together, our hope was that it should be possible to do a large number of meaningful things. So what specifically was it that you needed to do?

Marc

On 03/01/2011 02:56 AM, Hooiveld, Guido wrote:
> I was wondering whether a function could be added that will allow the build of a gene-centered annotation database? E.g: 'makeGeneDbFromBiomart()' and/or 'makeGeneDb'.
> [...]
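For readers who want to try the route Marc describes, here is a minimal sketch of grouping transcripts by gene. It assumes the GenomicFeatures package is installed and that BioMart is reachable over the network. The wrapper name transcripts_by_gene and the default dataset name are illustrative choices, and makeTranscriptDbFromBiomart() is the function name used in this thread, so check the documentation of your installed release because the name may differ there.

```r
# Sketch only: needs the GenomicFeatures package and network access to BioMart.
# transcripts_by_gene() is a hypothetical helper name, not part of any package.
transcripts_by_gene <- function(dataset = "mmusculus_gene_ensembl") {
  if (!requireNamespace("GenomicFeatures", quietly = TRUE)) {
    stop("The GenomicFeatures package is required.")
  }
  # Function name as used in this thread; it may be named differently
  # in the GenomicFeatures release you have installed.
  txdb <- GenomicFeatures::makeTranscriptDbFromBiomart(
    biomart = "ensembl",
    dataset = dataset
  )
  # GRangesList with one element per gene ID, each element holding the
  # genomic ranges of that gene's transcripts.
  GenomicFeatures::transcriptsBy(txdb, by = "gene")
}
```

Calling `tx_by_gene <- transcripts_by_gene()` and then `names(tx_by_gene)` should list the gene IDs, which is the gene-level handle on an otherwise transcript-centered database.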
Hi Marc,

Thank you for your suggestion. However, the combination of makeTranscriptDb + org.xx.eg.db packages won't work in all cases.

As you likely know, a substantial part of our array analyses is performed with models that are not, or less well, studied in biomedical research, e.g. pig or a variety of plants (medicago, tomato). As a consequence, the annotation efforts are much less thorough and standardized than for e.g. human, mouse or rat, and in turn the BioC annotation infrastructure for these less-standard species is (understandably) less well developed.

Taking pig as an example: although an org.db package is available (org.Ss.eg.db; built Sept 2010), it doesn't (yet?) contain Ensembl-based gene information. Moreover, until very recently (end of Dec 2010) Ensembl had considerably more gene annotation info available on the pig genome than NCBI. I was hoping that a makeGeneDbFromBiomart() function would save me the hassle of manually querying the BioMart website every time, because a BioC-compliant, Ensembl gene-centered database could then be created (and saved!).

For plants the situation is basically even 'worse': where annotation info is available at all, it is often limited and in a format that makes it impossible for me to access easily in BioC. I noticed that the low-level function makeTranscriptDb is able to create a db object from text files, which would be ideal for my purpose, except that it is transcript-centered. Often only gene-centered annotation info is available for plants, and then I expect to run into problems since, e.g., info on splicing (required for the 'splicings' data frame) is lacking.

I hope this explains the reasoning behind my question.

Regards,
Guido

-----Original Message-----
From: bioconductor-bounces@r-project.org On Behalf Of Marc Carlson
Sent: Thursday, March 03, 2011 01:16
To: bioconductor at r-project.org
Subject: Re: [BioC] GenomicFeatures: makeGeneDbFromBiomart()

[...]
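The manual BioMart website query mentioned in this reply can also be scripted with the biomaRt package. The sketch below is one possible way to pull a gene-level table, not an existing makeGeneDbFromBiomart() function. It assumes biomaRt is installed and the Ensembl mart is reachable; the helper name fetch_gene_table, the pig dataset name, and the chosen attribute names are assumptions that should be checked against listDatasets() and listAttributes() for your mart.

```r
# Sketch only: needs the biomaRt package and network access to Ensembl BioMart.
# fetch_gene_table() is a hypothetical helper name, not part of any package.
fetch_gene_table <- function(dataset = "sscrofa_gene_ensembl") {
  if (!requireNamespace("biomaRt", quietly = TRUE)) {
    stop("The biomaRt package is required.")
  }
  mart <- biomaRt::useMart("ensembl", dataset = dataset)
  # One row per gene; the attribute names are assumed to exist in the
  # chosen mart and can be verified with biomaRt::listAttributes(mart).
  biomaRt::getBM(
    attributes = c(
      "ensembl_gene_id", "chromosome_name",
      "start_position", "end_position", "strand", "description"
    ),
    mart = mart
  )
}
```

The result is an ordinary data.frame, so `pig_genes <- fetch_gene_table()` can be stored locally with `save(pig_genes, file = "pig_genes.Rda")` and reused without querying BioMart again.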
Hi Guido,

If you really want to go down the road of making a custom database to store your biomaRt annotations, then I think you will find that it is easy enough to do and can be pretty rewarding. You might find it helpful to see my slides from the "Using Databases in R" talk at our most recent course (there are some exercises there as well).

http://www.bioconductor.org/help/course-materials/2011/AdvancedRFeb2011Seattle/

Alternatively (depending on how much data you have to store and how organized you need it to be), you may just want to save your annotations in a data.frame as a local .Rda file. It all depends on your use case and whether a database is really called for.

Marc

On 03/03/2011 05:34 AM, Hooiveld, Guido wrote:
> Thank you for your suggestion. However, the combination of makeTranscriptDb + org.xx.eg.db packages won't work in all cases.
> [...]
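The two storage options Marc mentions can be sketched side by side. This is a toy illustration rather than his course material: the gene table below contains made-up placeholder rows, the file and table names are arbitrary, and the SQLite part assumes the DBI and RSQLite packages are installed (it is skipped otherwise).

```r
# Toy gene-centered annotation table. The IDs and symbols are placeholders,
# not real Ensembl records; real rows would come from an annotation source.
genes <- data.frame(
  gene_id    = c("GENE0001", "GENE0002", "GENE0003"),
  symbol     = c("geneA", "geneB", "geneC"),
  chromosome = c("1", "1", "X"),
  gene_start = c(1000L, 5000L, 200L),
  gene_end   = c(2500L, 7200L, 900L),
  stringsAsFactors = FALSE
)

# Option 1: keep the table as a local .Rda file.
rda_file <- file.path(tempdir(), "gene_annotation.Rda")
save(genes, file = rda_file)

# In a later session, load() restores the object under its original name.
# A separate environment is used here so the copy can be compared.
restored <- new.env()
load(rda_file, envir = restored)
round_trip_ok <- identical(restored$genes, genes)

# Option 2: store the same table in an SQLite file (needs DBI and RSQLite).
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  db_file <- file.path(tempdir(), "gene_annotation.sqlite")
  con <- DBI::dbConnect(RSQLite::SQLite(), db_file)
  DBI::dbWriteTable(con, "genes", genes, overwrite = TRUE)
  # Gene-centered lookup: all genes on chromosome 1.
  chr1_genes <- DBI::dbGetQuery(
    con, "SELECT gene_id, symbol FROM genes WHERE chromosome = '1'"
  )
  DBI::dbDisconnect(con)
}
```

For a table of a few tens of thousands of genes the .Rda route is usually the least effort; the SQLite route starts to pay off once several related tables need to be queried together.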