Rebuild GO.sqlite from GO.db using complete GO database
@eric-fournier-6661
Hello,

I am performing GO term enrichment analysis in my organism of interest (Bos taurus) using the org.Bt.eg.db and GO.db packages. However, since the GO.db package uses the "lite" version of the Gene Ontology database, all IEA (Inferred from Electronic Annotation) terms are absent. In cattle, this makes the annotation pretty barren (over 60% of my genes have no GO annotation at all). Therefore, I am looking for ways to rebuild the GO.sqlite file used by GO.db using the full GO database. However, I cannot find any indication of how to do so, either in the package source (where the file is already packaged) or in its manual. Could anyone point me in the right direction?

Thank you,
Eric Fournier, B. Sc.
Research Assistant in Bioinformatics
Université Laval, Qc, Canada
Annotation GO Organism
Marc Carlson ★ 7.2k
@marc-carlson-2264
United States
Hi Eric,

At first your question confused me, because the GO.sqlite database does not actually record the information that you appear to think it does. The GO.sqlite database only stores information about the GO hierarchy itself. It knows nothing about which terms are associated with which genes (or whether an association was IEA or something else).

That kind of information (gene to GO term associations) is stored in the database from your org.Bt.eg.db package, and you have several options for rebuilding that database if you need a different one. The most hands-on way to rebuild it is to use the makeOrgPackage() function from the AnnotationForge package. That function lets you build an organism package (with a populated database, etc.) from a set of data.frame objects. Using it, you could easily supply your own preferred GO information for bovine and be as liberal as you feel is appropriate.

Hope that clarifies things. Please let me know if you have more questions!

Marc
Hi,

It seems I was wrong about almost everything. For the sake of anyone who stumbles on this thread with a similar problem, here's what I found out:

1) As Marc points out, the Entrez ID -> GO annotations are not part of GO.db, but of the individual org.Xx.eg.db packages.
2) The org.Xx.eg.db mappings DO include IEA annotations.
3) The dearth of annotated genes stems from the lackluster mappings provided by the NCBI Gene database (ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/). Only genes with GO annotations linked directly within their Gene entries are reported by org.Xx.eg.db. As such, rebuilding new libraries using the makeOrgPackageFromNCBI function will not help.
4) A better, more complete species-specific mapping can be obtained directly from the Gene Ontology database (http://www.geneontology.org/page/download-annotations). However, this does not map to Entrez Gene IDs, but mostly to UniProt and Ensembl IDs. I am in the process of using those mappings to solve my issue.

Cheers,
-Eric