problem creating own org.Ss.eg.db
Guido Hooiveld ★ 3.9k
@guido-hooiveld-2020
Wageningen University, Wageningen, the …
Hi,

Triggered by a recent comment by Herve on this list (stating that it would be relatively easy to create your own org.xx.eg.db annotation package using the function 'makeOrgPackageFromNCBI'), I decided to create my own instance of the annotation library org.Ss.eg.db. The reason is that after the latest BioC release in October 2011, NCBI made available a major update of the annotation info for pig, which I would already like to make use of (to be precise, Sscrofa10.2 was released earlier this year: http://www.ncbi.nlm.nih.gov/mapview/stats/BuildStats.cgi?taxid=9823&build=4&ver=1).

However, in doing so some issues arose:

- My instance of the org.Ss.eg database is apparently incomplete. Some tables are dropped when creating the db, and I also noticed gaps when comparing the content of the 'official' BioC-provided org.db with that of mine (KEGG info seems to be lacking). An error is also reported when listing the content of my org.db (the RefSeq-to-EG mappings are not included). On the other hand, with respect to e.g. Gene Ontology mappings my instance of the org.db seems to be more complete, since more genes now have a GO mapping (33506 out of 33506 vs 5730 out of 34084). However, I don't fully trust this because of the aforementioned dropping of tables. Complete output below.

- During the creation of the db, some GO terms are apparently too new ("Dropping GO IDs that are too new for the current GO.db"). Would it somehow be possible to also include these 'too new' terms in the org.db?

Any feedback would be appreciated.

Thanks,
Guido

> library(AnnotationDbi)
Loading required package: Biobase

Welcome to Bioconductor

  Vignettes contain introductory material. To view, type
  'browseVignettes()'. To cite Bioconductor, see
  'citation("Biobase")' and for packages 'citation("pkgname")'.

> makeOrgPackageFromNCBI(version = "0.1",
+     author = "Guido Hooiveld <guido.hooiveld@wur.nl>",
+     maintainer = "Guido Hooiveld <guido.hooiveld@wur.nl>",
+     outputDir = ".",
+     tax_id = "9823",
+     genus = "Sus",
+     species = "scrofaGH")
Loading required package: RSQLite
Loading required package: DBI
Loading required package: GO.db
Getting data for gene2pubmed.gz
Loading required package: RCurl
Loading required package: bitops
Populating gene2pubmed table:
table gene2pubmed filled
Getting data for gene2accession.gz
Populating gene2accession table:
table gene2accession filled
Getting data for gene2refseq.gz
Populating gene2refseq table:
table gene2refseq filled
Getting data for gene2unigene
Populating gene2unigene table:
table gene2unigene filled
Getting data for gene_info.gz
Populating gene_info table:
table gene_info filled
Getting data for gene2go.gz
Populating gene2go table:
Getting blast2GO data as a substitute for gene2go
table metadata filled
table map_metadata filled
table gene2go filled
table metadata filled
table map_metadata filled
Populating genes table:
genes table filled
Populating gene_info_temp table:
gene_info_temp table filled
Populating alias table:
alias table filled
Populating chromosomes table:
chromosomes table filled
Populating pubmed table:
pubmed table filled
Populating refseq table:
refseq table filled
Populating accessions table:
accessions table filled
Populating unigene table:
unigene table filled
Dropping GO IDs that are too new for the current GO.db
Dropping GO IDs that are too new for the current GO.db
Dropping GO IDs that are too new for the current GO.db
Populating go_bp table:
go_bp table filled
Populating go_mf table:
go_mf table filled
Populating go_cc table:
go_cc table filled
Populating go_bp_all table:
go_bp_all table filled
Populating go_mf_all table:
go_mf_all table filled
Populating go_cc_all table:
go_cc_all table filled
dropping table gene2pubmeddropping table gene2accessiondropping table gene2refseqdropping table gene2unigenedropping table gene_infodropping table gene2go
SELECT count(DISTINCT g.gene_id) FROM gene_info AS t, genes as g WHERE t._id=g._id AND t.gene_name NOT NULL
SELECT count(DISTINCT g.gene_id) FROM gene_info AS t, genes as g WHERE t._id=g._id AND t.symbol NOT NULL
SELECT count(DISTINCT t.symbol) FROM gene_info AS t, genes as g WHERE t._id=g._id AND t.symbol NOT NULL
SELECT count(DISTINCT g.gene_id) FROM chromosomes AS t, genes as g WHERE t._id=g._id AND t.chromosome NOT NULL
SELECT count(DISTINCT g.gene_id) FROM refseq AS t, genes as g WHERE t._id=g._id AND t.accession NOT NULL
SELECT count(DISTINCT t.accession) FROM refseq AS t, genes as g WHERE t._id=g._id AND t.accession NOT NULL
SELECT count(DISTINCT g.gene_id) FROM unigene AS t, genes as g WHERE t._id=g._id AND t.unigene_id NOT NULL
SELECT count(DISTINCT t.unigene_id) FROM unigene AS t, genes as g WHERE t._id=g._id AND t.unigene_id NOT NULL
SELECT count(DISTINCT g.gene_id) FROM accessions AS t, genes as g WHERE t._id=g._id AND t.accession NOT NULL
SELECT count(DISTINCT t.accession) FROM accessions AS t, genes as g WHERE t._id=g._id AND t.accession NOT NULL
SELECT count(DISTINCT g.gene_id) FROM alias AS t, genes as g WHERE t._id=g._id AND t.alias_symbol NOT NULL
table map_counts filled
Creating package in ./org.SscrofaGH.eg.db
[1] TRUE

<<content of my instance of org.Ss.eg.db>>

> library(org.SscrofaGH.eg.db)
Loading required package: AnnotationDbi
Loading required package: Biobase

Welcome to Bioconductor

  Vignettes contain introductory material. To view, type
  'browseVignettes()'. To cite Bioconductor, see
  'citation("Biobase")' and for packages 'citation("pkgname")'.

Loading required package: DBI
> org.SscrofaGH.eg.db
OrgDb object:
| BL2GOSOURCEDATE: Tue Feb 28 12:50:25 2012
| BL2GOSOURCENAME: blast2GO
| BL2GOSOURCEURL: http://www.blast2go.de/
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: ORGANISM_DB
| ORGANISM: Sus scrofaGH
| SPECIES: Sus ScrofaGH
| CENTRALID: EG
| TAXID: 9823
| EGSOURCEDATE: Tue Feb 28 12:50:27 2012
| EGSOURCENAME: Entrez Gene
| EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| GOSOURCEDATE: 20110910
| GOSOURCENAME: Gene Ontology
| GOSOURCEURL: ftp://ftp.geneontology.org/pub/go/godata
| GOEGSOURCEDATE: Tue Feb 28 12:50:27 2012
| GOEGSOURCENAME: Entrez Gene
| GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| Db type: OrgDb
| package: AnnotationDbi
> org.SscrofaGH.eg()
Quality control information for org.SscrofaGH.eg:

This package has the following mappings:

org.SscrofaGH.egALIAS2EG has 33506 mapped keys (of 33506 keys)
org.SscrofaGH.egCHR has 33506 mapped keys (of 33506 keys)
org.SscrofaGH.egGENENAME has 33506 mapped keys (of 33506 keys)
org.SscrofaGH.egGO has 33506 mapped keys (of 33506 keys)
org.SscrofaGH.egGO2ALLEGS has 33506 mapped keys (of 10755 keys)
org.SscrofaGH.egGO2EG has 33506 mapped keys (of 7256 keys)
org.SscrofaGH.egREFSEQ has 33506 mapped keys (of 33506 keys)
Error in get(mapname) : object 'org.SscrofaGH.egREFSEQ2EG' not found

<<content of original, BioC-provided org.Ss.eg.db>>

> library(org.Ss.eg.db)
> org.Ss.eg.db
OrgDb object:
| DBSCHEMAVERSION: 2.1
| Db type: OrgDb
| package: AnnotationDbi
| DBSCHEMA: PIG_DB
| ORGANISM: Sus scrofa
| SPECIES: Pig
| EGSOURCEDATE: 2011-Sep14
| EGSOURCENAME: Entrez Gene
| EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| CENTRALID: EG
| TAXID: 9823
| GOSOURCENAME: Gene Ontology
| GOSOURCEURL: ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest-lite/
| GOSOURCEDATE: 20110910
| GOEGSOURCEDATE: 2011-Sep14
| GOEGSOURCENAME: Entrez Gene
| GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| KEGGSOURCENAME: KEGG GENOME
| KEGGSOURCEURL: ftp://ftp.genome.jp/pub/kegg/genomes
| KEGGSOURCEDATE: 2011-Mar15
| BL2GOSOURCENAME: blast2GO
| BL2GOSOURCEURL: http://www.blast2go.de/
| BL2GOSOURCEDATE: 2011-Mar2
> org.Ss.eg()
Quality control information for org.Ss.eg:

This package has the following mappings:

org.Ss.egACCNUM has 24639 mapped keys (of 34084 keys)
org.Ss.egACCNUM2EG has 74012 mapped keys (of 74012 keys)
org.Ss.egALIAS2EG has 29916 mapped keys (of 29916 keys)
org.Ss.egCHR has 33656 mapped keys (of 34084 keys)
org.Ss.egENZYME has 1657 mapped keys (of 34084 keys)
org.Ss.egENZYME2EG has 818 mapped keys (of 818 keys)
org.Ss.egGENENAME has 34084 mapped keys (of 34084 keys)
org.Ss.egGO has 5730 mapped keys (of 34084 keys)
org.Ss.egGO2ALLEGS has 11689 mapped keys (of 11689 keys)
org.Ss.egGO2EG has 8215 mapped keys (of 8215 keys)
org.Ss.egPATH has 4458 mapped keys (of 34084 keys)
org.Ss.egPATH2EG has 225 mapped keys (of 225 keys)
org.Ss.egPMID has 10966 mapped keys (of 34084 keys)
org.Ss.egPMID2EG has 3938 mapped keys (of 3938 keys)
org.Ss.egREFSEQ has 24384 mapped keys (of 34084 keys)
org.Ss.egREFSEQ2EG has 53138 mapped keys (of 53138 keys)
org.Ss.egSYMBOL has 34084 mapped keys (of 34084 keys)
org.Ss.egSYMBOL2EG has 28138 mapped keys (of 28138 keys)
org.Ss.egUNIGENE has 8798 mapped keys (of 34084 keys)
org.Ss.egUNIGENE2EG has 8912 mapped keys (of 8912 keys)
org.Ss.egUNIPROT has 6660 mapped keys (of 34084 keys)

Additional Information about this package:

DB schema: PIG_DB
DB schema version: 2.1
Organism: Sus scrofa
Date for NCBI data: 2011-Sep14
Date for GO data: 20110910
Date for KEGG data: 2011-Mar15

> sessionInfo()    <<session when creating org.db>>
R version 2.14.0 (2011-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] RCurl_1.9-5           bitops_1.0-4.1        GO.db_2.6.1
[4] RSQLite_0.11.1        DBI_0.2-5             AnnotationDbi_1.16.11
[7] Biobase_2.14.0

loaded via a namespace (and not attached):
[1] IRanges_1.12.5 tools_2.14.0
>

> sessionInfo()    <<session when comparing the 2 org.dbs>>
R version 2.14.0 (2011-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] org.Ss.eg.db_2.6.4      org.SscrofaGH.eg.db_1.0 RSQLite_0.11.1
[4] DBI_0.2-5               AnnotationDbi_1.16.11   Biobase_2.14.0

loaded via a namespace (and not attached):
[1] IR

Gr,
Guido

---------------------------------------------------------
Guido Hooiveld, PhD
Nutrition, Metabolism & Genomics Group
Division of Human Nutrition
Wageningen University
Biotechnion, Bomenweg 2
NL-6703 HD Wageningen
the Netherlands
tel: (+)31 317 485788
fax: (+)31 317 483342
email: guido.hooiveld@wur.nl
internet: http://nutrigene.4t.com
http://scholar.google.com/citations?user=qFHaMnoAAAAJ
http://www.researcherid.com/rid/F-4912-2010
Annotation GO