problem with makeOrgPackageFromNCBI (for Chinese hamster)

Guido Hooiveld ★ 4.1k

@guido-hooiveld-2020

Last seen 4 weeks ago

Wageningen University, Wageningen, the …

Hi Marc and others,

I am using makeOrgPackageFromNCBI() to create an annotation package for Chinese hamster (Cricetulus griseus), but I am experiencing some problems during this process. Please see the code below for details. It could very well be that I am missing something obvious, so any suggestion about what may cause this would be appreciated!

Thanks,
Guido

1) I am using R on Win7, have admin rights, and also start R through 'Run as administrator'. Why can the file 'org.Cgriseus.eg.sqlite' then not be removed? (Reason 'Permission denied'). Note: I understand this is just a warning, but it may be relevant.

2a) Although no *.db package was produced, I still tried to install the database from the directory in which the files were generated (i.e. D:\\org.Cgriseus.eg.db). This *seemed* to go OK, but when I checked the number of mapped egids it failed at the org.Cgriseus.egREFSEQ mapping...

2b) Interestingly, when I manually load the sqlite database (the one that could not be removed), these org.Cgriseus.egREFSEQ mappings are present! See the code at the bottom.

2c) --> How to make a *.db from an *.sqlite?
# Create db0 for Chinese hamster using makeOrgPackageFromNCBI()
> library(AnnotationForge)
> makeOrgPackageFromNCBI(
+   version="0.1",
+   maintainer="Guido Hooiveld <guido.hooiveld@wur.nl>",
+   author="Guido Hooiveld <guido.hooiveld@wur.nl>",
+   outputDir=".",
+   tax_id=10029,
+   genus="Cricetulus",
+   species="griseus")
Loading required package: GO.db
Getting data for gene2pubmed.gz
Loading required package: RCurl
Loading required package: bitops
discarding data from other organisms
Populating gene2pubmed table:
table gene2pubmed filled
Getting data for gene2accession.gz
discarding data from other organisms
Populating gene2accession table:
table gene2accession filled
Getting data for gene2refseq.gz
discarding data from other organisms
Populating gene2refseq table:
table gene2refseq filled
Getting data for gene2unigene
discarding data from other organisms
Populating gene2unigene table:
table gene2unigene filled
Getting data for gene_info.gz
discarding data from other organisms
Populating gene_info table:
table gene_info filled
Getting data for gene2go.gz
discarding data from other organisms
Populating gene2go table:
Getting blast2GO data as a substitute for gene2go
table metadata filled
table map_metadata filled
table gene2go filled
table metadata filled
table map_metadata filled
Populating genes table:
genes table filled
Populating gene_info_temp table:
gene_info_temp table filled
Populating alias table:
alias table filled
Populating chromosomes table:
chromosomes table filled
Populating pubmed table:
pubmed table filled
Populating refseq table:
refseq table filled
Populating accessions table:
accessions table filled
Populating unigene table:
Dropping GO IDs that are too new for the current GO.db
Dropping GO IDs that are too new for the current GO.db
Dropping GO IDs that are too new for the current GO.db
Populating go_bp table:
go_bp table filled
Populating go_mf table:
go_mf table filled
Populating go_cc table:
go_cc table filled
Populating go_bp_all table:
go_bp_all table filled
Populating go_mf_all table:
go_mf_all table filled
Populating go_cc_all table:
go_cc_all table filled
dropping table gene2pubmeddropping table gene2accessiondropping table gene2refseqdropping table gene2unigenedropping table gene_infodropping table gene2go
Making GO views
SELECT count(DISTINCT g.gene_id) FROM gene_info AS t, genes as g WHERE t._id=g._id AND t.gene_name NOT NULL
SELECT count(DISTINCT g.gene_id) FROM gene_info AS t, genes as g WHERE t._id=g._id AND t.symbol NOT NULL
SELECT count(DISTINCT t.symbol) FROM gene_info AS t, genes as g WHERE t._id=g._id AND t.symbol NOT NULL
SELECT count(DISTINCT g.gene_id) FROM chromosomes AS t, genes as g WHERE t._id=g._id AND t.chromosome NOT NULL
SELECT count(DISTINCT g.gene_id) FROM refseq AS t, genes as g WHERE t._id=g._id AND t.accession NOT NULL
SELECT count(DISTINCT t.accession) FROM refseq AS t, genes as g WHERE t._id=g._id AND t.accession NOT NULL
SELECT count(DISTINCT g.gene_id) FROM unigene AS t, genes as g WHERE t._id=g._id AND t.unigene_id NOT NULL
SELECT count(DISTINCT t.unigene_id) FROM unigene AS t, genes as g WHERE t._id=g._id AND t.unigene_id NOT NULL
SELECT count(DISTINCT g.gene_id) FROM accessions AS t, genes as g WHERE t._id=g._id AND t.accession NOT NULL
SELECT count(DISTINCT t.accession) FROM accessions AS t, genes as g WHERE t._id=g._id AND t.accession NOT NULL
SELECT count(DISTINCT g.gene_id) FROM alias AS t, genes as g WHERE t._id=g._id AND t.alias_symbol NOT NULL
table map_counts filled
Creating package in ./org.Cgriseus.eg.db
[1] FALSE
Warning messages:
1: In .makeSimpleTable(ug, table = "unigene", con) :
  no values found for table unigene in this data chunk.
2: In file.remove(dbfile) :
  cannot remove file 'org.Cgriseus.eg.sqlite', reason 'Permission denied'
>
> # Now manually install files from DIR that has been generated.
>
> install.packages(repos=NULL, pkgs="D:\\org.Cgriseus.eg.db", type="source")
* installing *source* package 'org.Cgriseus.eg.db' ...
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
*** arch - i386
*** arch - x64
* DONE (org.Cgriseus.eg.db)
> library(org.Cgriseus.eg.db)
> org.Cgriseus.eg()
Quality control information for org.Cgriseus.eg:


This package has the following mappings:

org.Cgriseus.egALIAS2EG has 25227 mapped keys (of 25227 keys)
org.Cgriseus.egCHR has 25227 mapped keys (of 25227 keys)
org.Cgriseus.egGENENAME has 25227 mapped keys (of 25227 keys)
org.Cgriseus.egGO has 25227 mapped keys (of 25227 keys)
org.Cgriseus.egGO2ALLEGS has 25227 mapped keys (of 16020 keys)
org.Cgriseus.egGO2EG has 25227 mapped keys (of 12124 keys)
org.Cgriseus.egREFSEQ has 25227 mapped keys (of 25227 keys)
Error in get(mapname) : object 'org.Cgriseus.egREFSEQ2EG' not found
>
>
> #load sqlite to check that REFSEQ mappings are included
> CHO.db <- loadDb("org.Cgriseus.eg.sqlite")
> CHO.db
OrgDb object:
| BL2GOSOURCEDATE: Thu Aug 22 18:47:20 2013
| BL2GOSOURCENAME: blast2GO
| BL2GOSOURCEURL: http://www.blast2go.de/
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: ORGANISM_DB
| ORGANISM: Cricetulus griseus
| SPECIES: Cricetulus griseus
| CENTRALID: EG
| TAXID: 10029
| EGSOURCEDATE: Thu Aug 22 18:47:24 2013
| EGSOURCENAME: Entrez Gene
| EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| GOSOURCEDATE: 20130302
| GOSOURCENAME: Gene Ontology
| GOSOURCEURL: ftp://ftp.geneontology.org/pub/go/godata
| GOEGSOURCEDATE: Thu Aug 22 18:47:24 2013
| GOEGSOURCENAME: Entrez Gene
| GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| Db type: OrgDb
| Supporting package: AnnotationDbi
> cols(CHO.db)
 [1] "ENTREZID" "ACCNUM"   "ALIAS"    "CHR"      "PMID"     "REFSEQ"
 [7] "SYMBOL"   "UNIGENE"  "GENENAME" "GO"       "EVIDENCE" "ONTOLOGY"
>
> keys <- head( keys(CHO.db))
> keys
[1] "100682525" "100682526" "100682527" "100682528" "100682529" "100682530"
>
> select(CHO.db, keys=keys, cols = c("SYMBOL","REFSEQ","UNIGENE"))
    ENTREZID SYMBOL       REFSEQ UNIGENE
1  100682525    P53 NM_001243976    <NA>
2  100682525    P53 NP_001230905    <NA>
3  100682526 Tuba1c NM_001243977    <NA>
4  100682526 Tuba1c NP_001230906    <NA>
5  100682527 Tuba1a NM_001243978    <NA>
6  100682527 Tuba1a NP_001230907    <NA>
7  100682528 Tuba1b NM_001243979    <NA>
8  100682528 Tuba1b NP_001230908    <NA>
9  100682529  Mgat1 NM_001243980    <NA>
10 100682529  Mgat1 NP_001230909    <NA>
11 100682530   Plec XM_003507629    <NA>
12 100682530   Plec XP_003507677    <NA>
Warning message:
In .generateExtraRows(tab, keys, jointype) :
  'select' resulted in 1:many mapping between keys and return rows
>
> sessionInfo()
R version 3.0.1 Patched (2013-06-05 r62877)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] org.Cgriseus.eg.db_0.1 RCurl_1.95-4.1         bitops_1.0-6           GO.db_2.9.0
 [5] AnnotationForge_1.2.2  org.Hs.eg.db_2.9.0     RSQLite_0.11.4         DBI_0.2-7
 [9] AnnotationDbi_1.22.6   Biobase_2.20.1         BiocGenerics_0.6.0

loaded via a namespace (and not attached):
[1] IRanges_1.18.3 stats4_3.0.1   tools_3.0.1
>

Annotation GO db0 PROcess • 2.5k views

updated 12.3 years ago by Marc Carlson ★ 7.2k • written 12.3 years ago by Guido Hooiveld ★ 4.1k

Marc Carlson ★ 7.2k

@marc-carlson-2264

Last seen 9.4 years ago

United States

Hi Guido,

I have (so far) been unable to reproduce your initial issue here. I have no issues generating this package with either release or devel. But even though I can't use your package directly myself, I am almost certain that your package is actually just fine, and that the only reason it says FALSE is the 2nd warning (file.remove() returns FALSE when it can't actually remove something).

The 1st warning just means that you don't have any unigene data (and that's actually good in this case, since there are no unigenes for this critter). The 2nd warning has to do with R feeling it is not allowed to remove the generated .sqlite file after copying it into the new package directory. I don't know why that 2nd warning is happening on Windows and I plan to investigate it, but the crucial thing is that this happens AFTER it has already generated the package.

Looking down a bit farther, you did find a problem with the org.Cgriseus.eg() function. I think that is a real bug (not a serious one, but one I intend to look into shortly). Basically, your package does not have (and should not have) an org.Cgriseus.egREFSEQ2EG mapping, and yet this silly function is trying to ask about it. But that is not a problem within your package, since the offending code actually lives in AnnotationDbi.

You're correct that your package does have the data that could be used for the org.Cgriseus.egREFSEQ2EG mapping, and that this data is exposed via the select() method. It is also available via the org.Cgriseus.egREFSEQ mapping. But the package is still not supposed to have that specific reverse mapping (and it does not need it, since you have a revmap() method). In fact, none of the old mappings are really needed for anything. We just generated a few of them to maintain some backwards compatibility.
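To make the revmap() and select() route concrete, here is a minimal sketch of a RefSeq-to-Entrez lookup that does not need a stored org.Cgriseus.egREFSEQ2EG object. It assumes the package built in this thread is installed under the name org.Cgriseus.eg.db and that REFSEQ is accepted as a key type; the accession is one of those shown in the question. Note that current AnnotationDbi calls the argument `columns`, whereas the 2013 session above used `cols`.

```r
# Sketch only: runs the lookups if the packages are available, otherwise
# prints a message. `have_pkgs` is a helper variable introduced here.
have_pkgs <- requireNamespace("AnnotationDbi", quietly = TRUE) &&
  requireNamespace("org.Cgriseus.eg.db", quietly = TRUE)

if (have_pkgs) {
  suppressPackageStartupMessages(library(org.Cgriseus.eg.db))

  # Forward map shipped with the package: Entrez Gene ID -> RefSeq accessions.
  eg2refseq <- org.Cgriseus.egREFSEQ

  # revmap() gives the reverse view on the fly: RefSeq accession -> Entrez Gene ID.
  refseq2eg <- AnnotationDbi::revmap(eg2refseq)
  print(as.list(refseq2eg["NM_001243976"]))

  # The same lookup through select(), keyed on RefSeq accessions.
  print(AnnotationDbi::select(org.Cgriseus.eg.db,
                              keys = "NM_001243976",
                              columns = c("ENTREZID", "SYMBOL"),
                              keytype = "REFSEQ"))
} else {
  message("AnnotationDbi and org.Cgriseus.eg.db are needed to run this sketch")
}
```

Both calls read the same underlying refseq table, so they should agree with each other for any accession present in the database.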
And to answer your other question: the package is actually "made" by just putting the database into the inst/extdata directory of a very minimalist package template found in AnnotationForge (you can look at it in inst/AnnDbPkg-templates/ORGANISM.DB/ if you want to see it). The template is altered slightly based on some inputs that are generated from your initial arguments, so that the manual pages etc. are all matched to the source material. So really, the most complicated thing that happens (after the database is made) is just generating all the manual pages.

If you could send me a tarball of the package that you generated, I would like to look at it and verify that there are not any peculiarities with it compared to the one that I made here.

Marc
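Since the package is essentially the SQLite file placed inside a template, the database can be located in an installed copy and opened directly. The sketch below assumes the file keeps the name org.Cgriseus.eg.sqlite inside the package's extdata directory (the thread does not confirm the file name); `find_org_sqlite` is a hypothetical helper, not an AnnotationForge function.

```r
# Hypothetical helper: path of the SQLite file inside an installed
# annotation package, or "" if the package or the file is absent.
find_org_sqlite <- function(pkg, db_file) {
  system.file("extdata", db_file, package = pkg)
}

db_path <- find_org_sqlite("org.Cgriseus.eg.db", "org.Cgriseus.eg.sqlite")

if (nzchar(db_path) && requireNamespace("AnnotationDbi", quietly = TRUE)) {
  # loadDb() is the same call used in the question to open the loose
  # .sqlite file by hand; here it is pointed at the installed copy.
  cho_db <- AnnotationDbi::loadDb(db_path)
  print(cho_db)
} else {
  message("org.Cgriseus.eg.db (or AnnotationDbi) is not installed here")
}
```

If the two files are the same database, the OrgDb summary printed here should match the one shown in the question.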

12.3 years ago • Marc Carlson ★ 7.2k

Hi Marc,

Sorry for the delayed reply. Meanwhile I re-ran makeOrgPackageFromNCBI() on our Linux server (after updating all packages to their latest versions; initially it didn't work either...), and now the generation of org.Cgriseus.eg.db went OK, with no complaints about removing files, etc. So this must be something specific to Windows 7 (or my machine), and is therefore less of an issue.

However, the QC function gives the same error, but that is something you apparently already identified in AnnotationDbi. BTW, does this also explain the (IMO) switched numbers?

org.Cgriseus.egGO2EG has 25227 mapped keys (of 12124 keys)

--> Does this actually mean that 12,124 keys [EGIDs] (of 25,227 in total) have a revmap (to GO)?

I will send you the two organism packages that I created under Windows 7 and Linux in a separate mail.

Thanks for all your support!
Guido

Linux output:

Getting data for gene2pubmed.gz
Loading required package: RCurl
Loading required package: bitops
discarding data from other organisms
Populating gene2pubmed table:
<<snip>>
go_cc_all table filled
dropping table gene2pubmeddropping table gene2accessiondropping table gene2refseqdropping table gene2unigenedropping table gene_infodropping table gene2go
Making GO views
SELECT count(DISTINCT g.gene_id) FROM gene_info AS t, genes as g WHERE t._id=g._id AND t.gene_name NOT NULL
SELECT count(DISTINCT g.gene_id) FROM gene_info AS t, genes as g WHERE t._id=g._id AND t.symbol NOT NULL
SELECT count(DISTINCT t.symbol) FROM gene_info AS t, genes as g WHERE t._id=g._id AND t.symbol NOT NULL
SELECT count(DISTINCT g.gene_id) FROM chromosomes AS t, genes as g WHERE t._id=g._id AND t.chromosome NOT NULL
SELECT count(DISTINCT g.gene_id) FROM refseq AS t, genes as g WHERE t._id=g._id AND t.accession NOT NULL
SELECT count(DISTINCT t.accession) FROM refseq AS t, genes as g WHERE t._id=g._id AND t.accession NOT NULL
SELECT count(DISTINCT g.gene_id) FROM unigene AS t, genes as g WHERE t._id=g._id AND t.unigene_id NOT NULL
SELECT count(DISTINCT t.unigene_id) FROM unigene AS t, genes as g WHERE t._id=g._id AND t.unigene_id NOT NULL
SELECT count(DISTINCT g.gene_id) FROM accessions AS t, genes as g WHERE t._id=g._id AND t.accession NOT NULL
SELECT count(DISTINCT t.accession) FROM accessions AS t, genes as g WHERE t._id=g._id AND t.accession NOT NULL
SELECT count(DISTINCT g.gene_id) FROM alias AS t, genes as g WHERE t._id=g._id AND t.alias_symbol NOT NULL
table map_counts filled
Creating package in ./org.Cgriseus.eg.db
[1] TRUE
Warning message:
In .makeSimpleTable(ug, table = "unigene", con) :
  no values found for table unigene in this data chunk.
> # QC-ing:
> org.Cgriseus.eg()
Quality control information for org.Cgriseus.eg:


This package has the following mappings:

org.Cgriseus.egALIAS2EG has 25227 mapped keys (of 25227 keys)
org.Cgriseus.egCHR has 25227 mapped keys (of 25227 keys)
org.Cgriseus.egGENENAME has 25227 mapped keys (of 25227 keys)
org.Cgriseus.egGO has 25227 mapped keys (of 25227 keys)
org.Cgriseus.egGO2ALLEGS has 25227 mapped keys (of 16020 keys)
org.Cgriseus.egGO2EG has 25227 mapped keys (of 12124 keys)
org.Cgriseus.egREFSEQ has 25227 mapped keys (of 25227 keys)
Error in get(mapname) : object 'org.Cgriseus.egREFSEQ2EG' not found
>
>
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] RCurl_1.95-4.1        bitops_1.0-6          GO.db_2.9.0
 [4] AnnotationForge_1.2.2 org.Hs.eg.db_2.9.0    RSQLite_0.11.4
 [7] DBI_0.2-7             AnnotationDbi_1.22.6  Biobase_2.20.1
[10] BiocGenerics_0.6.0

loaded via a namespace (and not attached):
[1] IRanges_1.18.3 stats4_3.0.1   tools_3.0.1
>
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor _______________________________________________ Bioconductor mailing list Bioconductor at r-project.org https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
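
The count queries printed during the build can also be run by hand to confirm that the REFSEQ data really are in the SQLite file, independent of the package's map objects. What follows is only a minimal sketch under stated assumptions: it builds a tiny in-memory stand-in with two tables (genes, refseq) whose names and columns are taken from the SQL in the log, filled with a few of the IDs shown in the select() output. The toy rows and the variable names are illustrative, not the real database contents.

```r
library(DBI)
library(RSQLite)

# Assumption: an in-memory toy database standing in for org.Cgriseus.eg.sqlite.
# Replace ":memory:" with the path to the real file (and skip the
# dbWriteTable() calls) to query the database that was actually built.
con <- dbConnect(SQLite(), ":memory:")

# Toy 'genes' table: internal _id mapped to Entrez Gene ID.
genes <- data.frame(
  `_id`   = 1:3,
  gene_id = c("100682525", "100682526", "100682527"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Toy 'refseq' table: only the first two genes have accessions.
refseq <- data.frame(
  `_id`     = c(1L, 1L, 2L),
  accession = c("NM_001243976", "NP_001230905", "NM_001243977"),
  check.names = FALSE, stringsAsFactors = FALSE
)

dbWriteTable(con, "genes", genes)
dbWriteTable(con, "refseq", refseq)

# Same shape as the refseq count query in the build log:
# number of distinct genes with at least one RefSeq accession.
n_genes_with_refseq <- dbGetQuery(con, "
  SELECT count(DISTINCT g.gene_id) AS n
  FROM refseq AS t, genes AS g
  WHERE t._id = g._id AND t.accession NOT NULL
")$n

print(n_genes_with_refseq)

dbDisconnect(con)
```

Run against the real file, a non-zero count here would suggest the refseq table is populated, and that the 'org.Cgriseus.egREFSEQ2EG' error comes from the package side rather than from missing data.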

12.3 years ago Guido Hooiveld ★ 4.1k
