Guido Hooiveld
Wageningen University, Wageningen, the …
Hi Marc and others,
I am using makeOrgPackageFromNCBI() to create an annotation package
for Chinese hamster (Cricetulus griseus), but I am running into some
problems during this process. Please see the code below for details. It
may well be that I am missing something obvious, so any suggestions as
to what may be causing this would be appreciated!
Thanks,
Guido
1) I am using R on Win7, have admin rights, and also start R through
'Run as administrator'. Why, then, can the file 'org.Cgriseus.eg.sqlite'
not be removed (reason: 'Permission denied')? Note: I understand this
is just a warning, but it may be relevant.
2a) Although no *.db package was produced, I still tried to install the
database from the directory in which the files were generated (i.e.
D:\\org.Cgriseus.eg.db). This *seemed* to go OK, but when I checked the
number of mapped egids, it failed right after the org.Cgriseus.egREFSEQ
mapping (object 'org.Cgriseus.egREFSEQ2EG' not found)...
2b) Interestingly, when I manually load the sqlite database (that
could not be removed) these org.Cgriseus.egREFSEQ mappings are
present! See code at bottom.
2c) So: how do I make a *.db package from a *.sqlite file?
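For 2a, one quick way to see which mapping objects the installed package actually provides is to compare its namespace exports against the names the QC function is expected to look up. The sketch below uses only base R. The helper name `missing_mappings` and the list of suffixes in the usage comment are illustrative assumptions, not part of AnnotationForge, and it assumes the mappings appear among the namespace exports once the package is loaded.

```r
# Hypothetical helper: report which expected objects a package does not export.
# `pkg` is the name of an installed package; `expected` is a character vector.
missing_mappings <- function(pkg, expected) {
  exported <- getNamespaceExports(pkg)  # loads the namespace if necessary
  setdiff(expected, exported)           # names that are expected but absent
}

# Usage (assumes org.Cgriseus.eg.db is installed; suffixes chosen for illustration):
# missing_mappings("org.Cgriseus.eg.db",
#                  paste0("org.Cgriseus.eg", c("REFSEQ", "REFSEQ2EG", "SYMBOL")))
```

Any name this returns is one that get() would fail on, as in the REFSEQ2EG error reported below.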
# Create db0 for Chinese hamster using makeOrgPackageFromNCBI()
> library(AnnotationForge)
> makeOrgPackageFromNCBI(
+ version="0.1",
+ maintainer="Guido Hooiveld <guido.hooiveld@wur.nl>",
+ author="Guido Hooiveld <guido.hooiveld@wur.nl>",
+ outputDir=".",
+ tax_id=10029,
+ genus="Cricetulus",
+ species="griseus")
Loading required package: GO.db
Getting data for gene2pubmed.gz
Loading required package: RCurl
Loading required package: bitops
discarding data from other organisms
Populating gene2pubmed table:
table gene2pubmed filled
Getting data for gene2accession.gz
discarding data from other organisms
Populating gene2accession table:
table gene2accession filled
Getting data for gene2refseq.gz
discarding data from other organisms
Populating gene2refseq table:
table gene2refseq filled
Getting data for gene2unigene
discarding data from other organisms
Populating gene2unigene table:
table gene2unigene filled
Getting data for gene_info.gz
discarding data from other organisms
Populating gene_info table:
table gene_info filled
Getting data for gene2go.gz
discarding data from other organisms
Populating gene2go table:
Getting blast2GO data as a substitute for gene2go
table metadata filled
table map_metadata filled
table gene2go filled
table metadata filled
table map_metadata filled
Populating genes table:
genes table filled
Populating gene_info_temp table:
gene_info_temp table filled
Populating alias table:
alias table filled
Populating chromosomes table:
chromosomes table filled
Populating pubmed table:
pubmed table filled
Populating refseq table:
refseq table filled
Populating accessions table:
accessions table filled
Populating unigene table:
Dropping GO IDs that are too new for the current GO.db
Dropping GO IDs that are too new for the current GO.db
Dropping GO IDs that are too new for the current GO.db
Populating go_bp table:
go_bp table filled
Populating go_mf table:
go_mf table filled
Populating go_cc table:
go_cc table filled
Populating go_bp_all table:
go_bp_all table filled
Populating go_mf_all table:
go_mf_all table filled
Populating go_cc_all table:
go_cc_all table filled
dropping table gene2pubmeddropping table gene2accessiondropping table
gene2refseqdropping table gene2unigenedropping table gene_infodropping
table gene2go
Making GO views
SELECT count(DISTINCT g.gene_id) FROM gene_info AS t, genes as g WHERE
t._id=g._id AND t.gene_name NOT NULL
SELECT count(DISTINCT g.gene_id) FROM gene_info AS t, genes as g WHERE
t._id=g._id AND t.symbol NOT NULL
SELECT count(DISTINCT t.symbol) FROM gene_info AS t, genes as g WHERE
t._id=g._id AND t.symbol NOT NULL
SELECT count(DISTINCT g.gene_id) FROM chromosomes AS t, genes as g
WHERE t._id=g._id AND t.chromosome NOT NULL
SELECT count(DISTINCT g.gene_id) FROM refseq AS t, genes as g WHERE
t._id=g._id AND t.accession NOT NULL
SELECT count(DISTINCT t.accession) FROM refseq AS t, genes as g WHERE
t._id=g._id AND t.accession NOT NULL
SELECT count(DISTINCT g.gene_id) FROM unigene AS t, genes as g WHERE
t._id=g._id AND t.unigene_id NOT NULL
SELECT count(DISTINCT t.unigene_id) FROM unigene AS t, genes as g
WHERE t._id=g._id AND t.unigene_id NOT NULL
SELECT count(DISTINCT g.gene_id) FROM accessions AS t, genes as g
WHERE t._id=g._id AND t.accession NOT NULL
SELECT count(DISTINCT t.accession) FROM accessions AS t, genes as g
WHERE t._id=g._id AND t.accession NOT NULL
SELECT count(DISTINCT g.gene_id) FROM alias AS t, genes as g WHERE
t._id=g._id AND t.alias_symbol NOT NULL
table map_counts filled
Creating package in ./org.Cgriseus.eg.db
[1] FALSE
Warning messages:
1: In .makeSimpleTable(ug, table = "unigene", con) :
no values found for table unigene in this data chunk.
2: In file.remove(dbfile) :
cannot remove file 'org.Cgriseus.eg.sqlite', reason 'Permission
denied'
>
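Regarding warning 2 above: on Windows a file generally cannot be deleted while a process still holds it open, so one possibility (an assumption, not confirmed here) is that a connection to the .sqlite file was still open when file.remove() was called. The base R sketch below only reports whether the file exists, whether R considers it writable, and its size. It does not detect an open handle, and `describe_file` is a hypothetical helper name. Note that file.access() is documented as not fully reliable on Windows.

```r
# Hypothetical helper: basic facts about a file before trying to remove it.
describe_file <- function(path) {
  exists <- file.exists(path)
  list(
    exists   = exists,
    # file.access() returns 0 on success and -1 on failure; mode = 2 tests write permission
    writable = exists && unname(file.access(path, mode = 2) == 0),
    size     = if (exists) file.info(path)$size else NA_real_
  )
}

# Usage (path taken from the warning above):
# describe_file("org.Cgriseus.eg.sqlite")
```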
> # Now manually install files from DIR that has been generated.
>
> install.packages(repos=NULL, pkgs="D:\\org.Cgriseus.eg.db", type="source")
* installing *source* package 'org.Cgriseus.eg.db' ...
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
*** arch - i386
*** arch - x64
* DONE (org.Cgriseus.eg.db)
> library(org.Cgriseus.eg.db)
> org.Cgriseus.eg()
Quality control information for org.Cgriseus.eg:
This package has the following mappings:
org.Cgriseus.egALIAS2EG has 25227 mapped keys (of 25227 keys)
org.Cgriseus.egCHR has 25227 mapped keys (of 25227 keys)
org.Cgriseus.egGENENAME has 25227 mapped keys (of 25227 keys)
org.Cgriseus.egGO has 25227 mapped keys (of 25227 keys)
org.Cgriseus.egGO2ALLEGS has 25227 mapped keys (of 16020 keys)
org.Cgriseus.egGO2EG has 25227 mapped keys (of 12124 keys)
org.Cgriseus.egREFSEQ has 25227 mapped keys (of 25227 keys)
Error in get(mapname) : object 'org.Cgriseus.egREFSEQ2EG' not found
>
>
> #load sqlite to check that REFSEQ mappings are included
> CHO.db <- loadDb("org.Cgriseus.eg.sqlite")
> CHO.db
OrgDb object:
| BL2GOSOURCEDATE: Thu Aug 22 18:47:20 2013
| BL2GOSOURCENAME: blast2GO
| BL2GOSOURCEURL: http://www.blast2go.de/
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: ORGANISM_DB
| ORGANISM: Cricetulus griseus
| SPECIES: Cricetulus griseus
| CENTRALID: EG
| TAXID: 10029
| EGSOURCEDATE: Thu Aug 22 18:47:24 2013
| EGSOURCENAME: Entrez Gene
| EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| GOSOURCEDATE: 20130302
| GOSOURCENAME: Gene Ontology
| GOSOURCEURL: ftp://ftp.geneontology.org/pub/go/godata
| GOEGSOURCEDATE: Thu Aug 22 18:47:24 2013
| GOEGSOURCENAME: Entrez Gene
| GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| Db type: OrgDb
| Supporting package: AnnotationDbi
> cols(CHO.db)
[1] "ENTREZID" "ACCNUM" "ALIAS" "CHR" "PMID" "REFSEQ"
[7] "SYMBOL" "UNIGENE" "GENENAME" "GO" "EVIDENCE" "ONTOLOGY"
>
> keys <- head( keys(CHO.db))
> keys
[1] "100682525" "100682526" "100682527" "100682528" "100682529" "100682530"
>
> select(CHO.db, keys=keys, cols = c("SYMBOL","REFSEQ","UNIGENE"))
    ENTREZID SYMBOL       REFSEQ UNIGENE
1  100682525    P53 NM_001243976    <NA>
2  100682525    P53 NP_001230905    <NA>
3  100682526 Tuba1c NM_001243977    <NA>
4  100682526 Tuba1c NP_001230906    <NA>
5  100682527 Tuba1a NM_001243978    <NA>
6  100682527 Tuba1a NP_001230907    <NA>
7  100682528 Tuba1b NM_001243979    <NA>
8  100682528 Tuba1b NP_001230908    <NA>
9  100682529  Mgat1 NM_001243980    <NA>
10 100682529  Mgat1 NP_001230909    <NA>
11 100682530   Plec XM_003507629    <NA>
12 100682530   Plec XP_003507677    <NA>
Warning message:
In .generateExtraRows(tab, keys, jointype) :
'select' resulted in 1:many mapping between keys and return rows
>
> sessionInfo()
R version 3.0.1 Patched (2013-06-05 r62877)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base
other attached packages:
[1] org.Cgriseus.eg.db_0.1 RCurl_1.95-4.1         bitops_1.0-6           GO.db_2.9.0
[5] AnnotationForge_1.2.2  org.Hs.eg.db_2.9.0     RSQLite_0.11.4         DBI_0.2-7
[9] AnnotationDbi_1.22.6   Biobase_2.20.1         BiocGenerics_0.6.0
loaded via a namespace (and not attached):
[1] IRanges_1.18.3 stats4_3.0.1 tools_3.0.1
>