Question: Bug in makeOrgPackageFromNCBI from AnnotationForge?
5.5 years ago by
Marco Blanchette • 210
United States/Kansas City/Stowers Institute for Medical Research
Marco Blanchette • 210 wrote:
I am working on a project that uses Schizosaccharomyces pombe as a source for genomic analysis, and I would love to use the HTML-producing wrappers in ReportingTools. However, I am struggling because there is no AnnotationDbi package available for this organism. I finally decided to take the plunge and see if I could build one myself using AnnotationForge, and I was quite excited to find that I could perhaps make one simply by calling makeOrgPackageFromNCBI(). However, something went wrong, and I suspect a bug somewhere in the pipeline. I have not dug deeper than trying to build the package and use it, hoping that someone closer to the code could shed some light. Here are the steps I took:

> library(AnnotationForge)
> makeOrgPackageFromNCBI(version = "0.1",
      author = "Marco Blanchette <mab at stowers.org>",
      maintainer = "Marco Blanchette <mab at stowers.org>",
      outputDir = ".",
      tax_id = "4896",
      genus = "Schizosaccharomyces",
      species = "pombe")

This step succeeded with only a warning:

Warning message:
In .makeSimpleTable(ug, table = "unigene", con) :
  no values found for table unigene in this data chunk.

I didn't think this was critical enough to raise a red flag, so I proceeded with the installation, which went smoothly:

> library(devtools)
> install('org.Spombe.eg.db')
> library('org.Spombe.eg.db')

I then tried to use the package with the ReportingTools publish() function, but it failed with an error related to the Entrez IDs, for which I had a conversion table from biomaRt. I dug a bit deeper and found that none of the genes I was querying were in the database. I finally realized that there were only 38 entries in the org.Spombe.eg.db database I had just created and installed. Check this out:

> keytypes(org.Spombe.eg.db)
 [1] "ENTREZID" "ACCNUM" "ALIAS" "CHR" "PMID" "REFSEQ"
 [7] "SYMBOL" "UNIGENE" "GENENAME" "GO" "EVIDENCE" "ONTOLOGY"

Looking good! However:

> length(keys(org.Spombe.eg.db,'ENTREZID'))
[1] 38

Can someone close enough to the code shed some light as to whether there is a bug in AnnotationForge, or whether it is the NCBI database that does not conform to what is expected? For comparison, biomaRt has 5117 Entrez IDs:

> library(biomaRt)
> mart <- useMart("fungi_mart_18","spombe_eg_gene")
> ensembl2entrez <- getBM(c('ensembl_gene_id','entrezgene'),mart=mart)
> sum(!is.na(ensembl2entrez$entrezgene))
[1] 5117

The IDs I tested on the NCBI website return the correct genes. However, only 10 of the Entrez IDs in the AnnotationForge package (out of a meager 38) are found in biomaRt:

> sum(keys(org.Spombe.eg.db,'ENTREZID') %in% ensembl2entrez$entrezgene)
[1] 10

Again, I would appreciate any comments or suggestions as to whether this is a bug, something I did wrong, or a misalignment between the NCBI S. pombe annotation and what AnnotationForge expects.

Thanks

--
Marco Blanchette, Ph.D.
Assistant Investigator
Stowers Institute for Medical Research
1000 East 50th St.
Kansas City, MO 64110
Tel: 816-926-4071
Cell: 816-726-8419
Fax: 816-926-2018
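For anyone who wants to look at the mismatch in more detail than the single %in% count above, here is a minimal base-R sketch of the same comparison. The helper name compare_id_sets and the toy IDs are hypothetical, made up for illustration; nothing here is part of AnnotationForge or biomaRt.

```r
# Hypothetical helper (not part of AnnotationForge or biomaRt): summarise how
# two sets of Entrez Gene IDs overlap.
compare_id_sets <- function(pkg_ids, ref_ids) {
  # Coerce to character so integer IDs compare cleanly with character keys,
  # and drop NAs and duplicates before counting.
  pkg_ids <- unique(as.character(pkg_ids[!is.na(pkg_ids)]))
  ref_ids <- unique(as.character(ref_ids[!is.na(ref_ids)]))
  list(
    n_pkg       = length(pkg_ids),                # IDs in the built package
    n_ref       = length(ref_ids),                # IDs in the reference set
    n_shared    = length(intersect(pkg_ids, ref_ids)),
    only_in_pkg = setdiff(pkg_ids, ref_ids),      # candidates to inspect by hand
    only_in_ref = setdiff(ref_ids, pkg_ids)
  )
}

# Toy IDs, made up for illustration only
toy_pkg <- c("101", "102", "103")
toy_ref <- c(102L, 103L, 104L, NA)
res <- compare_id_sets(toy_pkg, toy_ref)
str(res)
```

With the objects from the session above, the call would presumably be compare_id_sets(keys(org.Spombe.eg.db, 'ENTREZID'), ensembl2entrez$entrezgene); the only_in_pkg element would then list the package IDs that biomaRt does not know about, which may help narrow down where they come from.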