How to create an OrgDb package?
Haibo Liu • 0
@haibol2017-23658

Dear Bioconductor community,

I am trying to build an OrgDb package for a custom genome from a GTF annotation file. As a test, I started with the human genome, using makeOrgPackage() from the AnnotationForge package. The first few lines of the input file "GRCh38.gene.info.txt" are shown below. The resulting package installs successfully, but it cannot be queried.

Any thoughts, comments, or solutions would be much appreciated.

Haibo

library("AnnotationForge")
library("AnnotationDbi")

gene_information <- read.delim("GRCh38.gene.info.txt", header = FALSE)

head(gene_information)
               V1              V2          V3
1 ENSG00000243485 ENST00000473358 MIR1302-2HG
2 ENSG00000243485 ENST00000469289 MIR1302-2HG
3 ENSG00000237613 ENST00000417324     FAM138A
4 ENSG00000237613 ENST00000461467     FAM138A
5 ENSG00000186092 ENST00000641515       OR4F5
6 ENSG00000186092 ENST00000335137       OR4F5

fSym <- unique(gene_information[, c(1,3)])
colnames(fSym) <- c("GID", "SYMBOL")

ensembl_trans <- unique(gene_information[, c(1:2)])
colnames(ensembl_trans) <- c("GID", "ENSEMBLTRANS")

ensembl <- unique(gene_information[, c(1,1)])
colnames(ensembl) <- c("GID", "ENSEMBL")

#tmpdir <- tempdir()
tmpdir <- "test2"
if (!dir.exists(tmpdir))
{
    dir.create(tmpdir)
}

makeOrgPackage(gene_info = fSym, 
               ensembl_trans = ensembl_trans,
               ensembl = ensembl,
               version = "0.1",
               maintainer = "Some One <so@someplace.org>",
               author = "Some One <so@someplace.org>",
               outputDir = tmpdir,
               tax_id= "9606",
               genus= "Homo",
               species= "sapiens",
               goTable=NULL)

install.packages(file.path(tmpdir, "org.Hsapiens.eg.db"), 
                 type = "source", repos=NULL)


library("org.Hsapiens.eg.db")
AnnotationDbi::select(org.Hsapiens.eg.db, keys = "ENSG00000243485", columns = "SYMBOL", keytype = "ENSEMBL")

## Error message :
Error in names(ans) <- unlist(make.name.tree(x, recursive, what.names),  : 
  attempt to set an attribute on NULL

sessionInfo()

R version 4.1.0 (2021-05-18)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats4 parallel stats graphics grDevices
[6] utils datasets methods base

other attached packages:
[1] RSQLite_2.2.7 org.Hsapiens.eg.db_0.1
[3] AnnotationForge_1.34.1 AnnotationDbi_1.54.1
[5] IRanges_2.26.0 S4Vectors_0.30.0
[7] Biobase_2.52.0 BiocGenerics_0.38.0

loaded via a namespace (and not attached):
[1] KEGGREST_1.32.0 tidyselect_1.1.1
[3] xfun_0.25 purrr_0.3.4
[5] colorspace_2.0-2 vctrs_0.3.8
[7] generics_0.1.0 htmltools_0.5.1.1
[9] yaml_2.2.1 XML_3.99-0.8
[11] utf8_1.2.2 blob_1.2.2
[13] rlang_0.4.11 pillar_1.6.3
[15] glue_1.4.2 DBI_1.1.1
[17] bit64_4.0.5 GenomeInfoDbData_1.2.6
[19] lifecycle_1.0.1 zlibbioc_1.38.0
[21] Biostrings_2.60.2 munsell_0.5.0
[23] gtable_0.3.0 memoise_2.0.0
[25] evaluate_0.14 knitr_1.36
[27] fastmap_1.1.0 GenomeInfoDb_1.28.4
[29] fansi_0.5.0 Rcpp_1.0.7
[31] scales_1.1.1 BiocManager_1.30.16
[33] cachem_1.0.5 XVector_0.32.0
[35] bit_4.0.4 ggplot2_3.3.5
[37] png_0.1-7 digest_0.6.27
[39] dplyr_1.0.7 cowplot_1.1.1
[41] grid_4.1.0 tools_4.1.0
[43] bitops_1.0-7 magrittr_2.0.1
[45] RCurl_1.98-1.3 tibble_3.1.3
[47] crayon_1.4.1 pkgconfig_2.0.3
[49] ellipsis_0.3.2 rstudioapi_0.13
[51] assertthat_0.2.1 rmarkdown_2.11
[53] httr_1.4.2 R6_2.5.1
[55] compiler_4.1.0

@james-w-macdonald-5106

It's a bug. The release branch is frozen, so I'll patch the devel branch and it will be part of the new release. To use the fixed package you will need to use devel until the release.

It will take 24-48 hours to propagate (or if you want to hang with the cool kids, you can use BiocManager::install("jmacdon/AnnotationDbi") after an hour or so to get the fix). You are looking for AnnotationDbi version 1.55.2.
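To confirm whether a given installation already carries the fix, a version comparison along these lines may help. This is only a sketch: the helper name has_fix is hypothetical, and the threshold 1.55.2 is taken from the answer above.

```r
# Version that carries the fix, as stated in the answer above.
fixed_version <- "1.55.2"

# Hypothetical helper: TRUE if a version string is at or above the fixed one.
has_fix <- function(installed) {
  package_version(installed) >= package_version(fixed_version)
}

# Check the local installation only if AnnotationDbi is actually present.
if (requireNamespace("AnnotationDbi", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("AnnotationDbi"))
  message("AnnotationDbi ", installed, ": fix present = ", has_fix(installed))
}
```

With the versions from this thread, has_fix("1.54.1") is FALSE (the release version in the question) and has_fix("1.55.2") is TRUE.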

> select(org.Hsapiens.eg.db, keys = "ENSG00000243485", columns = "SYMBOL", keytype = "ENSEMBL")
'select()' returned 1:1 mapping between keys and columns
          ENSEMBL      SYMBOL
1 ENSG00000243485 MIR1302-2HG
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] org.Hsapiens.eg.db_0.1 AnnotationDbi_1.55.2   IRanges_2.26.0        
[4] S4Vectors_0.30.2       Biobase_2.52.0         BiocGenerics_0.38.0   
[7] BiocManager_1.30.16   

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7             XVector_0.32.0         compiler_4.1.0        
 [4] GenomeInfoDb_1.28.4    zlibbioc_1.38.0        prettyunits_1.1.1     
 [7] bitops_1.0-7           remotes_2.4.1          tools_4.1.0           
[10] pkgbuild_1.2.0         bit_4.0.4              RSQLite_2.2.8         
[13] memoise_2.0.0          pkgconfig_2.0.3        png_0.1-7             
[16] rlang_0.4.11           DBI_1.1.1              cli_3.0.1             
[19] rstudioapi_0.13        curl_4.3.2             fastmap_1.1.0         
[22] GenomeInfoDbData_1.2.6 withr_2.4.2            httr_1.4.2            
[25] Biostrings_2.60.2      vctrs_0.3.8            rprojroot_2.0.2       
[28] bit64_4.0.5            R6_2.5.1               processx_3.5.2        
[31] callr_3.7.0            blob_1.2.2             ps_1.6.0              
[34] KEGGREST_1.32.0        RCurl_1.98-1.5         cachem_1.0.6          
[37] crayon_1.4.1

I cheated and installed the fix in a release R/Bioconductor. I wouldn't recommend that, and nobody will provide support if you do so. Put a different way, if you have a problem and we see that you have mixed'n'matched package versions, the first response will be to ask you to run BiocManager::valid(), which flags the mismatched packages so that the mixing can be undone. So you should either wait for the release next week, or install R-4.1.2 and Bioc-devel.
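For reference, a consistency check of that kind could look roughly like the sketch below. The helper is_consistent is hypothetical; it relies on BiocManager::valid() returning TRUE when all installed packages match the Bioconductor version in use, and a list describing the too-new and out-of-date packages otherwise. The call needs network access, so errors are caught rather than assumed away.

```r
# Hypothetical helper: interpret the return value of BiocManager::valid().
# TRUE means every installed package matches the Bioconductor version in use.
is_consistent <- function(result) isTRUE(result)

if (requireNamespace("BiocManager", quietly = TRUE)) {
  # valid() queries the repositories, so it can fail when offline.
  result <- tryCatch(BiocManager::valid(), error = function(e) e)
  if (inherits(result, "error")) {
    message("Could not run the check: ", conditionMessage(result))
  } else if (is_consistent(result)) {
    message("All packages match this Bioconductor version.")
  } else {
    # Lists the packages that are too new or out of date.
    print(result)
  }
}
```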


Thank you so much, James, for the quick response and the fix. I will wait for the release next week.

Haibo
