Dear Bioconductor community,
I am trying to build an OrgDb for a custom genome from a GTF annotation file. As a first test, I started with the human genome, using makeOrgPackage from the AnnotationForge package. The first few lines of the input file "GRCh38.gene.info.txt" are shown below. The resulting package installs successfully, but it cannot be queried.
Any thoughts, comments, or solutions would be greatly appreciated.
Haibo
```r
library("AnnotationForge")
library("AnnotationDbi")

gene_information <- read.delim("GRCh38.gene.info.txt", header = FALSE)
head(gene_information)
#>                V1              V2          V3
#> 1 ENSG00000243485 ENST00000473358 MIR1302-2HG
#> 2 ENSG00000243485 ENST00000469289 MIR1302-2HG
#> 3 ENSG00000237613 ENST00000417324     FAM138A
#> 4 ENSG00000237613 ENST00000461467     FAM138A
#> 5 ENSG00000186092 ENST00000641515       OR4F5
#> 6 ENSG00000186092 ENST00000335137       OR4F5

# One table per mapping; the first column of each must be "GID"
fSym <- unique(gene_information[, c(1, 3)])
colnames(fSym) <- c("GID", "SYMBOL")
ensembl_trans <- unique(gene_information[, c(1, 2)])
colnames(ensembl_trans) <- c("GID", "ENSEMBLTRANS")
ensembl <- unique(gene_information[, c(1, 1)])
colnames(ensembl) <- c("GID", "ENSEMBL")

# tmpdir <- tempdir()
tmpdir <- "test2"
if (!dir.exists(tmpdir)) {
  dir.create(tmpdir)
}

makeOrgPackage(gene_info = fSym,
               ensembl_trans = ensembl_trans,
               ensembl = ensembl,
               version = "0.1",
               maintainer = "Some One <so@someplace.org>",
               author = "Some One <so@someplace.org>",
               outputDir = tmpdir,
               tax_id = "9606",
               genus = "Homo",
               species = "sapiens",
               goTable = NULL)

install.packages(file.path(tmpdir, "org.Hsapiens.eg.db"),
                 type = "source", repos = NULL)
library("org.Hsapiens.eg.db")
AnnotationDbi::select(org.Hsapiens.eg.db, keys = "ENSG00000243485",
                      columns = "SYMBOL", keytype = "ENSEMBL")
```

Error message:

```
Error in names(ans) <- unlist(make.name.tree(x, recursive, what.names), :
  attempt to set an attribute on NULL
```
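Before calling makeOrgPackage, it can help to sanity-check the input tables in isolation. The sketch below rebuilds the three mapping tables from the six rows printed by head() above, so it runs without the real "GRCh38.gene.info.txt". The helper `check_org_table` is hypothetical (not part of AnnotationForge). It checks that each table uses "GID" as its first column and has no duplicated rows; the latter just mirrors the unique() calls in the post, and whether makeOrgPackage itself insists on it is an assumption.

```r
# Self-contained toy copy of the six rows printed by head(gene_information) above
gene_information <- data.frame(
  V1 = rep(c("ENSG00000243485", "ENSG00000237613", "ENSG00000186092"), each = 2),
  V2 = c("ENST00000473358", "ENST00000469289", "ENST00000417324",
         "ENST00000461467", "ENST00000641515", "ENST00000335137"),
  V3 = rep(c("MIR1302-2HG", "FAM138A", "OR4F5"), each = 2),
  stringsAsFactors = FALSE
)

# gene -> symbol: duplicated gene/symbol pairs collapse to one row per gene
fSym <- unique(gene_information[, c(1, 3)])
colnames(fSym) <- c("GID", "SYMBOL")

# gene -> transcript: one row per transcript
ensembl_trans <- unique(gene_information[, c(1, 2)])
colnames(ensembl_trans) <- c("GID", "ENSEMBLTRANS")

# gene -> Ensembl gene ID: the GIDs already are Ensembl IDs, so column 1 is used twice
ensembl <- unique(gene_information[, c(1, 1)])
colnames(ensembl) <- c("GID", "ENSEMBL")

# Hypothetical pre-flight check (not an AnnotationForge function):
# first column must be "GID" and no row may appear twice
check_org_table <- function(df) {
  stopifnot(is.data.frame(df),
            identical(colnames(df)[1], "GID"),
            anyDuplicated(df) == 0)
  nrow(df)
}

# Returns the row count of each table: c(fSym = 3, ensembl_trans = 6, ensembl = 3)
sapply(list(fSym = fSym, ensembl_trans = ensembl_trans, ensembl = ensembl),
       check_org_table)
```

If any table fails a check, stopifnot() stops with a message naming the failed condition, which is easier to interpret than an error raised deep inside package generation.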
```
sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices
[6] utils     datasets  methods   base

other attached packages:
[1] RSQLite_2.2.7          org.Hsapiens.eg.db_0.1
[3] AnnotationForge_1.34.1 AnnotationDbi_1.54.1
[5] IRanges_2.26.0         S4Vectors_0.30.0
[7] Biobase_2.52.0         BiocGenerics_0.38.0

loaded via a namespace (and not attached):
 [1] KEGGREST_1.32.0        tidyselect_1.1.1
 [3] xfun_0.25              purrr_0.3.4
 [5] colorspace_2.0-2       vctrs_0.3.8
 [7] generics_0.1.0         htmltools_0.5.1.1
 [9] yaml_2.2.1             XML_3.99-0.8
[11] utf8_1.2.2             blob_1.2.2
[13] rlang_0.4.11           pillar_1.6.3
[15] glue_1.4.2             DBI_1.1.1
[17] bit64_4.0.5            GenomeInfoDbData_1.2.6
[19] lifecycle_1.0.1        zlibbioc_1.38.0
[21] Biostrings_2.60.2      munsell_0.5.0
[23] gtable_0.3.0           memoise_2.0.0
[25] evaluate_0.14          knitr_1.36
[27] fastmap_1.1.0          GenomeInfoDb_1.28.4
[29] fansi_0.5.0            Rcpp_1.0.7
[31] scales_1.1.1           BiocManager_1.30.16
[33] cachem_1.0.5           XVector_0.32.0
[35] bit_4.0.4              ggplot2_3.3.5
[37] png_0.1-7              digest_0.6.27
[39] dplyr_1.0.7            cowplot_1.1.1
[41] grid_4.1.0             tools_4.1.0
[43] bitops_1.0-7           magrittr_2.0.1
[45] RCurl_1.98-1.3         tibble_3.1.3
[47] crayon_1.4.1           pkgconfig_2.0.3
[49] ellipsis_0.3.2         rstudioapi_0.13
[51] assertthat_0.2.1       rmarkdown_2.11
[53] httr_1.4.2             R6_2.5.1
[55] compiler_4.1.0
```
I cheated and installed it into a release R/Bioconductor. I wouldn't recommend that, and nobody will provide support if you do so. Put a different way, if you have a problem and we see that you have mixed-and-matched package versions, the first response will be to ask you to run `BiocManager::valid()`, which will flag the mismatched packages so the mixing can be undone. So you should either wait for the release next week, or install R-4.1.2 and Bioc-devel.
Thank you so much, James, for the quick response and the fix. I will wait for the release next week.
Haibo
Hi Bioconductor team and Haibo,
I am also trying to generate an OrgDb object for pig using the same script shared above.
When I include the "Description" argument, it reports the error below, even though my first column is 'GID':
When I omit the "Description" argument,
it complains about the DESCRIPTION file. Can you give me some suggestions?
Thanks in advance!
Regarding this:
I have also faced this error; see: "problem with makeOrgPackageFromNCBI when making an annotation package".
Basically, you have to put the email address between `<` and `>`. You forgot to include these. Thus it should be, e.g., `maintainer = "Some One <so@someplace.org>"`, and likewise for `author`.

Having said this, are you aware that for each Ensembl release these databases are made available for use in Bioconductor through the so-called AnnotationHub, as EnsDb databases built by Johannes Rainer? So there may be no need for you to do what you are doing. For more on this, see e.g. "ensembldb EnsDb databases for Ensembl release 101 added to AnnotationHub" and "EnsDb.Rnorvegicus for Rnor6".

Hey Guido, a million thanks!! It's working!!
Penny
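For anyone hitting the same DESCRIPTION complaint: R's Writing R Extensions manual expects the Maintainer field to be a name followed by an email address in angle brackets. The base-R sketch below builds such a string and runs a rough format check before it is passed to makeOrgPackage. Both `format_maintainer` and `is_valid_maintainer` are hypothetical helpers written for illustration, and the regular expression is a simplified approximation, not the exact rule R CMD check applies.

```r
# Hypothetical helper: build a DESCRIPTION-style "Name <email>" string
format_maintainer <- function(name, email) {
  sprintf("%s <%s>", name, email)
}

# Hypothetical, simplified check: a name, one space, then an address in <...> at the end
is_valid_maintainer <- function(x) {
  grepl("^[^<>]+ <[^<>@ ]+@[^<>@ ]+>$", x)
}

maintainer <- format_maintainer("Some One", "so@someplace.org")
maintainer                                        # "Some One <so@someplace.org>"
is_valid_maintainer(maintainer)                   # TRUE
is_valid_maintainer("Some One so@someplace.org")  # FALSE: angle brackets missing
```

The resulting string can then be passed as both `maintainer` and `author` in the makeOrgPackage call.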