I am trying to create a TxDb for a non-model organism. The genome is annotated on NCBI, and the associated GO files, etc. are available there, so I am trying to use the makeTxDbFromGAF() function. No matter what I try, I keep getting the same error: the first column must be the gene ID "GID".
library("AnnotationForge")
library("BaseSet")
library("dplyr")
gaf <- getGAF("gene_ontology.gaf")
gaf_data <- as.data.frame(gaf)
head(gaf_data)
colnames(gaf_data)[colnames(gaf_data) == "DB_Object_ID"] <- "GID"
gaf_data <- gaf_data %>% select(GID, everything())
head(gaf_data)
makeOrgPackageFromGAF <- function(gaf_data, output_path) {
  makeOrgPackage(
    gene_info = gaf_data,
    organism = "Haemorhous mexicanus",
    version = "0.1",
    maintainer = "Anna Perez-Umphrey <aperezumphrey@gmail.com>",
    author = "Anna Perez-Umphrey <aperezumphrey@gmail.com>",
    outputDir = output_path,
    tax_id = "30427"
  )
}
output_path <- "C:/Users/apere/Desktop/GO"
makeOrgPackageFromGAF(gaf_data, output_path)
#Error in .makeOrgPackage(data, version = version, maintainer = maintainer, :
# The 1st column must always be the gene ID 'GID'
sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8 LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C LC_TIME=English_United States.utf8
time zone: America/New_York
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] BaseSet_0.9.0 dplyr_1.1.3 readr_2.1.5 GenomicFeatures_1.54.4 GenomicRanges_1.54.1
[6] GenomeInfoDb_1.38.5 AnnotationForge_1.44.0 AnnotationDbi_1.64.1 IRanges_2.36.0 S4Vectors_0.40.2
[11] Biobase_2.62.0 BiocGenerics_0.48.1
loaded via a namespace (and not attached):
[1] DBI_1.2.3 bitops_1.0-7 biomaRt_2.58.2 rlang_1.1.1
[5] magrittr_2.0.3 matrixStats_1.2.0 compiler_4.3.1 RSQLite_2.3.4
[9] mgcv_1.8-42 png_0.1-8 vctrs_0.6.4 stringr_1.5.1
[13] pkgconfig_2.0.3 crayon_1.5.3 fastmap_1.1.1 dbplyr_2.5.0
[17] XVector_0.42.0 utf8_1.2.4 Rsamtools_2.18.0 tzdb_0.4.0
[21] nloptr_2.1.1 bit_4.0.5 zlibbioc_1.48.0 cachem_1.0.8
[25] progress_1.2.3 blob_1.2.4 DelayedArray_0.28.0 BiocParallel_1.36.0
[29] parallel_4.3.1 prettyunits_1.2.0 R6_2.5.1 stringi_1.8.4
[33] rtracklayer_1.62.0 pkgload_1.4.0 boot_1.3-28.1 lubridate_1.9.3
[37] numDeriv_2016.8-1.1 estimability_1.5.1 Rcpp_1.0.11 SummarizedExperiment_1.32.0
[41] Matrix_1.6-5 splines_4.3.1 timechange_0.3.0 glmmTMB_1.1.9
[45] tidyselect_1.2.1 rstudioapi_0.16.0 abind_1.4-5 yaml_2.3.7
[49] TMB_1.9.14 codetools_0.2-19 curl_5.2.0 lattice_0.21-8
[53] tibble_3.2.1 withr_3.0.0 KEGGREST_1.42.0 coda_0.19-4.1
[57] BiocFileCache_2.10.2 xml2_1.3.6 Biostrings_2.70.1 pillar_1.9.0
[61] BiocManager_1.30.23 filelock_1.0.3 MatrixGenerics_1.14.0 generics_0.1.3
[65] vroom_1.6.5 RCurl_1.98-1.14 hms_1.1.3 minqa_1.2.7
[69] xtable_1.8-4 glue_1.6.2 emmeans_1.10.3 tools_4.3.1
[73] BiocIO_1.12.0 lme4_1.1-35.5 GenomicAlignments_1.38.2 mvtnorm_1.2-5
[77] XML_3.99-0.17 grid_4.3.1 nlme_3.1-162 GenomeInfoDbData_1.2.11
[81] restfulr_0.0.15 cli_3.6.1 rappdirs_0.3.3 fansi_1.0.5
[85] S4Arrays_1.2.0 digest_0.6.33 SparseArray_1.2.3 rjson_0.2.21
[89] memoise_2.0.1 lifecycle_1.0.4 httr_1.4.7 GO.db_3.18.0
[93] bit64_4.0.5 MASS_7.3-60
Are you trying to make a TxDb or an OrgDb? You say the former, but then appear to be trying to make the latter.
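If it is the OrgDb you are after, here is a minimal sketch of how the call might look. As far as I can tell from the documented signature, makeOrgPackage() takes genus and species rather than organism, and any named argument it does not recognise is treated as one of the data tables passed through '...'. So organism = "Haemorhous mexicanus" may be what trips the 'GID' check, not gaf_data itself. The toy table, its column names (DB_Object_ID, GO_ID, Evidence_Code), the helper name build_org_db, and the placeholder maintainer are all assumptions for illustration; check colnames(gaf_data) for what getGAF() actually returns on your file.

```r
# Toy stand-in for the GAF-derived table. The column names are assumed;
# inspect colnames(gaf_data) to see what your real table contains.
gaf_data <- data.frame(
  DB_Object_ID  = c("gene1", "gene1", "gene2"),
  GO_ID         = c("GO:0008150", "GO:0008150", "GO:0003674"),
  Evidence_Code = c("IEA", "IEA", "IEA"),
  stringsAsFactors = FALSE
)

# Build a GO table whose first column is a character 'GID' column,
# followed by GO and EVIDENCE, with duplicated rows removed.
go_table <- unique(data.frame(
  GID      = as.character(gaf_data$DB_Object_ID),
  GO       = gaf_data$GO_ID,
  EVIDENCE = gaf_data$Evidence_Code,
  stringsAsFactors = FALSE
))

# Hypothetical wrapper: passes genus and species separately instead of
# a single 'organism' string, and names the GO table via goTable.
build_org_db <- function(go_table, out_dir) {
  AnnotationForge::makeOrgPackage(
    go         = go_table,
    version    = "0.1",
    maintainer = "Your Name <you@example.org>",  # placeholder
    author     = "Your Name <you@example.org>",  # placeholder
    outputDir  = out_dir,
    tax_id     = "30427",
    genus      = "Haemorhous",
    species    = "mexicanus",
    goTable    = "go"
  )
}
```

The wrapper is only defined here, not called; something like build_org_db(go_table, "C:/Users/apere/Desktop/GO") would run it on your data. If you do want a TxDb instead, that is normally built from the NCBI GFF/GTF file (for example with makeTxDbFromGFF() in GenomicFeatures), not from a GAF.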