I am getting an error when trying to build a TxDb package from a custom TxDb object. Could you please help me figure out the problem? Thanks.
dir <- "C:/Users/XX/AppData/Local/R/win-library/4.2/GenomicFeatures/extdata/seq"
gffmodel <- file.path(dir,"GCF_009819885.2_bCatUst1.pri.v2_genomic.gff")
(txdb <- makeTxDbFromGFF(file="GCF_009819885.2_bCatUst1.pri.v2.gff", format="gff3", dataSource = "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/819/885/GCF_009819885.2_bCatUst1.pri.v2/", organism = "Catharus ustulatus"))
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
TxDb object:
Db type: TxDb
Supporting package: GenomicFeatures
Data source: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/819/885/GCF_009819885.2_bCatUst1.pri.v2/
Organism: Catharus ustulatus
Taxonomy ID: 91951
miRBase build ID: NA
Genome: NA
Nb of transcripts: 40735
Db created by: GenomicFeatures package from Bioconductor
Creation time: 2022-12-06 16:31:04 -0600 (Tue, 06 Dec 2022)
GenomicFeatures version at creation time: 1.50.2
RSQLite version at creation time: 2.2.19
DBSCHEMAVERSION: 1.2
**Warning messages:**
1: In .extract_transcripts_from_GRanges(tx_IDX, gr, mcols0$type, mcols0$ID, :
some transcripts have no "transcript_id" attribute ==> their
name ("tx_name" column in the TxDb object) was set to NA
2: In .extract_transcripts_from_GRanges(tx_IDX, gr, mcols0$type, mcols0$ID, :
the transcript names ("tx_name" column in the TxDb object)
imported from the "transcript_id" attribute are not unique
txdb
TxDb object:
Db type: TxDb
Supporting package: GenomicFeatures
Data source: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/819/885/GCF_009819885.2_bCatUst1.pri.v2/
Organism: Catharus ustulatus
Taxonomy ID: 91951
miRBase build ID: NA
Genome: NA
Nb of transcripts: 40735
Db created by: GenomicFeatures package from Bioconductor
Creation time: 2022-12-06 16:31:04 -0600 (Tue, 06 Dec 2022)
GenomicFeatures version at creation time: 1.50.2
RSQLite version at creation time: 2.2.19
DBSCHEMAVERSION: 1.2
saveDb(txdb, file="Custulatus.sqlite")
TxDb object:
Db type: TxDb
Supporting package: GenomicFeatures
Data source: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/819/885/GCF_009819885.2_bCatUst1.pri.v2/
Organism: Catharus ustulatus
Taxonomy ID: 91951
miRBase build ID: NA
Genome: NA
Nb of transcripts: 40735
Db created by: GenomicFeatures package from Bioconductor
Creation time: 2022-12-06 16:31:04 -0600 (Tue, 06 Dec 2022)
GenomicFeatures version at creation time: 1.50.2
RSQLite version at creation time: 2.2.19
DBSCHEMAVERSION: 1.2
con <- dbconn(txdb)
DBI::dbGetQuery(con, "INSERT INTO metadata VALUES ('Resource URL', 'https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/819/885/GCF_009819885.2_bCatUst1.pri.v2/');")
data frame with 0 columns and 0 rows
**Warning message:**
In result_fetch(res@ptr, n = n) :
SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
makeTxDbPackage(txdb, version="1.2", maintainer="X X X@zzz.edu", author="X X", destDir= "C:/Users/X/AppData/Local/R/win-library/4.2/GenomicFeatures/exdata/seq")
**Error** in createPackage(pkgname = pkgname, destinationDir = destDir, originDir = template_path, :
'destinationDir' must be a directory (C:/Users/X/AppData/Local/R/win-library/4.2/GenomicFeatures/exdata/seq)
sessionInfo("GenomicFeatures")
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
attached base packages:
character(0)
other attached packages:
[1] GenomicFeatures_1.50.2
loaded via a namespace (and not attached):
[1] Rcpp_1.0.9 lattice_0.20-45 prettyunits_1.1.1
[4] png_0.1-8 Rsamtools_2.14.0 Biostrings_2.66.0
[7] assertthat_0.2.1 digest_0.6.30 utf8_1.2.2
[10] BiocFileCache_2.6.0 R6_2.5.1 GenomeInfoDb_1.34.4
[13] stats4_4.2.2 evaluate_0.18 RSQLite_2.2.19
[16] httr_1.4.4 pillar_1.8.1 utils_4.2.2
[19] zlibbioc_1.44.0 rlang_1.0.6 progress_1.2.2
[22] curl_4.3.3 blob_1.2.3 S4Vectors_0.36.0
[25] Matrix_1.5-1 rmarkdown_2.18 BiocParallel_1.32.4
[28] stringr_1.5.0 RCurl_1.98-1.9 bit_4.0.5
[31] biomaRt_2.54.0 DelayedArray_0.23.2 xfun_0.35
[34] compiler_4.2.2 rtracklayer_1.58.0 pkgconfig_2.0.3
[37] stats_4.2.2 BiocGenerics_0.44.0 htmltools_0.5.3
[40] tidyselect_1.2.0 KEGGREST_1.38.0 SummarizedExperiment_1.28.0
[43] tibble_3.1.8 GenomeInfoDbData_1.2.9 matrixStats_0.63.0
[46] IRanges_2.32.0 codetools_0.2-18 grDevices_4.2.2
[49] XML_3.99-0.12 fansi_1.0.3 crayon_1.5.2
[52] dplyr_1.0.10 dbplyr_2.2.1 GenomicAlignments_1.34.0
[55] bitops_1.0-7 rappdirs_0.3.3 grid_4.2.2
[58] lifecycle_1.0.3 DBI_1.1.3 magrittr_2.0.3
[61] datasets_4.2.2 cli_3.4.1 stringi_1.7.8
[64] cachem_1.0.6 XVector_0.38.0 xml2_1.3.3
[67] ellipsis_0.3.2 graphics_4.2.2 filelock_1.0.2
[70] generics_0.1.3 vctrs_0.5.1 base_4.2.2
[73] rjson_0.2.21 restfulr_0.0.15 tools_4.2.2
[76] bit64_4.0.5 Biobase_2.58.0 glue_1.6.2
[79] hms_1.1.2 MatrixGenerics_1.10.0 parallel_4.2.2
[82] fastmap_1.1.0 yaml_2.3.6 AnnotationDbi_1.60.0
[85] GenomicRanges_1.49.0 memoise_2.0.1 knitr_1.41
[88] BiocIO_1.8.0 methods_4.2.2
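One thing that stands out in the transcript above: `dir` is defined with `extdata/seq`, while the `destDir` passed to `makeTxDbPackage()` reads `exdata/seq`. The error says 'destinationDir' must be a directory, so it may be worth checking that the path exists before the call. A minimal base-R sketch, using a temporary directory as a stand-in for the real path (the `ensure_dir()` helper is hypothetical, not part of GenomicFeatures):

```r
# Hypothetical helper: make sure a destination directory exists before
# handing it to a function that requires an existing directory.
ensure_dir <- function(path) {
  if (!dir.exists(path)) {
    # recursive = TRUE also creates any missing parent directories
    dir.create(path, recursive = TRUE)
  }
  # mustWork = TRUE raises an error if the directory still does not exist
  normalizePath(path, winslash = "/", mustWork = TRUE)
}

# Stand-in for the real destination (assumption: any writable directory)
dest <- file.path(tempdir(), "extdata", "seq")
dest <- ensure_dir(dest)
stopifnot(dir.exists(dest))

# The checked path could then be passed on (not run here):
# makeTxDbPackage(txdb, ..., destDir = dest)
```

Any writable directory should do here; as far as I can tell it does not need to sit inside the installed GenomicFeatures folder.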
Thanks for your response. The SQL statement works now, but I am still seeing an error from makeTxDbPackage after adjusting the destination directory.
It is doing a query on the metadata table and then testing that the returned value makes the test return TRUE. What you are putting into the metadata table is failing that test.
I tried using code found on the forum as a fix for a previous error about a missing 'Resource URL', but it seems the link to the NCBI assembly where I downloaded the GFF file didn't work.
Could it be that the NCBI link points to zipped GFF/GTF files only? If so, is there another way to get around the "missing Resource URL" problem?
Thanks for your help.
The error you provided has nothing whatsoever to do with the URI 'working' or not. To repeat what I already said, it's doing a test on what you have as the Resource URL entry in the metadata table, and it's saying that the entry either (A) is not character, (B) has a length != 1L, or (C) is NA.
It's simple enough for you to run that query yourself and then decipher why it's failing the test.
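A minimal sketch of that kind of check, assuming the lookup is a `SELECT` on the `metadata` table by name and that the test amounts to "a single, non-NA character string" (the three conditions listed above). The in-memory table, its `name`/`value` columns, the example URL, and the `is_single_string()` helper are stand-ins for illustration, not the package's own code; with a real object the connection would come from `dbconn(txdb)`. It also uses `dbExecute()` for the `INSERT`, as the earlier warning suggests.

```r
# Hypothetical helper mirroring the three conditions described above:
# character, length 1, and not NA.
is_single_string <- function(x) {
  is.character(x) && length(x) == 1L && !is.na(x)
}

# Plain vectors show which kinds of input pass and which fail
is_single_string("https://example.org/")             # one string
is_single_string(character(0))                       # no entry at all
is_single_string(c("https://example.org/", "dup"))   # more than one entry
is_single_string(NA_character_)                      # NA entry

# Same idea against a SQLite table, if DBI and RSQLite are installed
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  DBI::dbExecute(con, "CREATE TABLE metadata (name TEXT, value TEXT)")

  # dbExecute() is meant for statements that return no rows, such as INSERT
  ins <- "INSERT INTO metadata VALUES ('Resource URL', 'https://example.org/')"
  DBI::dbExecute(con, ins)
  DBI::dbExecute(con, ins)  # running the INSERT twice leaves two rows

  # dbGetQuery() is meant for SELECT; [[1]] pulls out the value column
  res <- DBI::dbGetQuery(
    con, "SELECT value FROM metadata WHERE name = 'Resource URL'"
  )[[1]]
  print(res)
  print(is_single_string(res))  # two rows here, so the check fails

  DBI::dbDisconnect(con)
}
```

If the same query on your own database returns more than one row (for example because the `INSERT` was run more than once) or no rows at all, that would be one possible explanation for the failure.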