Hello all,
I am trying to create a TxDb from a GFF3 file.
> gtfFile <- "../../Emihu1_best_genes_altered.gff"
> txdb <- makeTxDbFromGFF(gtfFile, format="auto")
However, it gives me the following error:
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... Error: subscript contains NAs
> traceback()
18: stop(wmsg(...), call. = FALSE)
17: .subscript_error("subscript contains NAs")
16: NSBS(i, x, exact = exact, strict.upper.bound = !allow.append, allow.NAs = allow.NAs)
15: NSBS(i, x, exact = exact, strict.upper.bound = !allow.append, allow.NAs = allow.NAs)
14: normalizeSingleBracketSubscript(i, x, as.NSBS = TRUE)
13: extractROWS(x, i)
12: extractROWS(x, i)
11: .nextMethod(x, i)
10: eval(call, callEnv)
9: eval(call, callEnv)
8: callNextMethod(x, i)
7: Parent[exon_with_gene_parent_IDX]
6: Parent[exon_with_gene_parent_IDX]
5: unlist(Parent[exon_with_gene_parent_IDX], use.names = FALSE)
4: ID[gene_IDX] %in% unlist(Parent[exon_with_gene_parent_IDX], use.names = FALSE)
3: .get_gene_as_tx_IDX(gene_IDX, ID, exon_with_gene_parent_IDX, Parent)
2: makeTxDbFromGRanges(gr, metadata = metadata)
1: makeTxDbFromGFF(gtfFile, format = "gff3")
My GFF file looks as follows (with an additional column for "attributes"):
> read.gff(gtffile, na.strings = c(".", "?"))
       seqid source       type start  end score strand phase
1 scaffold_1    JGI       exon  2702 2844    NA      -  <NA>
2 scaffold_1    JGI       exon  2995 3157    NA      -  <NA>
3 scaffold_1    JGI        CDS  3042 3157    NA      -     0
4 scaffold_1    JGI stop_codon  3042 3044    NA      -     0
5 scaffold_1    JGI       exon  3042 3176    NA      -  <NA>
6 scaffold_1    JGI        CDS  3042 3176    NA      -     2
7 scaffold_1    JGI stop_codon  3042 3044    NA      -     0
8 scaffold_1    JGI       exon  3442 3678    NA      -  <NA>
Is the error coming from the NAs in the "score" column?
If so, are there any solutions to prevent this?
Thanks in advance.
> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=nl_NL.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=nl_NL.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=nl_NL.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] ape_4.1                  ensembldb_2.0.3
 [3] AnnotationFilter_1.0.0   DESeq2_1.16.1
 [5] BiocInstaller_1.26.0     BiocParallel_1.10.1
 [7] GenomicAlignments_1.12.1 SummarizedExperiment_1.6.3
 [9] DelayedArray_0.2.7       matrixStats_0.52.2
[11] GenomicFeatures_1.28.4   AnnotationDbi_1.38.1
[13] Biobase_2.36.2           Rsamtools_1.28.0
[15] Biostrings_2.44.1        XVector_0.16.0
[17] GenomicRanges_1.28.3     GenomeInfoDb_1.12.2
[19] IRanges_2.10.2           S4Vectors_0.14.3
[21] BiocGenerics_0.22.0

loaded via a namespace (and not attached):
 [1] httr_1.2.1                    bit64_0.9-7
 [3] AnnotationHub_2.8.2           splines_3.4.0
 [5] shiny_1.0.3                   Formula_1.2-1
 [7] interactiveDisplayBase_1.14.0 latticeExtra_0.6-28
 [9] blob_1.1.0                    GenomeInfoDbData_0.99.0
[11] yaml_2.1.14                   RSQLite_2.0
[13] backports_1.1.0               lattice_0.20-35
[15] digest_0.6.12                 RColorBrewer_1.1-2
[17] checkmate_1.8.2               colorspace_1.3-2
[19] httpuv_1.3.3                  htmltools_0.3.6
[21] Matrix_1.2-10                 plyr_1.8.4
[23] pkgconfig_2.0.1               XML_3.98-1.7
[25] biomaRt_2.32.1                genefilter_1.58.1
[27] zlibbioc_1.22.0               xtable_1.8-2
[29] scales_0.4.1                  htmlTable_1.9
[31] tibble_1.3.3                  annotate_1.54.0
[33] ggplot2_2.2.1                 nnet_7.3-12
[35] lazyeval_0.2.0                mime_0.5
[37] survival_2.41-3               magrittr_1.5
[39] memoise_1.1.0                 nlme_3.1-131
[41] foreign_0.8-68                tools_3.4.0
[43] data.table_1.10.4             stringr_1.2.0
[45] munsell_0.4.3                 locfit_1.5-9.1
[47] cluster_2.0.6                 compiler_3.4.0
[49] rlang_0.1.1                   grid_3.4.0
[51] RCurl_1.95-4.8                htmlwidgets_0.8
[53] tcltk_3.4.0                   bitops_1.0-6
[55] base64enc_0.1-3               gtable_0.2.0
[57] curl_2.6                      DBI_0.6-1
[59] R6_2.2.2                      gridExtra_2.2.1
[61] knitr_1.16                    rtracklayer_1.36.4
[63] bit_1.1-12                    Hmisc_4.0-3
[65] ProtGenerics_1.8.0            stringi_1.1.5
[67] Rcpp_0.12.11                  geneplotter_1.54.0
[69] rpart_4.1-11                  acepack_1.4.1

It is hard for me to debug this without the file that is causing the problem. Could you please email me off list (lori.shepherd@roswellpark.org) so we can figure out a solution?