Hello all,
I appreciate any help in advance. This has been driving me crazy for the last 24hrs...
I am trying to extract exon lengths from my GFF3 file.
txdb <- makeTxDbFromGFF("../../Chinese Long Genome/cucumber_v2.gff3",format="gff3") exBytx<-exonsBy(txdb, by="tx", use.names=T) txlengthData<-sum(width(reduce(exBytx)))
This worked under a previous version of BioC but after updating R and BioC now I'm getting this:
txdb <- makeTxDbFromGFF("../../Chinese Long Genome/cucumber_v2.gff3",format="gff3")
Import genomic features from the file as a GRanges object ... OK Prepare the 'metadata' data frame ... OK Make the TxDb object ... Error: subscript contains NAs or out-of-bounds indices > traceback() 18: stop(wmsg(...), call. = FALSE) 17: .subscript_error("subscript contains NAs or out-of-bounds indices") 16: NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) 15: NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) 14: normalizeSingleBracketSubscript(i, x, as.NSBS = TRUE) 13: extractROWS(x, i) 12: extractROWS(x, i) 11: .nextMethod(x, i) 10: eval(expr, envir, enclos) 9: eval(call, callEnv) 8: callNextMethod(x, i) 7: Parent[exon_with_gene_parent_IDX] 6: Parent[exon_with_gene_parent_IDX] 5: unlist(Parent[exon_with_gene_parent_IDX], use.names = FALSE) 4: ID[gene_IDX] %in% unlist(Parent[exon_with_gene_parent_IDX], use.names = FALSE) 3: .get_gene_as_tx_IDX(gene_IDX, ID, exon_with_gene_parent_IDX, Parent) 2: makeTxDbFromGRanges(gr, metadata = metadata) 1: makeTxDbFromGFF("../../Chinese Long Genome/cucumber_v2.gff3", format = "gff3")
Running the sample a.gff file that comes with the package works.
What changed between the versions that now doesn't allow me to import the GFF?
My GFF stayed the same. I even downloaded it again from my db. I subset one gene from the GFF and that doesn't work either. Something about the way our GFF is formatted probably doesn't talk well with makeTxDb. But what? and why suddenly not?
Thanks again in advance.
Previous sessionInfo that worked:
sessionInfo() R version 3.2.3 (2015-12-10) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows >= 8 x64 (build 9200) locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C LC_TIME=English_United States.1252 attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods base other attached packages: [1] goseq_1.22.0 BiocInstaller_1.20.3 rtracklayer_1.28.10 GenomicFeatures_1.20.6 [5] AnnotationDbi_1.30.1 Biobase_2.28.0 RSQLite_1.0.0 DBI_0.4-1 [9] geneLenDataBase_1.4.0 BiasedUrn_1.07 pheatmap_1.0.8 ggplot2_2.1.0 [13] DESeq2_1.8.2 RcppArmadillo_0.6.700.6.0 Rcpp_0.12.4 GenomicRanges_1.20.8 [17] GenomeInfoDb_1.4.3 IRanges_2.2.9 S4Vectors_0.6.6 BiocGenerics_0.14.0 [21] NCmisc_1.1.4
Current Session info doesn't work:
> sessionInfo() R version 3.3.0 (2016-05-03) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows >= 8 x64 (build 9200) locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets methods base other attached packages: [1] rtracklayer_1.32.0 GenomicFeatures_1.24.2 AnnotationDbi_1.34.3 Biobase_2.32.0 GenomicRanges_1.24.1 GenomeInfoDb_1.8.2 IRanges_2.6.0 [8] S4Vectors_0.10.1 BiocGenerics_0.18.0 goseq_1.24.0 geneLenDataBase_1.8.0 BiasedUrn_1.07 loaded via a namespace (and not attached): [1] XVector_0.12.0 zlibbioc_1.18.0 GenomicAlignments_1.8.1 BiocParallel_1.6.2 lattice_0.20-33 tools_3.3.0 [7] grid_3.3.0 SummarizedExperiment_1.2.2 nlme_3.1-128 mgcv_1.8-12 DBI_0.4-1 Matrix_1.2-6 [13] bitops_1.0-6 RCurl_1.95-4.8 biomaRt_2.28.0 RSQLite_1.0.0 GO.db_3.3.0 Biostrings_2.40.1 [19] Rsamtools_1.24.0 XML_3.98-1.4
It would help to see your GFF file.
Thanks for the reply.
Sorry about that. Here are a few lines from the top:
At first glance it appears that the format of the file is correct, but somehow the Parent field is not being parsed. The GFF parsing was rewritten in C over the last development cycle, but not by me, so I'll defer to the author.