Search
Question: makeTxDbFromGFF Error: subscript contains NAs or out-of-bounds indices - after package update
0
gravatar for Ben Mansfeld
14 months ago by
Michigan State University
Ben Mansfeld0 wrote:

Hello all, 

I appreciate any help in advance. This has been driving me crazy for the last 24hrs...

I am trying to extract exon lengths from my GFF3 file. 

txdb <- makeTxDbFromGFF("../../Chinese Long Genome/cucumber_v2.gff3",format="gff3")
exBytx<-exonsBy(txdb, by="tx", use.names=T)
txlengthData<-sum(width(reduce(exBytx)))

This worked under a previous version of BioC but after updating R and BioC  now I'm getting this:

txdb <- makeTxDbFromGFF("../../Chinese Long Genome/cucumber_v2.gff3",format="gff3")
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... Error: subscript contains NAs or out-of-bounds indices

> traceback()
18: stop(wmsg(...), call. = FALSE)
17: .subscript_error("subscript contains NAs or out-of-bounds indices")
16: NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append)
15: NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append)
14: normalizeSingleBracketSubscript(i, x, as.NSBS = TRUE)
13: extractROWS(x, i)
12: extractROWS(x, i)
11: .nextMethod(x, i)
10: eval(expr, envir, enclos)
9: eval(call, callEnv)
8: callNextMethod(x, i)
7: Parent[exon_with_gene_parent_IDX]
6: Parent[exon_with_gene_parent_IDX]
5: unlist(Parent[exon_with_gene_parent_IDX], use.names = FALSE)
4: ID[gene_IDX] %in% unlist(Parent[exon_with_gene_parent_IDX], use.names = FALSE)
3: .get_gene_as_tx_IDX(gene_IDX, ID, exon_with_gene_parent_IDX, 
       Parent)
2: makeTxDbFromGRanges(gr, metadata = metadata)
1: makeTxDbFromGFF("../../Chinese Long Genome/cucumber_v2.gff3", 
       format = "gff3")

Running the sample a.gff file that comes with the package works. 

What changed between the versions that now doesn't allow me to import the GFF?

My GFF stayed the same. I even downloaded it again from my db. I subset one gene from the GFF and that doesn't work either. Something about the way our GFF is formatted probably doesn't talk well with makeTxDb. But what? and why suddenly not?

Thanks again in advance.

 

Previous sessionInfo that worked:

sessionInfo()

R version 3.2.3 (2015-12-10)

Platform: x86_64-w64-mingw32/x64 (64-bit)

Running under: Windows >= 8 x64 (build 9200)


locale:

[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252

[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252   


attached base packages:

[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base    


other attached packages:

 [1] goseq_1.22.0              BiocInstaller_1.20.3      rtracklayer_1.28.10       GenomicFeatures_1.20.6  

 [5] AnnotationDbi_1.30.1      Biobase_2.28.0            RSQLite_1.0.0             DBI_0.4-1               

 [9] geneLenDataBase_1.4.0     BiasedUrn_1.07            pheatmap_1.0.8            ggplot2_2.1.0           

[13] DESeq2_1.8.2              RcppArmadillo_0.6.700.6.0 Rcpp_0.12.4               GenomicRanges_1.20.8    

[17] GenomeInfoDb_1.4.3        IRanges_2.2.9             S4Vectors_0.6.6           BiocGenerics_0.14.0     

[21] NCmisc_1.1.4    

Current Session info doesn't work:

> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rtracklayer_1.32.0     GenomicFeatures_1.24.2 AnnotationDbi_1.34.3   Biobase_2.32.0         GenomicRanges_1.24.1   GenomeInfoDb_1.8.2     IRanges_2.6.0         
 [8] S4Vectors_0.10.1       BiocGenerics_0.18.0    goseq_1.24.0           geneLenDataBase_1.8.0  BiasedUrn_1.07        

loaded via a namespace (and not attached):
 [1] XVector_0.12.0             zlibbioc_1.18.0            GenomicAlignments_1.8.1    BiocParallel_1.6.2         lattice_0.20-33            tools_3.3.0               
 [7] grid_3.3.0                 SummarizedExperiment_1.2.2 nlme_3.1-128               mgcv_1.8-12                DBI_0.4-1                  Matrix_1.2-6              
[13] bitops_1.0-6               RCurl_1.95-4.8             biomaRt_2.28.0             RSQLite_1.0.0              GO.db_3.3.0                Biostrings_2.40.1         
[19] Rsamtools_1.24.0           XML_3.98-1.4
ADD COMMENTlink modified 14 months ago by Hervé Pagès ♦♦ 12k • written 14 months ago by Ben Mansfeld0

It would help to see your GFF file.

ADD REPLYlink written 14 months ago by Michael Lawrence9.4k

Thanks for the reply.

Sorry about that. Here are a few lines from the top:

Chr1    .    gene    2952    4284    .    -    .    ID=Csa1G000010; Name=Csa1G000010
Chr1    .    mRNA    2952    4284    .    -    .    ID=Csa1M000010.1; Parent=Csa1G000010; Name=Csa1M000010.1
Chr1    .    five_prime_utr    4200    4284    .    -    .    ID=Csa1M000010.1.utr5p1; Parent=Csa1M000010.1
Chr1    .    exon    4131    4284    .    -    .    ID=Csa1M000010.1.exon1; Parent=Csa1M000010.1
Chr1    .    CDS    4131    4199    .    -    0    ID=Csa1M000010.1.cds1; Parent=Csa1M000010.1
Chr1    .    exon    3837    4040    .    -    .    ID=Csa1M000010.1.exon2; Parent=Csa1M000010.1
Chr1    .    CDS    3837    4040    .    -    0    ID=Csa1M000010.1.cds2; Parent=Csa1M000010.1
Chr1    .    exon    2952    3377    .    -    .    ID=Csa1M000010.1.exon3; Parent=Csa1M000010.1
Chr1    .    CDS    3324    3377    .    -    0    ID=Csa1M000010.1.cds3; Parent=Csa1M000010.1
Chr1    .    three_prime_utr    2952    3323    .    -    .    ID=Csa1M000010.1.utr3p1; Parent=Csa1M000010.1


ADD REPLYlink modified 14 months ago • written 14 months ago by Ben Mansfeld0

At first glance it appears that the format of the file is correct, but somehow the Parent field is not being parsed. The GFF parsing was rewritten in C over the last development cycle, but not by me, so I'll defer to the author.

ADD REPLYlink written 14 months ago by Michael Lawrence9.4k
3
gravatar for Hervé Pagès
14 months ago by
Hervé Pagès ♦♦ 12k
United States
Hervé Pagès ♦♦ 12k wrote:

Hi Ben,,

Your GFF3 file has spaces between the tag=value pairs and the preceding semicolon. The new parser in rtracklayer was considering that this space is part of the tag (so the tags would be " Name" and " Parent"), which was causing problems.

This is fixed in rtracklayer 1.32.1 (release) and 1.33.5 (devel). Both versions should become available via biocLite() in the next 48 hours or so.

Cheers,

H.

ADD COMMENTlink written 14 months ago by Hervé Pagès ♦♦ 12k

Hervé,

Thanks for addressing this. I couldn't figure it out for the life of me.

I appreciate all the work you devs do!

-Ben

ADD REPLYlink written 14 months ago by Ben Mansfeld0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 238 users visited in the last hour