After upgrading to R 3.4/Bioconductor 3.4 today, the `import.gff3` function (and presumably other related functions) is no longer working for me.
For example, using this sample GFF3 file from ENSEMBL, with spaces replaced with tabs:
##gff-version 3
ctg123 . mRNA 1300 9000 . + . ID=mrna0001;Name=sonichedgehog
ctg123 . exon 1300 1500 . + . ID=exon00001;Parent=mrna0001
ctg123 . exon 1050 1500 . + . ID=exon00002;Parent=mrna0001
ctg123 . exon 3000 3902 . + . ID=exon00003;Parent=mrna0001
ctg123 . exon 5000 5500 . + . ID=exon00004;Parent=mrna0001
ctg123 . exon 7000 9000 . + . ID=exon00005;Parent=mrna0001
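For anyone who wants to recreate the test file, here is a minimal base-R sketch of the space-to-tab conversion described above. The names `gff_lines`, `is_record`, and `gff_path` are illustrative only, and the file is written to a temporary directory rather than the working directory, so adjust the path as needed.

```r
# Sample records as shown in the post, with single spaces between fields.
gff_lines <- c(
  "##gff-version 3",
  "ctg123 . mRNA 1300 9000 . + . ID=mrna0001;Name=sonichedgehog",
  "ctg123 . exon 1300 1500 . + . ID=exon00001;Parent=mrna0001",
  "ctg123 . exon 1050 1500 . + . ID=exon00002;Parent=mrna0001",
  "ctg123 . exon 3000 3902 . + . ID=exon00003;Parent=mrna0001",
  "ctg123 . exon 5000 5500 . + . ID=exon00004;Parent=mrna0001",
  "ctg123 . exon 7000 9000 . + . ID=exon00005;Parent=mrna0001"
)

# Replace spaces with tabs on record lines only; the "##gff-version 3"
# header keeps its space.
is_record <- !startsWith(gff_lines, "#")
gff_lines[is_record] <- gsub(" ", "\t", gff_lines[is_record], fixed = TRUE)

# Write the tab-delimited file to a temporary location (assumed path).
gff_path <- file.path(tempdir(), "test.gff")
writeLines(gff_lines, gff_path)
```

Passing `gff_path` to `import.gff3()` should then be equivalent to the call shown in the post.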
Attempting to load the file results in the following error:
> library('rtracklayer')
> import.gff3('test.gff')
Error in match.arg(pruning.mode) :
  'arg' should be one of “error”, “coarse”, “fine”, “tidy”
Calls: import.gff3 ... genome<- -> seqinfo<- -> seqinfo<- -> <Anonymous> -> match.arg
I checked if the issue persists in R 3.5/Bioconductor 3.5, and it seems to work fine there.
Since R 3.4 was just released and it will be a while before 3.5 hits the shelves, it may be helpful to backport the fix to Bioconductor 3.4.
Regards,
Keith
System info:
R 3.4
> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux
Matrix products: default
BLAS: /usr/lib/libblas.so.3.7.0
LAPACK: /usr/lib/liblapack.so.3.7.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] rtracklayer_1.34.2 GenomicRanges_1.26.4 GenomeInfoDb_1.11.11
[4] IRanges_2.8.2 S4Vectors_0.12.2 BiocGenerics_0.20.0
[7] colorout_1.1-2
loaded via a namespace (and not attached):
[1] lattice_0.20-35 XML_3.98-1.6
[3] Rsamtools_1.26.2 Biostrings_2.42.1
[5] GenomicAlignments_1.10.1 bitops_1.0-6
[7] grid_3.4.0 zlibbioc_1.20.0
[9] XVector_0.14.1 Matrix_1.2-9
[11] BiocParallel_1.8.2 tools_3.4.0
[13] Biobase_2.34.0 RCurl_1.95-4.8
[15] compiler_3.4.0 SummarizedExperiment_1.4.0
[17] GenomeInfoDbData_0.99.0
R SVN (3.5)
> sessionInfo()
R Under development (unstable) (2017-04-24 r72617)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux
Matrix products: default
BLAS: /usr/lib/libblas.so.3.7.0
LAPACK: /usr/lib/liblapack.so.3.7.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] rtracklayer_1.35.12 GenomicRanges_1.27.23 GenomeInfoDb_1.11.11
[4] IRanges_2.9.19 S4Vectors_0.13.17 BiocGenerics_0.21.3
[7] colorout_1.1-2
loaded via a namespace (and not attached):
[1] XVector_0.15.2 zlibbioc_1.21.0
[3] GenomicAlignments_1.11.12 BiocParallel_1.9.6
[5] lattice_0.20-35 tools_3.5.0
[7] SummarizedExperiment_1.5.10 grid_3.5.0
[9] Biobase_2.35.1 matrixStats_0.52.2
[11] Matrix_1.2-9 GenomeInfoDbData_0.99.0
[13] bitops_1.0-6 RCurl_1.95-4.8
[15] DelayedArray_0.1.11 compiler_3.5.0
[17] Biostrings_2.43.8 Rsamtools_1.27.16
[19] XML_3.98-1.6
It looks like GenomeInfoDb is from Bioconductor 3.5, not 3.4. And just so it's clear, Bioc 3.5 is targeting R 3.4, not R 3.5. We will not target R 3.5 until Bioc 3.7. It's only coincidental that the versions are so similar now.
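A quick way to spot this kind of mix-up is Bioconductor's version-numbering convention: in x.y.z, y is even for release packages and odd for devel. The base-R sketch below applies that rule to the versions copied from the R 3.4 session above. The variable names are illustrative only, and the rule applies to Bioconductor packages, not to CRAN packages such as lattice.

```r
# Versions copied from the "other attached packages" section of the
# R 3.4 sessionInfo() output (Bioconductor packages only).
pkgs <- c(rtracklayer   = "1.34.2",
          GenomicRanges = "1.26.4",
          GenomeInfoDb  = "1.11.11",
          IRanges       = "2.8.2",
          S4Vectors     = "0.12.2",
          BiocGenerics  = "0.20.0")

# Extract the middle (y) component of each x.y.z version string.
minor <- vapply(pkgs,
                function(v) as.integer(strsplit(v, ".", fixed = TRUE)[[1]][2]),
                integer(1))

# Odd y indicates a devel-branch build under the Bioconductor convention.
devel_pkgs <- names(pkgs)[minor %% 2L == 1L]
print(devel_pkgs)
```

Here only GenomeInfoDb is flagged. On an installed library, `BiocInstaller::biocValid()` (or `BiocManager::valid()` in later releases) performs a more thorough check.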
Good catch! It looks like the "3.4" directory may have been previously used by the SVN version of R, and some newer packages got mixed in. Martin's suggestion below further supports this. Thanks for the insight!