Hello,
I just upgraded to R 3.2.2 / BioC 3.2 / rtracklayer 1.30.0, and some of my code that imports NCBI's GFF3 files now throws an error, although it worked fine with R 3.2.1 / BioC 3.1 / rtracklayer 1.28.6. Below are example sessions for both versions, each trying to import NCBI's mouse ref_GRCm38.p3_top_level.gff3.gz, downloaded from ftp://ftp.ncbi.nlm.nih.gov/genomes/M_musculus/GFF/. The new rtracklayer throws the error "cannnot determine seqnames column unambiguously". I looked through the help page ?import.gff3 in both versions but don't see any changes. Is this a new bug? My workaround is to save the GRanges object from the old version as a .RData file and then load it into the new R/BioC, which seems to work fine. Any help in getting the new rtracklayer to read in the GFF file would be appreciated!
Thanks,
Jenny
R 3.2.1 / BioC 3.1 / rtracklayer 1.28.6:
R version 3.2.1 (2015-06-18) -- "World-Famous Astronaut"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)
#lines removed ...
> .libPaths()
[1] "C:/Users/drnevich/Documents/R/win-library/3.2"
[2] "C:/Program Files/R/R-3.2.1/library"
>
> #Change to point to old BioC3.1 packages I saved...
>
> .libPaths(new = "C:/Users/drnevich/Documents/R/win-library/3.2_BioC3.1")
>
> library(rtracklayer)
Loading required package: GenomicRanges
Loading required package: BiocGenerics
Loading required package: parallel
#lines removed...
>
> setwd("D:/Statistics/Freund/Fire_sept2015/ReSeq/")
>
> #mouse GFF downloaded from ftp://ftp.ncbi.nlm.nih.gov/genomes/M_musculus/GFF/
>
> gff0 <- import("ref_GRCm38.p3_top_level.gff3.gz")
> #no errors!
>
> save(gff0, file = "ref_GRCm38.p3_top_level.gff3.RData")
>
> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods
[9] base
other attached packages:
[1] rtracklayer_1.28.6 GenomicRanges_1.20.5 GenomeInfoDb_1.4.1 IRanges_2.2.5
[5] S4Vectors_0.6.2 BiocGenerics_0.14.0
loaded via a namespace (and not attached):
[1] XML_3.98-1.3 Rsamtools_1.20.4 Biostrings_2.36.1
[4] bitops_1.0-6 GenomicAlignments_1.4.1 futile.options_1.0.0
[7] zlibbioc_1.14.0 XVector_0.8.0 futile.logger_1.4.1
[10] lambda.r_1.1.7 BiocParallel_1.2.11 tools_3.2.1
[13] RCurl_1.95-4.7
R 3.2.2 / BioC 3.2 / rtracklayer 1.30.0:
R version 3.2.2 (2015-08-14) -- "Fire Safety"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)
#lines removed...
> .libPaths()
[1] "C:/Users/drnevich/Documents/R/win-library/3.2"
[2] "C:/Program Files/R/R-3.2.2/library"
> #keep the default library paths to use the new BioC 3.2 packages
>
> library(rtracklayer)
Loading required package: GenomicRanges
Loading required package: BiocGenerics
Loading required package: parallel
#lines removed...
>
> setwd("D:/Statistics/Freund/Fire_sept2015/ReSeq/")
>
> #mouse GFF downloaded from ftp://ftp.ncbi.nlm.nih.gov/genomes/M_musculus/GFF/
>
> gff0 <- import("ref_GRCm38.p3_top_level.gff3.gz")
Error in .find_seqnames_col(df_colnames0, seqnames.field0, prefix) :
cannnot determine seqnames column unambiguously
>
> #Load in the RData file output from R 3.2.1:
> load("ref_GRCm38.p3_top_level.gff3.RData")
>
> #Check to see if I can use it:
> table(gff0$type)
    C_gene_segment         cDNA_match                CDS     D_gene_segment
                32               8710             937664                 24
            D_loop               exon               gene     J_gene_segment
                 1            1170729              48835                156
             match               mRNA              ncRNA primary_transcript
              7271              78013              24746               1283
            region               rRNA   sequence_variant         transcript
               195                 35                  6               7067
              tRNA     V_gene_segment
               437                613
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats4 parallel stats graphics grDevices utils
[7] datasets methods base
other attached packages:
[1] rtracklayer_1.30.0 GenomicRanges_1.22.0
[3] GenomeInfoDb_1.6.0 IRanges_2.4.0
[5] S4Vectors_0.8.0 BiocGenerics_0.16.0
loaded via a namespace (and not attached):
[1] XML_3.98-1.3 Rsamtools_1.22.0
[3] Biostrings_2.38.0 GenomicAlignments_1.6.0
[5] bitops_1.0-6 futile.options_1.0.0
[7] zlibbioc_1.16.0 XVector_0.10.0
[9] futile.logger_1.4.1 lambda.r_1.1.7
[11] BiocParallel_1.4.0 tools_3.2.2
[13] Biobase_2.30.0 RCurl_1.95-4.7
[15] SummarizedExperiment_1.0.0
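
One possible way to narrow down the "cannnot determine seqnames column unambiguously" error, sketched here under assumptions rather than confirmed against the rtracklayer source: if the new import path turns every GFF3 attribute tag into its own column, a tag such as "chromosome" could compete with the standard "seqid" column when the seqnames column is auto-detected. The base R snippet below lists the attribute tags found in column 9 and reports any that match a list of candidate seqnames column names. The helper name gff_attribute_tags, the candidate list, and the two example lines are all made up for illustration; they are not taken from the NCBI file or from the package code.

```r
# Hypothetical helper (not part of rtracklayer): collect the attribute tag
# names used in column 9 of GFF3 lines.
gff_attribute_tags <- function(lines) {
  # Drop directives/comments ("##gff-version", "###", ...) and empty lines
  lines <- lines[!grepl("^#", lines) & nzchar(lines)]
  fields <- strsplit(lines, "\t", fixed = TRUE)
  # Column 9 holds the "tag=value;tag=value" attribute string
  attrs <- vapply(fields,
                  function(f) if (length(f) >= 9) f[9] else "",
                  character(1))
  pairs <- unlist(strsplit(attrs, ";", fixed = TRUE))
  pairs <- pairs[nzchar(pairs)]
  # Keep only the tag name in front of the first "="
  unique(sub("=.*$", "", pairs))
}

# ASSUMPTION: names that might be auto-detected as the seqnames column.
# This list is a guess for illustration, not copied from the package.
seqnames_candidates <- c("seqnames", "seqname", "chromosome", "chrom",
                         "chr", "chromosome_name", "seqid")

# Made-up GFF3 lines, for illustration only
example_lines <- c(
  "##gff-version 3",
  "seqA\tdemo\tregion\t1\t1000\t.\t+\t.\tID=id0;chromosome=1;gbkey=Src",
  "seqA\tdemo\tgene\t100\t500\t.\t-\t.\tID=gene0;Name=demoGene;gbkey=Gene"
)

tags <- gff_attribute_tags(example_lines)
# Attribute tags that would compete with "seqid" under the assumption above
colliding <- intersect(tolower(tags), seqnames_candidates)
print(colliding)
```

To try this on the real file, the first few thousand lines could be read with something like con <- gzfile("ref_GRCm38.p3_top_level.gff3.gz"); lines <- readLines(con, n = 5000); close(con) and passed to gff_attribute_tags(). A non-empty result would only be a hint about the cause, not proof of it.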

Glad you got this in time for your workshop! H.