Hello,
I just upgraded to R 3.2.2 / BioC 3.2 / rtracklayer 1.30.0, and some of my code that imports NCBI's GFF3 files now throws an error, although it worked fine with R 3.2.1 / BioC 3.1 / rtracklayer 1.28.6. Below is example code from both versions trying to import NCBI's mouse ref_GRCm38.p3_top_level.gff3.gz, downloaded from ftp://ftp.ncbi.nlm.nih.gov/genomes/M_musculus/GFF/. The new rtracklayer throws the error "cannnot determine seqnames column unambiguously". I looked through the help page ?import.gff3 in both versions, but I don't see any changes. Is this a new bug? My workaround is to save the GRanges object from the old version as a .RData file and then load it into the new R/BioC, which seems to work fine. Any help in getting the new rtracklayer to read in the GFF file would be appreciated!
Thanks,
Jenny
R 3.2.1 / BioC 3.1 / rtracklayer 1.28.6:
R version 3.2.1 (2015-06-18) -- "World-Famous Astronaut"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

#lines removed ...

> .libPaths()
[1] "C:/Users/drnevich/Documents/R/win-library/3.2"
[2] "C:/Program Files/R/R-3.2.1/library"
>
> #Change to point to old BioC3.1 packages I saved...
>
> .libPaths(new = "C:/Users/drnevich/Documents/R/win-library/3.2_BioC3.1")
>
> library(rtracklayer)
Loading required package: GenomicRanges
Loading required package: BiocGenerics
Loading required package: parallel
#lines removed...
>
> setwd("D:/Statistics/Freund/Fire_sept2015/ReSeq/")
>
> #mouse GFF downloaded from ftp://ftp.ncbi.nlm.nih.gov/genomes/M_musculus/GFF/
>
> gff0 <- import("ref_GRCm38.p3_top_level.gff3.gz")
> #no errors!
>
> save(gff0, file = "ref_GRCm38.p3_top_level.gff3.RData")
>
> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods
[9] base

other attached packages:
[1] rtracklayer_1.28.6   GenomicRanges_1.20.5 GenomeInfoDb_1.4.1   IRanges_2.2.5
[5] S4Vectors_0.6.2      BiocGenerics_0.14.0

loaded via a namespace (and not attached):
 [1] XML_3.98-1.3            Rsamtools_1.20.4        Biostrings_2.36.1
 [4] bitops_1.0-6            GenomicAlignments_1.4.1 futile.options_1.0.0
 [7] zlibbioc_1.14.0         XVector_0.8.0           futile.logger_1.4.1
[10] lambda.r_1.1.7          BiocParallel_1.2.11     tools_3.2.1
[13] RCurl_1.95-4.7
R 3.2.2 / BioC 3.2 / rtracklayer 1.30.0:
R version 3.2.2 (2015-08-14) -- "Fire Safety"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

#lines removed...

> .libPaths()
[1] "C:/Users/drnevich/Documents/R/win-library/3.2"
[2] "C:/Program Files/R/R-3.2.2/library"
> #keep to use the new BioC 3.2 packages
>
> library(rtracklayer)
Loading required package: GenomicRanges
Loading required package: BiocGenerics
Loading required package: parallel
#lines removed...
>
> setwd("D:/Statistics/Freund/Fire_sept2015/ReSeq/")
>
> #mouse GFF downloaded from ftp://ftp.ncbi.nlm.nih.gov/genomes/M_musculus/GFF/
>
> gff0 <- import("ref_GRCm38.p3_top_level.gff3.gz")
Error in .find_seqnames_col(df_colnames0, seqnames.field0, prefix) :
  cannnot determine seqnames column unambiguously
>
> #Load in the RData file output from R 3.2.1:
> load("ref_GRCm38.p3_top_level.gff3.RData")
>
> #Check to see if I can use it:
> table(gff0$type)

    C_gene_segment         cDNA_match                CDS     D_gene_segment
                32               8710             937664                 24
            D_loop               exon               gene     J_gene_segment
                 1            1170729              48835                156
             match               mRNA              ncRNA primary_transcript
              7271              78013              24746               1283
            region               rRNA   sequence_variant         transcript
               195                 35                  6               7067
              tRNA     V_gene_segment
               437                613
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils
[7] datasets  methods   base

other attached packages:
[1] rtracklayer_1.30.0   GenomicRanges_1.22.0
[3] GenomeInfoDb_1.6.0   IRanges_2.4.0
[5] S4Vectors_0.8.0      BiocGenerics_0.16.0

loaded via a namespace (and not attached):
 [1] XML_3.98-1.3               Rsamtools_1.22.0
 [3] Biostrings_2.38.0          GenomicAlignments_1.6.0
 [5] bitops_1.0-6               futile.options_1.0.0
 [7] zlibbioc_1.16.0            XVector_0.10.0
 [9] futile.logger_1.4.1        lambda.r_1.1.7
[11] BiocParallel_1.4.0         tools_3.2.2
[13] Biobase_2.30.0             RCurl_1.95-4.7
[15] SummarizedExperiment_1.0.0
Glad you got this in time for your workshop! H.