Error when trying to load a gff3 with GenomicFeatures
@gilhornung-9503

Hi,

I downloaded the GFF file for S. cerevisiae from NCBI:

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000146045.2_R64/GCF_000146045.2_R64_genomic.gff.gz

I'm trying to open the file with the following command:

txdb <- makeTranscriptDbFromGFF("genomes/Saccharomyces_cerevisiae/GCF_000146045.2_R64_genomic.gff",
                                format="gff3")

And I get the following error:

extracting transcript information
Extracting gene IDs
extracting transcript information
Processing splicing information for gff3 file.
Deducing exon rank from relative coordinates provided
Prepare the 'metadata' data frame ... metadata: OK
Now generating chrominfo from available sequence names. No chromosome length information is available.
Error in sqliteSendQuery(con, statement, bind.data) :
  rsqlite_query_send: could not execute: UNIQUE constraint failed: splicing._tx_id, splicing.exon_rank
In addition: Warning messages:
1: In .deduceExonRankings(exs, format = "gff") :
  Infering Exon Rankings.  If this is not what you expected, then please be sure that you have provided a valid attribute for exonRankAttributeName
2: In matchCircularity(chroms, circ_seqs) :
  None of the strings in your circ_seqs argument match your seqnames.
3: 'dbBeginTransaction' is deprecated.
Use 'dbBegin' instead.
See help("Deprecated")
4: 'dbBeginTransaction' is deprecated.
Use 'dbBegin' instead.
See help("Deprecated")
5: 'dbBeginTransaction' is deprecated.
Use 'dbBegin' instead.
See help("Deprecated")
6: 'dbBeginTransaction' is deprecated.
Use 'dbBegin' instead.
See help("Deprecated")
7: 'dbBeginTransaction' is deprecated.
Use 'dbBegin' instead.
See help("Deprecated") 

> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8    
[8] LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GenomicFeatures_1.18.7   AnnotationDbi_1.28.2     Biobase_2.26.0           DESeq2_1.6.3             RcppArmadillo_0.5.000.0  Rcpp_0.12.3              VariantAnnotation_1.12.9
[8] Rsamtools_1.18.3         Biostrings_2.34.1        XVector_0.6.0            GenomicRanges_1.18.4     GenomeInfoDb_1.2.5       IRanges_2.0.1            S4Vectors_0.4.0       
[15] BiocGenerics_0.12.1      amap_0.8-14              matrixStats_0.14.2       ggplot2_1.0.1            gplots_2.17.0            RColorBrewer_1.1-2       BiocParallel_1.0.3      

loaded via a namespace (and not attached):
[1] acepack_1.3-3.3         annotate_1.44.0         base64enc_0.1-3         BatchJobs_1.6           BBmisc_1.9              biomaRt_2.22.0          bitops_1.0-6            brew_1.0-6           
[9] BSgenome_1.34.1         caTools_1.17.1          checkmate_1.5.2         cluster_2.0.3           codetools_0.2-14        colorspace_1.2-6        DBI_0.3.1               digest_0.6.9         
[17] fail_1.3                foreach_1.4.3           foreign_0.8-66          Formula_1.2-1           gdata_2.17.0            genefilter_1.48.1       geneplotter_1.44.0      GenomicAlignments_1.2.2
[25] grid_3.1.1              gridExtra_2.0.0         gtable_0.1.2            gtools_3.5.0            Hmisc_3.17-0            iterators_1.0.8         KernSmooth_2.23-15      lattice_0.20-31      
[33] latticeExtra_0.6-28     locfit_1.5-9.1          magrittr_1.5            MASS_7.3-40             munsell_0.4.3           nnet_7.3-12             plyr_1.8.2              proto_0.3-10         
[41] RCurl_1.95-4.7          reshape2_1.4.1          rpart_4.1-10            RSQLite_1.0.0           rtracklayer_1.26.3      scales_0.3.0            sendmailR_1.2-1         splines_3.1.1        
[49] stringi_1.0-1           stringr_1.0.0           survival_2.38-3         tools_3.1.1             XML_3.98-1.3            xtable_1.8-2            zlibbioc_1.12.0        

 

 
Mike Smith
@mike-smith
EMBL Heidelberg

You're using old versions of R and GenomicFeatures. Try upgrading to R 3.2.4 and installing the most recent GenomicFeatures release. makeTranscriptDbFromGFF() was deprecated and has since been removed; its replacement is makeTxDbFromGFF().

This combination works for me:

txdb <- makeTxDbFromGFF("GCF_000146045.2_R64_genomic.gff", format = "gff3")
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning message:
In .find_exon_cds(exons, cds) :
  The following transcripts have exons that contain more than one CDS
  (only the first CDS was kept for each exon): rna1045, rna114, rna1154,
  rna1156, rna1208, rna1210, rna1266, rna1318, rna1738, rna1765, rna1867,
  rna210, rna2230, rna2249, rna228, rna2320, rna2377, rna2379, rna2559,
  rna2805, rna2911, rna2983, rna3144, rna3289, rna3291, rna4010, rna4084,
  rna4269, rna4420, rna4426, rna4522, rna4529, rna4873, rna5098, rna5303,
  rna5557, rna5610, rna5655, rna5755, rna576, rna5834, rna6032, rna6040,
  rna6223, rna6247, rna6249, rna973
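
If you want to confirm that makeTxDbFromGFF() works in your installation before downloading the full NCBI file, the sketch below builds a TxDb from a tiny GFF3 written to a temporary file and checks the parsed transcript and exon structure. The GFF3 contents (seqname chrI, source "sketch", IDs gene1/rna1/exon1/exon2/cds1/cds2) are hypothetical, invented only for this test. It also assumes that in recent Bioconductor releases makeTxDbFromGFF() lives in the txdbmaker package, while older releases (such as the one upgraded to in this thread) export it from GenomicFeatures itself.

```r
# Minimal sketch: build a TxDb from a tiny, hypothetical GFF3 and inspect it.
suppressPackageStartupMessages(library(GenomicFeatures))

# Assumption: newer Bioconductor releases provide makeTxDbFromGFF() in txdbmaker;
# older releases export it from GenomicFeatures directly.
make_txdb <- if (requireNamespace("txdbmaker", quietly = TRUE)) {
  txdbmaker::makeTxDbFromGFF
} else {
  GenomicFeatures::makeTxDbFromGFF
}

# Helper that builds one tab-separated GFF3 row on a hypothetical "chrI", + strand.
gff_row <- function(type, start, end, phase, attrs) {
  paste("chrI", "sketch", type, start, end, ".", "+", phase, attrs, sep = "\t")
}

# One gene -> one mRNA -> two exons, each fully covered by one CDS.
gff_lines <- c(
  "##gff-version 3",
  gff_row("gene", 1, 300, ".", "ID=gene1;Name=GENE1"),
  gff_row("mRNA", 1, 300, ".", "ID=rna1;Parent=gene1"),
  gff_row("exon", 1, 100, ".", "ID=exon1;Parent=rna1"),
  gff_row("exon", 201, 300, ".", "ID=exon2;Parent=rna1"),
  gff_row("CDS", 1, 100, "0", "ID=cds1;Parent=rna1"),
  gff_row("CDS", 201, 300, "2", "ID=cds2;Parent=rna1")
)

gff_path <- tempfile(fileext = ".gff3")
writeLines(gff_lines, gff_path)

# Same call as in the answer, pointed at the toy file.
txdb <- make_txdb(gff_path, format = "gff3")

tx <- transcripts(txdb)                                  # one range per transcript
ex_by_tx <- exonsBy(txdb, by = "tx", use.names = TRUE)   # exons grouped by tx_name
print(tx)
print(ex_by_tx)
```

If this runs cleanly and reports one transcript (rna1) with exon ranks 1 and 2, the installed GenomicFeatures/txdbmaker stack handles GFF3 input, and any remaining problem with the yeast file is more likely in the annotation itself.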

 


Thank you, Mike!
