Question: Error when trying to load a gff3 with GenomicFeatures
20 months ago, gil.hornung wrote:

Hi,

I downloaded the GFF file for S. cerevisiae from NCBI:

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000146045.2_R64/GCF_000146045.2_R64_genomic.gff.gz

I'm trying to open the file with the following command:

txdb <- makeTranscriptDbFromGFF("genomes/Saccharomyces_cerevisiae/GCF_000146045.2_R64_genomic.gff",
                                format="gff3")

And I get the following error:

extracting transcript information
Extracting gene IDs
extracting transcript information
Processing splicing information for gff3 file.
Deducing exon rank from relative coordinates provided
Prepare the 'metadata' data frame ... metadata: OK
Now generating chrominfo from available sequence names. No chromosome length information is available.
Error in sqliteSendQuery(con, statement, bind.data) :
  rsqlite_query_send: could not execute: UNIQUE constraint failed: splicing._tx_id, splicing.exon_rank
In addition: Warning messages:
1: In .deduceExonRankings(exs, format = "gff") :
  Infering Exon Rankings.  If this is not what you expected, then please be sure that you have provided a valid attribute for exonRankAttributeName
2: In matchCircularity(chroms, circ_seqs) :
  None of the strings in your circ_seqs argument match your seqnames.
3: 'dbBeginTransaction' is deprecated.
Use 'dbBegin' instead.
See help("Deprecated")
4: 'dbBeginTransaction' is deprecated.
Use 'dbBegin' instead.
See help("Deprecated")
5: 'dbBeginTransaction' is deprecated.
Use 'dbBegin' instead.
See help("Deprecated")
6: 'dbBeginTransaction' is deprecated.
Use 'dbBegin' instead.
See help("Deprecated")
7: 'dbBeginTransaction' is deprecated.
Use 'dbBegin' instead.
See help("Deprecated") 

> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8    
[8] LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GenomicFeatures_1.18.7   AnnotationDbi_1.28.2     Biobase_2.26.0           DESeq2_1.6.3             RcppArmadillo_0.5.000.0  Rcpp_0.12.3              VariantAnnotation_1.12.9
[8] Rsamtools_1.18.3         Biostrings_2.34.1        XVector_0.6.0            GenomicRanges_1.18.4     GenomeInfoDb_1.2.5       IRanges_2.0.1            S4Vectors_0.4.0       
[15] BiocGenerics_0.12.1      amap_0.8-14              matrixStats_0.14.2       ggplot2_1.0.1            gplots_2.17.0            RColorBrewer_1.1-2       BiocParallel_1.0.3      

loaded via a namespace (and not attached):
[1] acepack_1.3-3.3         annotate_1.44.0         base64enc_0.1-3         BatchJobs_1.6           BBmisc_1.9              biomaRt_2.22.0          bitops_1.0-6            brew_1.0-6           
[9] BSgenome_1.34.1         caTools_1.17.1          checkmate_1.5.2         cluster_2.0.3           codetools_0.2-14        colorspace_1.2-6        DBI_0.3.1               digest_0.6.9         
[17] fail_1.3                foreach_1.4.3           foreign_0.8-66          Formula_1.2-1           gdata_2.17.0            genefilter_1.48.1       geneplotter_1.44.0      GenomicAlignments_1.2.2
[25] grid_3.1.1              gridExtra_2.0.0         gtable_0.1.2            gtools_3.5.0            Hmisc_3.17-0            iterators_1.0.8         KernSmooth_2.23-15      lattice_0.20-31      
[33] latticeExtra_0.6-28     locfit_1.5-9.1          magrittr_1.5            MASS_7.3-40             munsell_0.4.3           nnet_7.3-12             plyr_1.8.2              proto_0.3-10         
[41] RCurl_1.95-4.7          reshape2_1.4.1          rpart_4.1-10            RSQLite_1.0.0           rtracklayer_1.26.3      scales_0.3.0            sendmailR_1.2-1         splines_3.1.1        
[49] stringi_1.0-1           stringr_1.0.0           survival_2.38-3         tools_3.1.1             XML_3.98-1.3            xtable_1.8-2            zlibbioc_1.12.0        

 

 
20 months ago, Mike Smith (EMBL Heidelberg / de.NBI) wrote:

You're using an old version of R and of the GenomicFeatures package. Try upgrading to R 3.2.4 and installing the most recent version of GenomicFeatures. The makeTranscriptDbFromGFF() function has been deprecated, then removed, and is replaced by makeTxDbFromGFF().

This combination works for me:

txdb <- makeTxDbFromGFF("GCF_000146045.2_R64_genomic.gff", format = "gff3")
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning message:
In .find_exon_cds(exons, cds) :
  The following transcripts have exons that contain more than one CDS
  (only the first CDS was kept for each exon): rna1045, rna114, rna1154,
  rna1156, rna1208, rna1210, rna1266, rna1318, rna1738, rna1765, rna1867,
  rna210, rna2230, rna2249, rna228, rna2320, rna2377, rna2379, rna2559,
  rna2805, rna2911, rna2983, rna3144, rna3289, rna3291, rna4010, rna4084,
  rna4269, rna4420, rna4426, rna4522, rna4529, rna4873, rna5098, rna5303,
  rna5557, rna5610, rna5655, rna5755, rna576, rna5834, rna6032, rna6040,
  rna6223, rna6247, rna6249, rna973
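
Before retrying, it may help to confirm which versions are actually installed. The following is a minimal sketch and not part of the original answer. It assumes only base R plus an optional GenomicFeatures installation, and the variable names has_gf and has_new_importer are illustrative:

```r
# Report the running R version.
cat("R:", R.version.string, "\n")

# Check whether GenomicFeatures is installed without attaching it.
has_gf <- requireNamespace("GenomicFeatures", quietly = TRUE)

if (has_gf) {
  cat("GenomicFeatures:",
      as.character(packageVersion("GenomicFeatures")), "\n")
  # TRUE when the installed version exports the newer importer.
  has_new_importer <-
    "makeTxDbFromGFF" %in% getNamespaceExports("GenomicFeatures")
} else {
  cat("GenomicFeatures is not installed.\n")
  has_new_importer <- FALSE
}

cat("makeTxDbFromGFF available:", has_new_importer, "\n")
```

If the last line reports FALSE, the installed release probably predates makeTxDbFromGFF(), and upgrading R and GenomicFeatures as suggested above is the likely fix.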

 


Thank you, Mike!

Reply written 19 months ago by gil.hornung