Search
Question: Ensembl Fungi GFF3 with GenomicFeatures : makeTranscriptDbFromGFF
0
gravatar for Mikko Arvas
3.0 years ago by
Mikko Arvas90
European Union
Mikko Arvas90 wrote:

Hi,

I want to count hits from BAMs with the help of GenomicFeatures transcriptdb. I got Trichoderma reesei gff3 from EnsemblFungi ftp://ftp.ensemblgenomes.org/pub/fungi/release-24/gff3/trichoderma_reesei/Trichoderma_reesei.GCA_000167675.2.24.gff3.gz

>  hse <- makeTranscriptDbFromGFF("Trichoderma_reesei.GCA_000167675.2.24.gff3", format="gff3",exonRankAttributeName='rank',dataSource="Ensembl  Trichoderma_reesei.GCA_000167675.2.24.gff", species="Trichoderma reesei")
Prepare the 'metadata' data frame ... metadata: OK
Now generating chrominfo from available sequence names. No chromosome length information is available.
Error in unique(tables[["transcripts"]][["tx_chrom"]]) :
  error in evaluating the argument 'x' in selecting a method for function 'unique': Error: object 'tables' not found

Making a chrominfo from the GFF does not really help just changes the error message slightly:

>  hse <- makeTranscriptDbFromGFF("Trichoderma_reesei.GCA_000167675.2.23.gff3", format="gff3",chrominfo=chrominfo,exonRankAttributeName='rank',dataSource="Ensembl  Trichoderma_reesei.GCA_000167675.2.23.gff", species="Trichoderma reesei")
Prepare the 'metadata' data frame ... metadata: OK
Error in is.data.frame(arg) : object 'tables' not found
In addition: Warning message:
In if is.na(chrominfo)) { :
  the condition has length > 1 and only the first element will be used

Trichoderma_reesei.GCA_000167675.2.24.gff3.gz does not really pass http://genometools.org/cgi-bin/gff3validator.cgi, but making it to pass by changing (ID,PARENT,NAME) attributes to (id,parent,name), just gives the same error message from makeTranscriptDbFromGFF.

Any help and comments (even though my mistake could be very naive) would be much appreciated.

Best,

Mikko Arvas

> traceback()
2: unique(tables[["transcripts"]][["tx_chrom"]])
1: makeTranscriptDbFromGFF("/mnt/vttspace/Genomes/Trichoderma_reesei/ForGenomeBrowser/Trichoderma_reesei.GCA_000167675.2.23.gff3",
       format = "gff3", exonRankAttributeName = "rank", dataSource = "Ensembl  Trichoderma_reesei.GCA_000167675.2.23.gff",
       species = "Trichoderma reesei")
 

> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
[1] GenomicFeatures_1.18.2 AnnotationDbi_1.28.1   Biobase_2.26.0        
[4] GenomicRanges_1.18.3   GenomeInfoDb_1.2.3     IRanges_2.0.0         
[7] S4Vectors_0.4.0        BiocGenerics_0.12.1   

loaded via a namespace (and not attached):
 [1] base64enc_0.1-2         BatchJobs_1.5           BBmisc_1.8             
 [4] BiocParallel_1.0.0      biomaRt_2.22.0          Biostrings_2.34.0      
 [7] bitops_1.0-6            brew_1.0-6              checkmate_1.5.0        
[10] codetools_0.2-9         compiler_3.1.1          DBI_0.3.1              
[13] digest_0.6.4            fail_1.2                foreach_1.4.2          
[16] GenomicAlignments_1.2.1 iterators_1.0.7         RCurl_1.95-4.3         
[19] Rsamtools_1.18.2        RSQLite_1.0.0           rtracklayer_1.26.2     
[22] sendmailR_1.2-1         stringr_0.6.2           tools_3.1.1            
[25] XML_3.98-1.1            XVector_0.6.0           zlibbioc_1.12.0   

ADD COMMENTlink modified 3.0 years ago by Marc Carlson7.2k • written 3.0 years ago by Mikko Arvas90
0
gravatar for Marc Carlson
3.0 years ago by
Marc Carlson7.2k
United States
Marc Carlson7.2k wrote:

Hi Mikko!

That is an unusual looking gff file you have there.  Instead of 'Parent' it uses 'PARENT', and instead of 'mRNA' it uses 'transcript'.  There were some other oddities too, but I made changes to the makeTxDbFromGFF() function in the development branch so that it will be more robust to these kinds of things.  In a day or so the new version of GenomicFeatures (version 1.19.7) should pop up online and you should be able to do this:

library(GenomicFeatures)

hse <- makeTxDbFromGFF("Trichoderma_reesei.GCA_000167675.2.24.gff3",
                       format="gff3",
                       exonRankAttributeName='rank',
                       dataSource="Ensembl  Trichoderma_reesei.GCA_000167675.2.24.gff",
                       species="Trichoderma reesei",
                       circ_seqs=character(),
                       gffTxName= "transcript")

 

Notice that I  also added a new argument (gffTxName) for when you have a range "type" (3rd column in a gff file) for the transcript ranges in a gff file that is 'non-standard' as is being used here by ensembl.  The "expected" thing to find here for messages is actually 'mRNA'.  But we have opted to accommodate alternate strings here by allowing you to specify this argument.

Anyhow I hope this helps you,

 

 Marc

 

 

ADD COMMENTlink written 3.0 years ago by Marc Carlson7.2k

Hi Mikko, Marc,

In the meantime note that you can make the TxDb from the Ensembl Fungi Mart with:

txdb <- makeTranscriptDbFromBiomart("fungi_mart_23", dataset="treesei_eg_gene")

Then:

> txdb
TxDb object:
| Db type: TxDb
| Supporting package: GenomicFeatures
| Data source: BioMart
| Organism: Trichoderma reesei
| Resource URL: www.biomart.org:80
| BioMart database: fungi_mart_23
| BioMart database version: ENSEMBL FUNGI 23 (EBI UK)
| BioMart dataset: treesei_eg_gene
| BioMart dataset description: Trichoderma reesei genes (v2.0 (2011-07-EnsemblFungi))
| BioMart dataset version: v2.0 (2011-07-EnsemblFungi)
| Full dataset: yes
| miRBase build ID: NA
| transcript_nrow: 9395
| exon_nrow: 27960
| cds_nrow: 27644
| Db created by: GenomicFeatures package from Bioconductor
| Creation time: 2014-12-11 16:03:37 -0800 (Thu, 11 Dec 2014)
| GenomicFeatures version at creation time: 1.18.2
| RSQLite version at creation time: 1.0.0
| DBSCHEMAVERSION: 1.0

Best,

H.

 

ADD REPLYlink written 3.0 years ago by Hervé Pagès ♦♦ 13k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 421 users visited in the last hour