Ensembl Fungi GFF3 with GenomicFeatures : makeTranscriptDbFromGFF
1
0
Entering edit mode
Mikko Arvas ▴ 90
@mikko-arvas-1213
Last seen 10.0 years ago
European Union

Hi,

I want to count hits from BAMs with the help of GenomicFeatures transcriptdb. I got Trichoderma reesei gff3 from EnsemblFungi ftp://ftp.ensemblgenomes.org/pub/fungi/release-24/gff3/trichoderma_reesei/Trichoderma_reesei.GCA_000167675.2.24.gff3.gz

>  hse <- makeTranscriptDbFromGFF("Trichoderma_reesei.GCA_000167675.2.24.gff3", format="gff3",exonRankAttributeName='rank',dataSource="Ensembl  Trichoderma_reesei.GCA_000167675.2.24.gff", species="Trichoderma reesei")
Prepare the 'metadata' data frame ... metadata: OK
Now generating chrominfo from available sequence names. No chromosome length information is available.
Error in unique(tables[["transcripts"]][["tx_chrom"]]) :
  error in evaluating the argument 'x' in selecting a method for function 'unique': Error: object 'tables' not found

Making a chrominfo from the GFF does not really help just changes the error message slightly:

>  hse <- makeTranscriptDbFromGFF("Trichoderma_reesei.GCA_000167675.2.23.gff3", format="gff3",chrominfo=chrominfo,exonRankAttributeName='rank',dataSource="Ensembl  Trichoderma_reesei.GCA_000167675.2.23.gff", species="Trichoderma reesei")
Prepare the 'metadata' data frame ... metadata: OK
Error in is.data.frame(arg) : object 'tables' not found
In addition: Warning message:
In if is.na(chrominfo)) { :
  the condition has length > 1 and only the first element will be used

Trichoderma_reesei.GCA_000167675.2.24.gff3.gz does not really pass http://genometools.org/cgi-bin/gff3validator.cgi, but making it to pass by changing (ID,PARENT,NAME) attributes to (id,parent,name), just gives the same error message from makeTranscriptDbFromGFF.

Any help and comments (even though my mistake could be very naive) would be much appreciated.

Best,

Mikko Arvas

> traceback()
2: unique(tables[["transcripts"]][["tx_chrom"]])
1: makeTranscriptDbFromGFF("/mnt/vttspace/Genomes/Trichoderma_reesei/ForGenomeBrowser/Trichoderma_reesei.GCA_000167675.2.23.gff3",
       format = "gff3", exonRankAttributeName = "rank", dataSource = "Ensembl  Trichoderma_reesei.GCA_000167675.2.23.gff",
       species = "Trichoderma reesei")
 

> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
[1] GenomicFeatures_1.18.2 AnnotationDbi_1.28.1   Biobase_2.26.0        
[4] GenomicRanges_1.18.3   GenomeInfoDb_1.2.3     IRanges_2.0.0         
[7] S4Vectors_0.4.0        BiocGenerics_0.12.1   

loaded via a namespace (and not attached):
 [1] base64enc_0.1-2         BatchJobs_1.5           BBmisc_1.8             
 [4] BiocParallel_1.0.0      biomaRt_2.22.0          Biostrings_2.34.0      
 [7] bitops_1.0-6            brew_1.0-6              checkmate_1.5.0        
[10] codetools_0.2-9         compiler_3.1.1          DBI_0.3.1              
[13] digest_0.6.4            fail_1.2                foreach_1.4.2          
[16] GenomicAlignments_1.2.1 iterators_1.0.7         RCurl_1.95-4.3         
[19] Rsamtools_1.18.2        RSQLite_1.0.0           rtracklayer_1.26.2     
[22] sendmailR_1.2-1         stringr_0.6.2           tools_3.1.1            
[25] XML_3.98-1.1            XVector_0.6.0           zlibbioc_1.12.0   

genomicfeatures • 1.7k views
ADD COMMENT
0
Entering edit mode
Marc Carlson ★ 7.2k
@marc-carlson-2264
Last seen 8.3 years ago
United States

Hi Mikko!

That is an unusual looking gff file you have there.  Instead of 'Parent' it uses 'PARENT', and instead of 'mRNA' it uses 'transcript'.  There were some other oddities too, but I made changes to the makeTxDbFromGFF() function in the development branch so that it will be more robust to these kinds of things.  In a day or so the new version of GenomicFeatures (version 1.19.7) should pop up online and you should be able to do this:

library(GenomicFeatures)

hse <- makeTxDbFromGFF("Trichoderma_reesei.GCA_000167675.2.24.gff3",
                       format="gff3",
                       exonRankAttributeName='rank',
                       dataSource="Ensembl  Trichoderma_reesei.GCA_000167675.2.24.gff",
                       species="Trichoderma reesei",
                       circ_seqs=character(),
                       gffTxName= "transcript")

 

Notice that I  also added a new argument (gffTxName) for when you have a range "type" (3rd column in a gff file) for the transcript ranges in a gff file that is 'non-standard' as is being used here by ensembl.  The "expected" thing to find here for messages is actually 'mRNA'.  But we have opted to accommodate alternate strings here by allowing you to specify this argument.

Anyhow I hope this helps you,

 

 Marc

 

 

ADD COMMENT
0
Entering edit mode

Hi Mikko, Marc,

In the meantime note that you can make the TxDb from the Ensembl Fungi Mart with:

txdb <- makeTranscriptDbFromBiomart("fungi_mart_23", dataset="treesei_eg_gene")

Then:

> txdb
TxDb object:
| Db type: TxDb
| Supporting package: GenomicFeatures
| Data source: BioMart
| Organism: Trichoderma reesei
| Resource URL: www.biomart.org:80
| BioMart database: fungi_mart_23
| BioMart database version: ENSEMBL FUNGI 23 (EBI UK)
| BioMart dataset: treesei_eg_gene
| BioMart dataset description: Trichoderma reesei genes (v2.0 (2011-07-EnsemblFungi))
| BioMart dataset version: v2.0 (2011-07-EnsemblFungi)
| Full dataset: yes
| miRBase build ID: NA
| transcript_nrow: 9395
| exon_nrow: 27960
| cds_nrow: 27644
| Db created by: GenomicFeatures package from Bioconductor
| Creation time: 2014-12-11 16:03:37 -0800 (Thu, 11 Dec 2014)
| GenomicFeatures version at creation time: 1.18.2
| RSQLite version at creation time: 1.0.0
| DBSCHEMAVERSION: 1.0

Best,

H.

 

ADD REPLY

Login before adding your answer.

Traffic: 451 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6