Question: How to create tx2gene data.frame when there's no TxDb object for the organism you are working with.
5 months ago, prab4th wrote:

I have been following the workflow in the vignette "Importing transcript abundance datasets with tximport", which requires a TxDb object. I am working with rice, and there isn't a TxDb object for rice, but there is a BSgenome object.

Is there any way I can use the BSgenome object? I just want to use my Salmon output in DESeq2.

modified 5 months ago by Johannes Rainer • written 5 months ago by prab4th
5 months ago, Hervé Pagès (United States) wrote:


Alternatively, you can use makeTxDbFromBiomart() to make a TxDb object from the Ensembl Plants mart:

library(biomaRt)
library(GenomicFeatures)
mart <- useMart(biomart="plants_mart", host="")
datasets <- listDatasets(mart)
datasets[1:6 , 1:2]
#                dataset                               description
# 1    atauschii_eg_gene              Aegilops tauschii genes (...
# 2 obrachyantha_eg_gene Oryza brachyantha genes (Oryza_brachya...
# 3 ptrichocarpa_eg_gene                Populus trichocarpa gen...
# 4     ppersica_eg_gene                   Prunus persica genes...
# 5   stuberosum_eg_gene              Solanum tuberosum genes (...
# 6     sitalica_eg_gene                   Setaria italica gene...
idx <- grep("oryza", datasets$description, ignore.case=TRUE)
datasets[idx, 1:2]
#                    dataset                           description
# 2     obrachyantha_eg_gene  Oryza brachyantha genes (Oryza_br...
# 8          onivara_eg_gene                  Oryza nivara gene...
# 14       opunctata_eg_gene                Oryza punctata gene...
# 15         oindica_eg_gene               Oryza sativa Indica ...
# 18   oglumaepatula_eg_gene            Oryza glumaepatula gene...
# 19        obarthii_eg_gene                 Oryza barthii gene...
# 20         osativa_eg_gene            Oryza sativa Japonica g...
# 25   omeridionalis_eg_gene Oryza meridionalis genes (Oryza_me...
# 28      orufipogon_eg_gene                   Oryza rufipogon ...
# 38 olongistaminata_eg_gene Oryza longistaminata genes (O_long...
# 41     oglaberrima_eg_gene                    Oryza glaberrim...

Choose your dataset of interest (e.g. osativa_eg_gene), then:

txdb <- makeTxDbFromBiomart(biomart="plants_mart",
                            dataset="osativa_eg_gene",
                            host="")

Please note that some important tweaks were made to makeTxDbFromBiomart() last week to improve its support for EnsemblGenomes (see the answer in "Errors with makeTxDbFromBiomart" for the details), so make sure you use the latest version of GenomicFeatures (1.28.5) before trying the above.
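
Once a TxDb is in hand, the tx2gene table that tximport expects can be derived from it. The following is a minimal sketch along the lines of the tximport vignette, not a definitive implementation. `make_tx2gene` is a hypothetical helper name (not part of any package), and `files` in the usage comment is an assumed named vector of paths to Salmon `quant.sf` files.

```r
# Hypothetical helper: build a transcript-to-gene table from a TxDb object.
make_tx2gene <- function(txdb) {
  # All transcript names known to the TxDb
  tx_names <- AnnotationDbi::keys(txdb, keytype = "TXNAME")
  # Look up the gene ID for each transcript name
  tx2gene <- AnnotationDbi::select(txdb, keys = tx_names,
                                   columns = "GENEID", keytype = "TXNAME")
  # tximport expects transcript IDs in column 1 and gene IDs in column 2
  tx2gene <- tx2gene[, c("TXNAME", "GENEID")]
  # Drop transcripts that have no gene assigned
  tx2gene[!is.na(tx2gene$GENEID), ]
}

# Usage (not run here): `txdb` is the object created above;
# `files` is an assumed vector of Salmon quant.sf paths.
# tx2gene <- make_tx2gene(txdb)
# txi <- tximport::tximport(files, type = "salmon", tx2gene = tx2gene)
```

The transcript IDs in the first column have to match the transcript names in the Salmon output, so it is worth comparing a few of them before running tximport.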



modified 5 months ago • written 5 months ago by Hervé Pagès


This worked! Thank you :) If you post your answer as a top-level comment, I can accept it.


Also, library(GenomicFeatures) should be there for makeTxDbFromBiomart() to work.

modified 5 months ago • written 5 months ago by prab4th

Done. I added the library(GenomicFeatures) line. Thanks for the feedback!



written 5 months ago by Hervé Pagès
5 months ago, Johannes Rainer wrote:

Ensembl Genomes provides gene models for many plants. You could either download a GTF or GFF3 file for rice from there and build a TxDb using makeTxDbFromGFF (GenomicFeatures package) or, since the data is in Ensembl format, an EnsDb using ensDbFromGtf (ensembldb package; EnsDb and TxDb packages/databases provide the same functionality/annotations).

Note that an EnsDb created from a GTF file might lack some annotations, since they are not provided in the file. If you tell me which release and species you need (which of the many Oryza forms, e.g. oryza_sativa, oryza_meridionalis etc.), I could build the EnsDb database/package for you directly from the Ensembl Genomes MySQL databases. Just let me know.

cheers, jo

written 5 months ago by Johannes Rainer

I'll try `makeTxDbFromGFF` first and get back to you if I can't get it to work. Thanks, Jo.

written 5 months ago by prab4th

These were the files available for Oryza sativa:

File: Oryza_sativa.IRGSP-1.0.37.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chr.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.abinitio.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.1.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.3.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.2.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.4.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.6.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.5.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.7.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.8.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.11.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.12.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.9.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.10.gff3.gz

Should I use the first file, or should I combine the per-chromosome files in some way before feeding them into makeTxDbFromGFF?

written 5 months ago by prab4th

I would use the first one, or the second, which to my understanding contains only genes encoded on chromosomes (the other might also contain genes encoded on contigs).
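
As a rough sketch of that suggestion, assuming the first file from the list has been downloaded to the working directory: `build_rice_txdb` is a hypothetical wrapper name, and in recent Bioconductor releases makeTxDbFromGFF() has moved to the txdbmaker package, so the namespace prefix may need adjusting.

```r
# Hypothetical wrapper around makeTxDbFromGFF() for the downloaded GFF3 file.
# The default file name is the first entry in the listing above and is
# assumed to sit in the current working directory.
build_rice_txdb <- function(gff_file = "Oryza_sativa.IRGSP-1.0.37.gff3.gz") {
  GenomicFeatures::makeTxDbFromGFF(file = gff_file,
                                   format = "gff3",
                                   organism = "Oryza sativa")
}

# Usage (not run here, needs the file on disk):
# txdb <- build_rice_txdb()
```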

written 5 months ago by Johannes Rainer


Hey, I'm using DESeq2 after kallisto to analyze rice data. I'm using an Ensembl GTF and I want to create a TxDb. I used this function:

txdb2 <- makeTxDbFromGFF(file="C:/Users/Dee/Desktop/Thesis_rice/Oryza_sativa.IRGSP-1.0.37.gtf", dataSource=paste("",sep=""), organism="Oryza sativa")

and I got this error:

Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... Error in c(x, value) : 
  could not find symbol "recursive" in environment of the generic function

Any help?


written 3 months ago by dina.hesham139

This might be a problem in the makeTxDbFromGFF function from the GenomicFeatures package. It works with ensDbFromGtf from the ensembldb package:

> library(ensembldb)
> dbf <- ensDbFromGtf("Oryza_sativa.IRGSP-1.0.37.gtf.gz")
Importing GTF file ... OK
Processing metadata ... OK
Processing genes ...
 Attribute availability:
  o gene_id ... OK
  o gene_name ... OK
  o entrezid ... Nope
  o gene_biotype ... OK
Processing transcripts ...
 Attribute availability:
  o transcript_id ... OK
  o gene_id ... OK
  o transcript_biotype ... OK
Processing exons ... OK
Processing chromosomes ... Fetch seqlengths from ensembl ... OK
Generating index ... OK
Verifying validity of the information in the database:
Checking transcripts ... OK
Checking exons ... OK
Warning messages:
1: call dbDisconnect() when finished working with a connection
2: In ensDbFromGRanges(GTF, outfile = outfile, path = path, organism = organism,  :
   I'm missing column(s): 'entrezid'. The corresponding database column(s) will be empty!
3: closing unused connection 7 (
4: closing unused connection 6 (
5: closing unused connection 5 (
6: closing unused connection 4 (
7: closing unused connection 3 (
> edb <- EnsDb(dbf)
> edb
EnsDb for Ensembl:
|Backend: SQLite
|Db type: EnsDb
|Type of Gene ID: Ensembl Gene ID
|Supporting package: ensembldb
|Db created by: ensembldb package from Bioconductor
|script_version: 0.0.1
|Creation time: Sat Nov 25 18:53:08 2017
|ensembl_version: 37
|ensembl_host: unknown
|Organism: Oryza_sativa
|genome_build: IRGSP-1.0
|source_file: Oryza_sativa.IRGSP-1.0.37.gtf.gz
| No. of genes: 91992.
| No. of transcripts: 98663.
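
From an EnsDb like this one, a tx2gene table can be pulled out with a single query. This is a minimal sketch; `ensdb_tx2gene` is a hypothetical helper name, and the output column names are only chosen to mirror the TxDb convention.

```r
# Hypothetical helper: transcript-to-gene table from an EnsDb object.
ensdb_tx2gene <- function(edb) {
  # Fetch transcript and gene IDs as a plain data.frame
  tx <- ensembldb::transcripts(edb, columns = c("tx_id", "gene_id"),
                               return.type = "data.frame")
  # Transcript IDs first, gene IDs second, as tximport expects
  tx2gene <- tx[, c("tx_id", "gene_id")]
  names(tx2gene) <- c("TXNAME", "GENEID")
  tx2gene
}

# Usage (not run here): `edb` is the EnsDb object created above.
# tx2gene <- ensdb_tx2gene(edb)
```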

cheers, jo

modified 3 months ago • written 3 months ago by Johannes Rainer

Thanks a lot!!

written 3 months ago by dina.hesham139


Is there any reason you used kallisto over Salmon?

written 3 months ago by prab4th

I'm using both for comparison.



written 3 months ago by dina.hesham139