Question

How to create a tx2gene data.frame when there's no TxDb object for the organism you are working with


I have been following the workflow in [Importing transcript abundance datasets with tximport](http://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html), and it requires a TxDb object. I am working with rice, and there isn't a TxDb object for rice, although there is a BSgenome object.

Is there any way I can use the BSgenome object? I just want to use my Salmon output in DESeq2.

Tags: tximport, deseq2, annotationdbi

Written 6.6 years ago by prab4th; last updated 6.6 years ago by Johannes Rainer.

Ensembl Genomes provides gene models for many plants; check http://plants.ensembl.org/index.html. You could either download a GTF or GFF3 file for rice from there and build a TxDb using makeTxDbFromGFF (GenomicFeatures package) or, since the data is in Ensembl format, build an EnsDb using ensDbFromGtf (ensembldb package). EnsDb and TxDb packages/databases provide the same functionality/annotations.

Note that an EnsDb created from a GTF file might lack some annotations, since they are not provided in the file. If you tell me which release and species you need (which of the many Oryza forms, e.g. oryza_sativa, oryza_meridionalis, etc.), I could build the EnsDb database/package for you directly from the Ensembl Genomes MySQL databases. Just let me know.
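For reference, once a TxDb exists, the tximport vignette derives tx2gene with `k <- keys(txdb, keytype = "TXNAME")` followed by `select(txdb, k, "GENEID", "TXNAME")`. Because tximport only needs a two-column data.frame (transcript ID first, gene ID second), the GTF can also be parsed directly if building the database gives trouble. This is a minimal base-R sketch, assuming an Ensembl-style GTF whose `transcript` rows carry `gene_id "..."` and `transcript_id "..."` attributes; `gtf_tx2gene()` is a hypothetical helper, and the toy lines (IDs, coordinates) are made up for illustration.

```r
# Hypothetical helper (not part of any package): build a tx2gene data.frame
# from Ensembl-style GTF lines, keeping one row per transcript.
gtf_tx2gene <- function(gtf_lines) {
  # Drop header/comment lines such as "#!genome-build ..."
  gtf_lines <- gtf_lines[!startsWith(gtf_lines, "#")]
  fields <- strsplit(gtf_lines, "\t", fixed = TRUE)
  # Keep complete rows whose feature type (column 3) is "transcript"
  is_tx <- vapply(fields, function(f) length(f) >= 9 && f[3] == "transcript",
                  logical(1))
  attrs <- vapply(fields[is_tx], `[`, character(1), 9)
  # Return the quoted value of one attribute key, or NA if it is absent
  get_attr <- function(x, key) {
    pattern <- paste0('(?:^|;\\s*)', key, ' "([^"]+)"')
    hits <- regmatches(x, regexec(pattern, x, perl = TRUE))
    vapply(hits, function(h) if (length(h) == 2) h[2] else NA_character_,
           character(1))
  }
  unique(data.frame(
    TXNAME = get_attr(attrs, "transcript_id"),  # column 1: transcript ID
    GENEID = get_attr(attrs, "gene_id"),        # column 2: gene ID
    stringsAsFactors = FALSE
  ))
}

# Toy GTF lines (made-up coordinates, for illustration only)
gtf <- c(
  "#!genome-build IRGSP-1.0",
  "1\tRAP_DB\tgene\t2983\t10815\t.\t+\t.\tgene_id \"Os01g0100100\"; gene_biotype \"protein_coding\";",
  "1\tRAP_DB\ttranscript\t2983\t10815\t.\t+\t.\tgene_id \"Os01g0100100\"; transcript_id \"Os01t0100100-01\";",
  "1\tRAP_DB\texon\t2983\t3268\t.\t+\t.\tgene_id \"Os01g0100100\"; transcript_id \"Os01t0100100-01\"; exon_number \"1\";"
)
tx2gene <- gtf_tx2gene(gtf)
print(tx2gene)
```

On a real file, something like `gtf_tx2gene(readLines(gzfile("Oryza_sativa.IRGSP-1.0.37.gtf.gz")))` should work, although splitting every exon and CDS line this way is slower than the Bioconductor importers.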

cheers, jo

Answer by Johannes Rainer, 6.6 years ago

I'll try `makeTxDbFromGFF` first and get back to you if I can't get it to work. Thanks, Jo.

Reply by prab4th, 6.6 years ago

These were the files available for Oryza sativa: ftp://ftp.ensemblgenomes.org/pub/plants/release-37/gff3/oryza_sativa/

File: Oryza_sativa.IRGSP-1.0.37.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chr.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.abinitio.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.1.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.3.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.2.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.4.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.6.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.5.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.7.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.8.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.11.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.12.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.9.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.10.gff3.gz

Should I use the first file, or should I combine the per-chromosome files in some way before feeding them into makeTxDbFromGFF?

Reply by prab4th, 6.6 years ago

I would use the first one, or the second one, which to my understanding contains only genes encoded on chromosomes (the first might also contain genes encoded on contigs).
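If you would rather pull tx2gene straight from the GFF3 file than go through a TxDb, a minimal base-R sketch is possible. It assumes the Ensembl GFF3 convention of `ID=transcript:<tx>` and `Parent=gene:<gene>` on transcript rows, and that the transcript names in your Salmon index match those IDs once the `transcript:` prefix is stripped. `gff3_tx2gene()` is a hypothetical helper and the toy lines are made up.

```r
# Hypothetical helper: tx2gene from Ensembl-style GFF3 lines, assuming
# transcript rows have ID=transcript:<tx> and Parent=gene:<gene>.
gff3_tx2gene <- function(gff_lines) {
  # Drop directives ("##gff-version 3") and comments
  gff_lines <- gff_lines[!startsWith(gff_lines, "#")]
  fields <- strsplit(gff_lines, "\t", fixed = TRUE)
  fields <- fields[lengths(fields) >= 9]
  attrs <- vapply(fields, `[`, character(1), 9)
  # Return the value of KEY=VALUE in the ';'-separated column 9, or NA
  get_attr <- function(x, key) {
    hits <- regmatches(x, regexec(paste0("(?:^|;)", key, "=([^;]*)"), x,
                                  perl = TRUE))
    vapply(hits, function(h) if (length(h) == 2) h[2] else NA_character_,
           character(1))
  }
  ids <- get_attr(attrs, "ID")
  parents <- get_attr(attrs, "Parent")
  # Transcript rows are recognised by the "transcript:" ID prefix, so mRNA,
  # ncRNA and other transcript feature types are all picked up
  is_tx <- !is.na(ids) & startsWith(ids, "transcript:")
  data.frame(
    TXNAME = sub("^transcript:", "", ids[is_tx]),
    GENEID = sub("^gene:", "", parents[is_tx]),
    stringsAsFactors = FALSE
  )
}

# Toy GFF3 lines (made-up coordinates, for illustration only)
gff <- c(
  "##gff-version 3",
  "1\tRAP_DB\tgene\t2983\t10815\t.\t+\t.\tID=gene:Os01g0100100;biotype=protein_coding",
  "1\tRAP_DB\tmRNA\t2983\t10815\t.\t+\t.\tID=transcript:Os01t0100100-01;Parent=gene:Os01g0100100",
  "1\tRAP_DB\texon\t2983\t3268\t.\t+\t.\tParent=transcript:Os01t0100100-01;exon_id=Os01t0100100-01.exon1"
)
tx2gene <- gff3_tx2gene(gff)
print(tx2gene)
```

This does not decode GFF3 percent-escapes in attribute values; for Ensembl IDs that is usually not an issue, but it is an assumption worth checking.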

Reply by Johannes Rainer, 6.6 years ago

Hey,

I'm using DESeq2 after kallisto to analyze rice data. I'm using an Ensembl GTF and want to create a TxDb. I used this function:

```r
library(GenomicFeatures)
txdb2 <- makeTxDbFromGFF(file = "C:/Users/Dee/Desktop/Thesis_rice/Oryza_sativa.IRGSP-1.0.37.gtf",
                         dataSource = "ftp://ftp.ensemblgenomes.org/pub/plants/release-37/gtf/oryza_sativa/",
                         organism = "Oryza sativa")
```

and I got this error:

```
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... Error in c(x, value) :
could not find symbol "recursive" in environment of the generic function
```

Any help?

Reply by dina.hesham139, 6.5 years ago

This might be a problem in the makeTxDbFromGFF function from the GenomicFeatures package. It works with the ensDbFromGtf function from the ensembldb package:

```r
> library(ensembldb)
> dbf <- ensDbFromGtf("Oryza_sativa.IRGSP-1.0.37.gtf.gz")
Importing GTF file ... OK
Processing metadata ... OK
Processing genes ...
 Attribute availability:
  o gene_id ... OK
  o gene_name ... OK
  o entrezid ... Nope
  o gene_biotype ... OK
OK
Processing transcripts ...
 Attribute availability:
  o transcript_id ... OK
  o gene_id ... OK
  o transcript_biotype ... OK
OK
Processing exons ... OK
Processing chromosomes ... Fetch seqlengths from ensembl ... OK
Generating index ... OK
  -------------
Verifying validity of the information in the database:
Checking transcripts ... OK
Checking exons ... OK
Warning messages:
1: call dbDisconnect() when finished working with a connection
2: In ensDbFromGRanges(GTF, outfile = outfile, path = path, organism = organism,  :
   I'm missing column(s): 'entrezid'. The corresponding database column(s) will be empty!
3: closing unused connection 7 (ftp://ftp.ensemblgenomes.org/pub/release-37/plants/mysql/)
4: closing unused connection 6 (ftp://ftp.ensemblgenomes.org/pub/release-37/metazoa/mysql/)
5: closing unused connection 5 (ftp://ftp.ensemblgenomes.org/pub/release-37/fungi/mysql/)
6: closing unused connection 4 (ftp://ftp.ensemblgenomes.org/pub/release-37/bacteria/mysql/)
7: closing unused connection 3 (ftp://ftp.ensembl.org/pub/release-37/mysql/)
> edb <- EnsDb(dbf)
> edb
EnsDb for Ensembl:
|Backend: SQLite
|Db type: EnsDb
|Type of Gene ID: Ensembl Gene ID
|Supporting package: ensembldb
|Db created by: ensembldb package from Bioconductor
|script_version: 0.0.1
|Creation time: Sat Nov 25 18:53:08 2017
|ensembl_version: 37
|ensembl_host: unknown
|Organism: Oryza_sativa
|genome_build: IRGSP-1.0
|DBSCHEMAVERSION: 1.0
|source_file: Oryza_sativa.IRGSP-1.0.37.gtf.gz
| No. of genes: 91992.
| No. of transcripts: 98663.
>
```

cheers, jo
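Whichever route produces tx2gene (EnsDb, TxDb or a parsed annotation file), tximport links transcripts to genes by their IDs, so it is worth confirming that the IDs in the quantification files actually appear in the first column of tx2gene before calling it. A minimal base-R sketch, assuming Salmon's `quant.sf` layout (`Name`, `Length`, `EffectiveLength`, `TPM`, `NumReads`); for kallisto's `abundance.tsv` the ID column is `target_id`. `check_tx_ids()` is a hypothetical helper and the table values are made up.

```r
# Hypothetical helper: report how many quantified transcript IDs are
# present in column 1 of tx2gene, and list the ones that are not.
check_tx_ids <- function(quant, tx2gene, id_col = "Name") {
  ids <- quant[[id_col]]
  in_map <- ids %in% tx2gene[[1]]
  list(
    n_quantified = length(ids),
    n_matched = sum(in_map),
    missing = ids[!in_map]
  )
}

# Toy quant.sf content (made-up values); use id_col = "target_id"
# for kallisto's abundance.tsv
quant <- read.delim(text = paste(
  "Name\tLength\tEffectiveLength\tTPM\tNumReads",
  "Os01t0100100-01\t2000\t1800\t10\t150",
  "Os01t0100200-01\t900\t700\t5\t30",
  sep = "\n"
), stringsAsFactors = FALSE)

tx2gene <- data.frame(
  TXNAME = "Os01t0100100-01",
  GENEID = "Os01g0100100",
  stringsAsFactors = FALSE
)

res <- check_tx_ids(quant, tx2gene)
str(res)
```

On real data you would read the file with something like `read.delim("quant.sf")`; a long `missing` list often points to different annotation releases having been used for the Salmon/kallisto index and for tx2gene.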

Reply by Johannes Rainer, 6.5 years ago

Thanks a lot!

Reply by dina.hesham139, 6.5 years ago

Hi,

Is there any reason you used kallisto rather than Salmon?

Reply by prab4th, 6.4 years ago

I'm using both for comparison.

cheers,

Dina

Reply by dina.hesham139, 6.4 years ago

score 4 · Accepted Answer · 2017-09-25

Hi,

Alternatively, you can use makeTxDbFromBiomart() to make a TxDb object from the Ensembl Plants mart:

```r
library(biomaRt)
mart <- useMart(biomart="plants_mart", host="plants.ensembl.org")
datasets <- listDatasets(mart)
datasets[1:6 , 1:2]
#                dataset                               description
# 1    atauschii_eg_gene              Aegilops tauschii genes (...
# 2 obrachyantha_eg_gene Oryza brachyantha genes (Oryza_brachya...
# 3 ptrichocarpa_eg_gene                Populus trichocarpa gen...
# 4     ppersica_eg_gene                   Prunus persica genes...
# 5   stuberosum_eg_gene              Solanum tuberosum genes (...
# 6     sitalica_eg_gene                   Setaria italica gene...
idx <- grep("oryza", datasets$description, ignore.case=TRUE)
datasets[idx, 1:2]
#                    dataset                           description
# 2     obrachyantha_eg_gene  Oryza brachyantha genes (Oryza_br...
# 8          onivara_eg_gene                  Oryza nivara gene...
# 14       opunctata_eg_gene                Oryza punctata gene...
# 15         oindica_eg_gene               Oryza sativa Indica ...
# 18   oglumaepatula_eg_gene            Oryza glumaepatula gene...
# 19        obarthii_eg_gene                 Oryza barthii gene...
# 20         osativa_eg_gene            Oryza sativa Japonica g...
# 25   omeridionalis_eg_gene Oryza meridionalis genes (Oryza_me...
# 28      orufipogon_eg_gene                   Oryza rufipogon ...
# 38 olongistaminata_eg_gene Oryza longistaminata genes (O_long...
# 41     oglaberrima_eg_gene                    Oryza glaberrim...
```

Choose your dataset of interest (e.g. osativa_eg_gene), then:

```r
library(GenomicFeatures)
txdb <- makeTxDbFromBiomart(biomart="plants_mart",
                            dataset="osativa_eg_gene",
                            host="plants.ensembl.org")
```

Please note that some important tweaks were made to makeTxDbFromBiomart() last week to improve its support for Ensembl Genomes (see the answer in the thread "Errors with makeTxDbFromBiomart" for the details), so make sure you use the latest version of GenomicFeatures (1.28.5) before trying the above.
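To see what tx2gene is used for downstream: with its default `countsFromAbundance = "no"`, tximport's gene-level counts are the sums of the estimated transcript counts within each gene. The base-R sketch below only illustrates that summation on made-up counts; it is not a substitute for tximport, which also computes the average transcript lengths that `DESeqDataSetFromTximport()` uses. `sum_to_gene()` is a hypothetical helper.

```r
# Hypothetical helper: sum transcript-level counts into gene-level counts
# using a tx2gene table (column 1 = transcript ID, column 2 = gene ID).
sum_to_gene <- function(tx_counts, tx2gene) {
  gene <- tx2gene[[2]][match(names(tx_counts), tx2gene[[1]])]
  keep <- !is.na(gene)  # transcripts absent from tx2gene are dropped
  # rowsum() adds up the values that share a group label (groups sorted)
  summed <- rowsum(tx_counts[keep], gene[keep])
  setNames(as.vector(summed), rownames(summed))
}

# Made-up estimated counts for three transcripts from two genes
tx_counts <- c("Os01t0100100-01" = 150, "Os01t0100100-02" = 20,
               "Os01t0100200-01" = 30)
tx2gene <- data.frame(
  TXNAME = c("Os01t0100100-01", "Os01t0100100-02", "Os01t0100200-01"),
  GENEID = c("Os01g0100100", "Os01g0100100", "Os01g0100200"),
  stringsAsFactors = FALSE
)

gene_counts <- sum_to_gene(tx_counts, tx2gene)
print(gene_counts)
```

The two transcripts of Os01g0100100 collapse into one gene-level value, which is why every quantified transcript needs a row in tx2gene.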

Cheers,

H.