Question: How to create a tx2gene data.frame when there's no TxDb object for the organism you are working with?
prab4th wrote (9 months ago):

I have been following the workflow described in [Importing transcript abundance datasets with tximport](http://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html), which requires a TxDb object. I am working with rice, and there isn't a TxDb object for rice, but there is a BSgenome object.

Is there any way I can use the BSgenome object? I just want to use my Salmon output in DESeq2.

Hervé Pagès wrote (9 months ago):

Hi,

Alternatively, you can use makeTxDbFromBiomart() to make a TxDb object from the Ensembl Plants mart:

```r
library(biomaRt)
mart <- useMart(biomart="plants_mart", host="plants.ensembl.org")
datasets <- listDatasets(mart)
datasets[1:6 , 1:2]
#                dataset                               description
# 1    atauschii_eg_gene              Aegilops tauschii genes (...
# 2 obrachyantha_eg_gene Oryza brachyantha genes (Oryza_brachya...
# 3 ptrichocarpa_eg_gene                Populus trichocarpa gen...
# 4     ppersica_eg_gene                   Prunus persica genes...
# 5   stuberosum_eg_gene              Solanum tuberosum genes (...
# 6     sitalica_eg_gene                   Setaria italica gene...
idx <- grep("oryza", datasets$description, ignore.case=TRUE)
datasets[idx, 1:2]
#                    dataset                           description
# 2     obrachyantha_eg_gene  Oryza brachyantha genes (Oryza_br...
# 8          onivara_eg_gene                  Oryza nivara gene...
# 14       opunctata_eg_gene                Oryza punctata gene...
# 15         oindica_eg_gene               Oryza sativa Indica ...
# 18   oglumaepatula_eg_gene            Oryza glumaepatula gene...
# 19        obarthii_eg_gene                 Oryza barthii gene...
# 20         osativa_eg_gene            Oryza sativa Japonica g...
# 25   omeridionalis_eg_gene Oryza meridionalis genes (Oryza_me...
# 28      orufipogon_eg_gene                   Oryza rufipogon ...
# 38 olongistaminata_eg_gene Oryza longistaminata genes (O_long...
# 41     oglaberrima_eg_gene                    Oryza glaberrim...
```

Choose your dataset of interest (e.g. osativa_eg_gene), then:

```r
library(GenomicFeatures)
txdb <- makeTxDbFromBiomart(biomart="plants_mart",
                            dataset="osativa_eg_gene",
                            host="plants.ensembl.org")
```

Please note that some important tweaks were made to makeTxDbFromBiomart() last week to improve its support for Ensembl Genomes (see the thread "A: Errors with makeTxDbFromBiomart" for details), so make sure you use the latest version of GenomicFeatures (1.28.5) before trying the above.
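Once `txdb` exists, the tx2gene data.frame that tximport expects can be pulled out of it. The sketch below follows the `keys()`/`select()` approach shown in the tximport vignette; the helper name `tx2gene_from_txdb()` is hypothetical, and it assumes the AnnotationDbi and GenomicFeatures packages are installed and that `txdb` was built as above.

```r
# Hypothetical helper: extract a two-column tx2gene table from a TxDb.
# Assumes AnnotationDbi/GenomicFeatures are installed; `txdb` is any TxDb object.
tx2gene_from_txdb <- function(txdb) {
  # All transcript names stored in the TxDb (Ensembl transcript IDs for a
  # TxDb built from an Ensembl mart)
  k <- AnnotationDbi::keys(txdb, keytype = "TXNAME")
  # Map each transcript to its gene; the result has columns TXNAME and GENEID
  tx2gene <- AnnotationDbi::select(txdb, keys = k, columns = "GENEID",
                                   keytype = "TXNAME")
  # tximport uses column 1 as the transcript ID and column 2 as the gene ID
  tx2gene[, c("TXNAME", "GENEID")]
}
```

Usage would then be something like `tx2gene <- tx2gene_from_txdb(txdb)` followed by `txi <- tximport(files, type = "salmon", tx2gene = tx2gene)`, where `files` is a (hypothetical) named vector of paths to Salmon `quant.sf` files; `txi` can then be passed to `DESeqDataSetFromTximport()`.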

Cheers,

H.


Hello,

This worked, thank you :) If you post your answer as a top-level comment, I can accept it.

Also, `library(GenomicFeatures)` is needed for makeTxDbFromBiomart() to work.

(reply by prab4th, 9 months ago)

Done. I added the library(GenomicFeatures) line. Thanks for the feedback!

Cheers,

H.

(reply by Hervé Pagès, 9 months ago)
Johannes Rainer wrote (9 months ago):

Ensembl Genomes provides gene models for many plants; see http://plants.ensembl.org/index.html. You could download a GTF or GFF3 file for rice from there and either build a TxDb with makeTxDbFromGFF() (GenomicFeatures package) or, since the data is in Ensembl format, an EnsDb with ensDbFromGtf() (ensembldb package). EnsDb and TxDb packages/databases provide the same functionality and annotations.
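If the only goal is a tx2gene table for tximport, a TxDb or EnsDb is not strictly required: the transcript-to-gene pairs can also be read directly from the GTF attribute column. The base-R sketch below swaps in plain text parsing instead of GenomicFeatures/ensembldb. The helper `tx2gene_from_gtf()` and the toy GTF lines (with hypothetical IDs `geneA`, `txA1`, ...) are illustrative assumptions, and it assumes Ensembl-style GTF attributes of the form `gene_id "..."; transcript_id "...";`. It does not handle GFF3, whose attributes use `ID=`/`Parent=` instead.

```r
# Hypothetical helper: build tx2gene from the 9th (attribute) column of a GTF.
# Assumes Ensembl-style attributes such as: gene_id "X"; transcript_id "Y";
tx2gene_from_gtf <- function(path) {
  # GTF is tab-separated with 9 columns; header lines start with '#'.
  # quote = "" keeps the double quotes inside the attribute column intact.
  # read.delim() can also read a gzipped .gtf.gz file directly.
  gtf <- utils::read.delim(path, header = FALSE, comment.char = "#",
                           quote = "", colClasses = "character")
  attrs <- gtf[[9]]
  # Gene lines carry no transcript_id, so keep transcript/exon/CDS lines only
  attrs <- attrs[grepl('transcript_id "', attrs, fixed = TRUE)]
  tx   <- sub('.*transcript_id "([^"]+)".*', "\\1", attrs)
  gene <- sub('.*gene_id "([^"]+)".*', "\\1", attrs)
  # One row per transcript: transcript ID first, gene ID second
  tx2gene <- unique(data.frame(TXNAME = tx, GENEID = gene,
                               stringsAsFactors = FALSE))
  rownames(tx2gene) <- NULL
  tx2gene
}

# Usage on a tiny toy GTF (hypothetical IDs, not real rice annotation)
gtf_line <- function(type, attr) {
  paste("1", "toy", type, "100", "200", ".", "+", ".", attr, sep = "\t")
}
toy_gtf <- tempfile(fileext = ".gtf")
writeLines(c(
  "#!genome-build toy",
  gtf_line("gene",       'gene_id "geneA";'),
  gtf_line("transcript", 'gene_id "geneA"; transcript_id "txA1";'),
  gtf_line("exon",       'gene_id "geneA"; transcript_id "txA1"; exon_number "1";'),
  gtf_line("transcript", 'gene_id "geneA"; transcript_id "txA2";'),
  gtf_line("exon",       'gene_id "geneA"; transcript_id "txA2"; exon_number "1";'),
  gtf_line("gene",       'gene_id "geneB";'),
  gtf_line("transcript", 'gene_id "geneB"; transcript_id "txB1";')
), toy_gtf)
tx2gene <- tx2gene_from_gtf(toy_gtf)
print(tx2gene)
```

The trade-off is that this only recovers the mapping: unlike a TxDb or EnsDb, it keeps no coordinates or biotypes, which is all tximport needs but not much else.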

When creating an EnsDb from a GTF file, you might lack some annotations because they are not provided in the file. If you tell me which release and species you need (which of the many Oryza forms, e.g. oryza_sativa, oryza_meridionalis, etc.), I could build the EnsDb database/package for you directly from the Ensembl Genomes MySQL databases; just let me know.

cheers, jo


I'll try `makeTxDbFromGFF` first and get back to you if I can't get it to work. Thanks, Jo.

(reply by prab4th, 9 months ago)

These are the files available for Oryza sativa: ftp://ftp.ensemblgenomes.org/pub/plants/release-37/gff3/oryza_sativa/

File: Oryza_sativa.IRGSP-1.0.37.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chr.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.abinitio.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.1.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.3.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.2.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.4.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.6.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.5.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.7.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.8.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.11.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.12.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.9.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.10.gff3.gz

Should I use the first file, or should I combine the per-chromosome files in some way before feeding them into makeTxDbFromGFF()?

(reply by prab4th, 9 months ago)

I would use the first one, or the second one, which to my understanding contains only genes encoded on chromosomes (the first one might also contain genes encoded on contigs).
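A sketch of that step, assuming the first file from the listing (`Oryza_sativa.IRGSP-1.0.37.gff3.gz`) has been downloaded to the working directory and that GenomicFeatures is installed. The wrapper name `txdb_from_gff3()` and the `data_source` label are illustrative choices, not something from the original thread.

```r
# Hypothetical wrapper around GenomicFeatures::makeTxDbFromGFF() for an
# Ensembl Plants GFF3 file; the gzipped file is passed as-is.
txdb_from_gff3 <- function(gff3_file,
                           organism = "Oryza sativa",
                           data_source = "Ensembl Plants release 37") {
  GenomicFeatures::makeTxDbFromGFF(gff3_file,
                                   format = "gff3",
                                   dataSource = data_source,
                                   organism = organism)
}
```

Usage: `txdb <- txdb_from_gff3("Oryza_sativa.IRGSP-1.0.37.gff3.gz")`. The resulting TxDb can then be turned into a tx2gene table with `keys()`/`select()` as described in the tximport vignette.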

(reply by Johannes Rainer, 9 months ago)

Hey,

I'm using DESeq2 after kallisto to analyze rice data. I'm using an Ensembl GTF file and I want to create a TxDb. I used this call:

```r
txdb2 <- makeTxDbFromGFF(file="C:/Users/Dee/Desktop/Thesis_rice/Oryza_sativa.IRGSP-1.0.37.gtf",
                         dataSource="ftp://ftp.ensemblgenomes.org/pub/plants/release-37/gtf/oryza_sativa/",
                         organism="Oryza sativa")
```

and I got this error:

```
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... Error in c(x, value) : 
  could not find symbol "recursive" in environment of the generic function
```

Any help?

(reply by dina.hesham1390, 7 months ago)

This might be a problem in the makeTxDbFromGFF() function from the GenomicFeatures package. It works with ensDbFromGtf() from the ensembldb package:

```
> library(ensembldb)
> dbf <- ensDbFromGtf("Oryza_sativa.IRGSP-1.0.37.gtf.gz")
Importing GTF file ... OK
Processing metadata ... OK
Processing genes ...
 Attribute availability:
  o gene_id ... OK
  o gene_name ... OK
  o entrezid ... Nope
  o gene_biotype ... OK
OK
Processing transcripts ...
 Attribute availability:
  o transcript_id ... OK
  o gene_id ... OK
  o transcript_biotype ... OK
OK
Processing exons ... OK
Processing chromosomes ... Fetch seqlengths from ensembl ... OK
Generating index ... OK
  -------------
Verifying validity of the information in the database:
Checking transcripts ... OK
Checking exons ... OK
Warning messages:
1: call dbDisconnect() when finished working with a connection
2: In ensDbFromGRanges(GTF, outfile = outfile, path = path, organism = organism,  :
   I'm missing column(s): 'entrezid'. The corresponding database column(s) will be empty!
3: closing unused connection 7 (ftp://ftp.ensemblgenomes.org/pub/release-37/plants/mysql/)
4: closing unused connection 6 (ftp://ftp.ensemblgenomes.org/pub/release-37/metazoa/mysql/)
5: closing unused connection 5 (ftp://ftp.ensemblgenomes.org/pub/release-37/fungi/mysql/)
6: closing unused connection 4 (ftp://ftp.ensemblgenomes.org/pub/release-37/bacteria/mysql/)
7: closing unused connection 3 (ftp://ftp.ensembl.org/pub/release-37/mysql/)
> edb <- EnsDb(dbf)
> edb
EnsDb for Ensembl:
|Backend: SQLite
|Db type: EnsDb
|Type of Gene ID: Ensembl Gene ID
|Supporting package: ensembldb
|Db created by: ensembldb package from Bioconductor
|script_version: 0.0.1
|Creation time: Sat Nov 25 18:53:08 2017
|ensembl_version: 37
|ensembl_host: unknown
|Organism: Oryza_sativa
|genome_build: IRGSP-1.0
|DBSCHEMAVERSION: 1.0
|source_file: Oryza_sativa.IRGSP-1.0.37.gtf.gz
| No. of genes: 91992.
| No. of transcripts: 98663.
>
```
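With the EnsDb in hand, a tx2gene table can be taken from its transcript annotations. A minimal sketch, assuming `library(ensembldb)` is attached and `edb` is the EnsDb created above; the helper name `tx2gene_from_ensdb()` is hypothetical.

```r
# Hypothetical helper: two-column tx2gene table from an EnsDb.
# Requires library(ensembldb) to be attached, as in the session above,
# so that the EnsDb method of transcripts() is available.
tx2gene_from_ensdb <- function(edb) {
  # Ask only for transcript and gene IDs, returned as a plain data.frame
  tx <- transcripts(edb, columns = c("tx_id", "gene_id"),
                    return.type = "data.frame")
  # tximport expects the transcript ID in column 1 and the gene ID in column 2
  tx[, c("tx_id", "gene_id")]
}
```

Usage: `tx2gene <- tx2gene_from_ensdb(edb)`; the columns are subset explicitly in case the returned data.frame carries extra columns.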

cheers, jo

(reply by Johannes Rainer, 7 months ago)

Thanks a lot!

(reply by dina.hesham1390, 7 months ago)

Hi,

Is there any reason you used kallisto over Salmon?

(reply by prab4th, 7 months ago)

I'm using both for comparison.

cheers,

Dina

(reply by dina.hesham1390, 7 months ago)