Question

Saving and retrieving TxDb object

nishanthemje (@nishanthemje-24028)

Hi there!

I am trying to learn an in silico approach, EISA (Exon-Intron Split Analysis), using publicly available RNA-Seq data (https://bioconductor.riken.jp/packages/devel/bioc/vignettes/eisaR/inst/doc/eisaR.html#References). The method requires the user to load a TxDb object for the transcriptome. For Catharanthus, the plant I am working on, there is a GFF3 file, which I need to convert into a TxDb object. Following the instructions available online, I seem to get to the point where the TxDb object is made, but I am not able to save or retrieve it (screenshot of R workspace attached) for the next step. This is what I see after entering the commands:

> library(GenomicFeatures)
> gff_file <- system.file("extdata", "GFF3_files", "a.gff3",
+ package="GenomicFeatures")
> txdb <- makeTxDbFromGFF(file = "cro_v2.gene_models.gff3", format = "gff3")
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK

I cannot tell whether the TxDb object has been created and, if it has, how to access it for further analysis. Any support would be appreciated.

Thanks, Nishanth

makeTxDBFromGFF EISA

Written 4.9 years ago by nishanthemje; updated 4.9 years ago by James W. MacDonald.

After running the code you show, do you see anything when you print txdb? It should look similar to what you get when using gff_file as an example:

> txdb
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: /home/lori/R/x86_64-pc-linux-gnu-library/4.0-BioC-3.12/GenomicFeatures/extdata/GFF3_files/a.gff3
# Organism: NA
# Taxonomy ID: NA
# miRBase build ID: NA
# Genome: NA
# Nb of transcripts: 488
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2020-08-21 10:48:12 -0400 (Fri, 21 Aug 2020)
# GenomicFeatures version at creation time: 1.41.2
# RSQLite version at creation time: 2.2.0
# DBSCHEMAVERSION: 1.2

If so, it was generated correctly. You could save it for future use with the save() function in R, to a location of your choice, and then use load() in a future R session.

Reply by shepherl, 4.9 years ago
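The same check can also be done from code rather than by eye. This is a minimal sketch under stated assumptions: a hypothetical toy GFF3 (one gene, one transcript, two exons) is written to a temporary file in place of cro_v2.gene_models.gff3, and makeTxDbFromGFF() is taken from the txdbmaker package when it is installed (newer Bioconductor releases moved it there) or otherwise from GenomicFeatures, as in this thread.

```r
library(GenomicFeatures)  # TxDb class and accessors; also attaches AnnotationDbi

# Assumption: newer Bioconductor releases provide makeTxDbFromGFF() in txdbmaker;
# fall back to the GenomicFeatures version used in this thread.
make_txdb <- if (requireNamespace("txdbmaker", quietly = TRUE)) {
  txdbmaker::makeTxDbFromGFF
} else {
  GenomicFeatures::makeTxDbFromGFF
}

# Hypothetical toy GFF3 standing in for cro_v2.gene_models.gff3
gff_path <- tempfile(fileext = ".gff3")
writeLines(c(
  "##gff-version 3",
  "chr1\ttoy\tgene\t100\t900\t.\t+\t.\tID=gene1",
  "chr1\ttoy\tmRNA\t100\t900\t.\t+\t.\tID=tx1;Parent=gene1",
  "chr1\ttoy\texon\t100\t300\t.\t+\t.\tID=exon1;Parent=tx1",
  "chr1\ttoy\texon\t600\t900\t.\t+\t.\tID=exon2;Parent=tx1"
), gff_path)

txdb <- make_txdb(file = gff_path, format = "gff3")

# Was the TxDb created?
exists("txdb")     # TRUE once the assignment above has run
is(txdb, "TxDb")   # TRUE for a valid TxDb object
print(txdb)        # the same kind of summary shown above
transcripts(txdb)  # GRanges with one range per transcript
```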

Hi,

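Make sure to always use saveDb()/loadDb() on a TxDb object, not save()/load(). The latter will break the object, because a TxDb is backed by a connection to an SQLite database, and that connection does not survive save()/load(). The former saves the object to a self-contained SQLite file that is relocatable.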

H.

Reply by Hervé Pagès, 4.9 years ago
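Applied to a TxDb built from a GFF3 file, the saveDb()/loadDb() round trip might look like the sketch below. Assumptions: a hypothetical toy GFF3 replaces cro_v2.gene_models.gff3, the output path is a temporary file, and makeTxDbFromGFF() comes from txdbmaker if installed, otherwise from GenomicFeatures.

```r
library(GenomicFeatures)  # attaches AnnotationDbi, which provides saveDb()/loadDb()

# Assumption: makeTxDbFromGFF() lives in txdbmaker in newer Bioconductor releases
make_txdb <- if (requireNamespace("txdbmaker", quietly = TRUE)) {
  txdbmaker::makeTxDbFromGFF
} else {
  GenomicFeatures::makeTxDbFromGFF
}

# Hypothetical toy GFF3 standing in for cro_v2.gene_models.gff3
gff_path <- tempfile(fileext = ".gff3")
writeLines(c(
  "##gff-version 3",
  "chr1\ttoy\tgene\t100\t900\t.\t+\t.\tID=gene1",
  "chr1\ttoy\tmRNA\t100\t900\t.\t+\t.\tID=tx1;Parent=gene1",
  "chr1\ttoy\texon\t100\t300\t.\t+\t.\tID=exon1;Parent=tx1",
  "chr1\ttoy\texon\t600\t900\t.\t+\t.\tID=exon2;Parent=tx1"
), gff_path)

txdb <- make_txdb(file = gff_path, format = "gff3")
n_tx_before <- length(transcripts(txdb))  # remember the size before saving

# Write the TxDb's SQLite database to a standalone file (not save()/load())
db_file <- tempfile(fileext = ".sqlite")
saveDb(txdb, file = db_file)

# In this or any later R session, reconnect to the saved file
txdb_reloaded <- loadDb(db_file)
length(transcripts(txdb_reloaded)) == n_tx_before  # same content after reload
```

Because the file is a plain SQLite database, it can be copied to another machine and opened there with loadDb().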

score 3 · Answer 1 · 2020-08-21

When you make a TxDb that way, it exists only in your R workspace and will disappear when you close R. So if you only need it once, you can make it, use it for whatever you need, and it will be gone when you close R.

If you want it to be persistent, you have to save it somehow and then load it back into your workspace when you need it again. You could make a TxDb package (e.g. with makeTxDbPackage()), but that is likely more work than it's worth. Instead, you can simply save the object and load it back in directly. One way to do that is what Lori Shepherd suggested.

An alternative to what Lori suggested, and the approach Hervé recommends above, is saveDb() and loadDb() from AnnotationDbi. An example, using an existing TxDb:

> library(AnnotationDbi)
> library(AnnotationHub)
> hub <- AnnotationHub()
## some random species I know will have a TxDb
> query(hub, c("mulatta","txdb"))
AnnotationHub with 15 records
# snapshotDate(): 2020-04-27
# $dataprovider: UCSC
# $species: Macaca mulatta
# $rdataclass: TxDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH52261"]]' 

            title                                     
  AH52261 | TxDb.Mmulatta.UCSC.rheMac3.refGene.sqlite 
  AH52262 | TxDb.Mmulatta.UCSC.rheMac8.refGene.sqlite 
  AH57989 | TxDb.Mmulatta.UCSC.rheMac3.refGene.sqlite 
  AH57990 | TxDb.Mmulatta.UCSC.rheMac8.refGene.sqlite 
  AH61795 | TxDb.Mmulatta.UCSC.rheMac3.refGene.sqlite 
  ...       ...                                       
  AH75759 | TxDb.Mmulatta.UCSC.rheMac3.refGene.sqlite 
  AH75760 | TxDb.Mmulatta.UCSC.rheMac8.refGene.sqlite 
  AH75761 | TxDb.Mmulatta.UCSC.rheMac10.refGene.sqlite
  AH79593 | TxDb.Mmulatta.UCSC.rheMac3.refGene.sqlite 
  AH79594 | TxDb.Mmulatta.UCSC.rheMac8.refGene.sqlite 
## get the TxDb
> z <- hub[["AH79594"]]
downloading 1 resources
retrieving 1 resource
  |======================================================================| 100%

loading from cache
Loading required package: GenomicFeatures
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
> z
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: UCSC
# Genome: rheMac8
# Organism: Macaca mulatta
# Taxonomy ID: 9544
# UCSC Table: refGene
# UCSC Track: RefSeq Genes
# Resource URL: http://genome.ucsc.edu/
# Type of Gene ID: Entrez Gene ID
# Full dataset: yes
# miRBase build ID: NA
# Nb of transcripts: 6504
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2020-04-28 14:21:43 +0000 (Tue, 28 Apr 2020)
# GenomicFeatures version at creation time: 1.39.7
# RSQLite version at creation time: 2.2.0
# DBSCHEMAVERSION: 1.2

## save it
> saveDb(z, "whatevs")
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: UCSC
# Genome: rheMac8
# Organism: Macaca mulatta
# Taxonomy ID: 9544
# UCSC Table: refGene
# UCSC Track: RefSeq Genes
# Resource URL: http://genome.ucsc.edu/
# Type of Gene ID: Entrez Gene ID
# Full dataset: yes
# miRBase build ID: NA
# Nb of transcripts: 6504
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2020-04-28 14:21:43 +0000 (Tue, 28 Apr 2020)
# GenomicFeatures version at creation time: 1.39.7
# RSQLite version at creation time: 2.2.0
# DBSCHEMAVERSION: 1.2

## Get rid of it
> rm(z)
> z
Error: object 'z' not found

## and load it back in.
> z <- loadDb("whatevs")
> z
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: UCSC
# Genome: rheMac8
# Organism: Macaca mulatta
# Taxonomy ID: 9544
# UCSC Table: refGene
# UCSC Track: RefSeq Genes
# Resource URL: http://genome.ucsc.edu/
# Type of Gene ID: Entrez Gene ID
# Full dataset: yes
# miRBase build ID: NA
# Nb of transcripts: 6504
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2020-04-28 14:21:43 +0000 (Tue, 28 Apr 2020)
# GenomicFeatures version at creation time: 1.39.7
# RSQLite version at creation time: 2.2.0
# DBSCHEMAVERSION: 1.2
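To avoid re-parsing the GFF3 in every session, the save/load steps above can be wrapped in a small build-or-load helper. This is only a sketch: get_txdb() is a hypothetical name, a toy GFF3 stands in for cro_v2.gene_models.gff3, and makeTxDbFromGFF() is assumed to come from txdbmaker (newer Bioconductor) or GenomicFeatures (the release in this thread).

```r
library(GenomicFeatures)  # attaches AnnotationDbi (saveDb()/loadDb())

# Assumption: makeTxDbFromGFF() lives in txdbmaker in newer Bioconductor releases
make_txdb <- if (requireNamespace("txdbmaker", quietly = TRUE)) {
  txdbmaker::makeTxDbFromGFF
} else {
  GenomicFeatures::makeTxDbFromGFF
}

# Hypothetical helper: build from the GFF3 only if no saved database exists yet
get_txdb <- function(gff_path, db_file) {
  if (file.exists(db_file)) {
    return(loadDb(db_file))  # fast path: reuse the saved SQLite file
  }
  txdb <- make_txdb(file = gff_path, format = "gff3")
  saveDb(txdb, file = db_file)
  loadDb(db_file)  # return a TxDb backed by the saved file
}

# Hypothetical toy GFF3 standing in for cro_v2.gene_models.gff3
gff_path <- tempfile(fileext = ".gff3")
writeLines(c(
  "##gff-version 3",
  "chr1\ttoy\tgene\t100\t900\t.\t+\t.\tID=gene1",
  "chr1\ttoy\tmRNA\t100\t900\t.\t+\t.\tID=tx1;Parent=gene1",
  "chr1\ttoy\texon\t100\t300\t.\t+\t.\tID=exon1;Parent=tx1",
  "chr1\ttoy\texon\t600\t900\t.\t+\t.\tID=exon2;Parent=tx1"
), gff_path)

db_file <- tempfile(fileext = ".sqlite")
txdb_first <- get_txdb(gff_path, db_file)  # parses the GFF3 and saves it
txdb_again <- get_txdb(gff_path, db_file)  # just loads the saved file
```

Returning loadDb(db_file) on the first call too means both code paths hand back the same kind of file-backed TxDb, so later code does not need to care which path ran.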

Your second question about using the TxDb is beyond the scope of a support site post. You should read the GenomicFeatures vignette, which is intended to teach you how to use them.
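As a hedged starting point before reading the vignette, a few GenomicFeatures accessors return the pieces an exon/intron analysis works with; intronsByTranscript() is included because EISA contrasts exonic and intronic signal. The toy GFF3 is hypothetical, and makeTxDbFromGFF() is assumed to come from txdbmaker if installed, otherwise from GenomicFeatures. How eisaR itself consumes the TxDb is not shown here.

```r
library(GenomicFeatures)  # transcripts(), exonsBy(), genes(), intronsByTranscript()

# Assumption: makeTxDbFromGFF() lives in txdbmaker in newer Bioconductor releases
make_txdb <- if (requireNamespace("txdbmaker", quietly = TRUE)) {
  txdbmaker::makeTxDbFromGFF
} else {
  GenomicFeatures::makeTxDbFromGFF
}

# Hypothetical toy GFF3: one transcript with exons at 100-300 and 600-900
gff_path <- tempfile(fileext = ".gff3")
writeLines(c(
  "##gff-version 3",
  "chr1\ttoy\tgene\t100\t900\t.\t+\t.\tID=gene1",
  "chr1\ttoy\tmRNA\t100\t900\t.\t+\t.\tID=tx1;Parent=gene1",
  "chr1\ttoy\texon\t100\t300\t.\t+\t.\tID=exon1;Parent=tx1",
  "chr1\ttoy\texon\t600\t900\t.\t+\t.\tID=exon2;Parent=tx1"
), gff_path)
txdb <- make_txdb(file = gff_path, format = "gff3")

tx       <- transcripts(txdb)          # GRanges: one range per transcript
ex_by_tx <- exonsBy(txdb, by = "tx")   # GRangesList: exons grouped by transcript
gn       <- genes(txdb)                # GRanges: one range per gene
in_by_tx <- intronsByTranscript(txdb)  # GRangesList: introns grouped by transcript

in_by_tx[[1]]  # the single intron, chr1:301-599, between the two toy exons
```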