Question: Feature types in TxDb objects
4
3.7 years ago by
Thomas Girke1.6k
United States
Thomas Girke1.6k wrote:

Here are a few questions/suggestions for our experts on range operations:

(1) Are there currently accessor functions available to extract the range data of the following feature types from TxDb objects: miRNAs, rRNAs, tRNAs and transposons? The existing functions for some of these features (e.g. microRNAs()) seem to depend on other databases (e.g. miRBase), so they are not very useful for more recently sequenced organisms, where all we usually have is a FASTA sequence file and a GFF/GTF annotation file. Since the above features are encoded in most GFF/GTF files, it would be nice if they were retrievable from TxDb objects as well.

(2) A simple accessor function for obtaining intergenic ranges from TxDb objects would also be useful. Right now I only know how to obtain this range type in several steps: (i) extracting transcripts by gene, (ii) flattening the overlapping ones with reduce(), and (iii) applying gaps() to obtain the non-annotated regions. Having an explicit function for this routine, similar to promoters() or intronsByTranscript(), would be useful and would minimize errors often made by less experienced users.
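
For readers following along, the reduce()/gaps() route described above can be sketched roughly as follows. This is only an illustration, not an existing GenomicFeatures function: the helper name intergenic_ranges() and the toy ranges are made up for the example, and the strand handling (merging features from both strands before taking gaps) is one possible choice among several.

```r
library(GenomicRanges)

# Hypothetical helper (not part of any package): intergenic ranges from a
# GRanges of annotated features, e.g. the result of transcripts(txdb).
intergenic_ranges <- function(features) {
  # Merge overlapping features from both strands; strand becomes "*".
  merged <- reduce(features, ignore.strand = TRUE)
  # gaps() reports gaps separately for "+", "-" and "*". After the merge,
  # only the "*" entries describe regions between annotated features.
  g <- gaps(merged)
  g[strand(g) == "*"]
}

# Toy input: three features on a 100 bp chromosome (made-up coordinates).
toy <- GRanges(
  seqnames = "chr1",
  ranges = IRanges(start = c(11, 21, 61), end = c(30, 40, 80)),
  strand = c("+", "-", "+"),
  seqlengths = c(chr1 = 100)
)
ig <- intergenic_ranges(toy)
ig
```

With a real TxDb object, the same helper could be applied to transcripts(txdb) or genes(txdb). Note that gaps() can only report the region after the last feature on a chromosome when the sequence lengths are known, which may not be the case for a TxDb built from a GFF/GTF file without chromosome information.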

Thanks,

Thomas

modified 3.5 years ago • written 3.7 years ago by Thomas Girke1.6k
1
3.5 years ago by
Hervé Pagès ♦♦ 13k
United States
Hervé Pagès ♦♦ 13k wrote:

Hi Thomas,

This is now implemented in BioC 3.1. TxDb objects now have a new column (tx_type) that you can request through the columns argument of the transcripts() extractor. This column is populated when you make a TxDb object from Ensembl (makeTxDbFromBiomart) or from a GFF3/GTF file (makeTxDbFromGFF), but not yet (i.e. it's set to NA) when you make it from a UCSC track (makeTxDbFromUCSC). However, it seems that UCSC also provides that information for some tracks, so we're planning to have makeTxDbFromUCSC get it from these tracks at some point (after the BioC 3.1 release though).

See here

A: Does BSgenome.Dmelanogaster.UCSC.dm2 contain non-coding RNAs?

for an example using the new tx_type column.
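
As a rough sketch of how the tx_type column might be used (this is not the example from the linked answer): the helper name transcripts_of_type() and the file path in the comments are placeholders, and the actual tx_type values depend on the annotation source, so "tRNA" is only an assumed example value.

```r
library(GenomicFeatures)

# Hypothetical helper (not part of GenomicFeatures): keep only the
# transcripts whose tx_type is one of `types`.
transcripts_of_type <- function(txdb, types) {
  tx <- transcripts(txdb, columns = c("tx_name", "tx_type"))
  # %in% treats NA as "no match", so transcripts without a type are dropped.
  tx[mcols(tx)$tx_type %in% types]
}

# Usage sketch; "my_annotation.gff3" is a placeholder path:
# txdb <- makeTxDbFromGFF("my_annotation.gff3")
# table(transcripts(txdb, columns = "tx_type")$tx_type, useNA = "ifany")
# trnas <- transcripts_of_type(txdb, "tRNA")
```

Tabulating tx_type first is a quick way to see which feature types a given file actually provides before subsetting. In more recent Bioconductor releases the makeTxDbFrom* functions appear to have moved to the txdbmaker package, so that package may need to be loaded as well.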

H.

0
3.7 years ago by
Marc Carlson7.2k
United States
Marc Carlson7.2k wrote:

Hi Thomas,

Thanks for your feedback. We currently don't store the feature type field that you are referring to (the one that comes from these GTF and GFF files), so this will require a small schema change on our end to even be able to do this. This is because (as you already know) GTF and GFF files were not the first type of files that we made into TxDb objects. But we can change that so that this kind of data could come back in a metadata column (for those resources where it's available). That change alone would make for some straightforward subsetting whenever someone has that kind of data.

As for your second suggestion, well I think that's also a good point and it would not be too hard to support an 'intergenic' wrapper function.

Marc

0
3.5 years ago by
Thomas Girke1.6k
United States
Thomas Girke1.6k wrote:

Excellent! Thanks for making this available.

Best,

Thomas