Feature types in TxDb objects
Thomas Girke ★ 1.7k
@thomas-girke-993

Here are a few questions/suggestions for our experts on range operations:

(1) Are there currently accessor functions for extracting the range data of the following feature types from TxDb objects: miRNAs, rRNAs, tRNAs and transposons? The existing functions for some of these features (e.g. microRNAs()) seem to depend on other databases (e.g. miRBase), which is not very useful for more recently sequenced organisms where all we usually have is a FASTA sequence file and a GFF/GTF annotation file. Since these features are encoded in most GFF/GTF files, it would be nice if they were retrievable from TxDb objects as well.

(2) A simple accessor function for obtaining intergenic ranges from TxDb objects would also be useful. Right now I only know how to obtain this range type in several steps: (i) extracting transcripts by gene, (ii) flattening the overlapping ones with reduce(), and (iii) applying gaps() to obtain the non-annotated regions. Having an explicit function for this routine, similar to promoters() or intronsByTranscript(), would be useful and would minimize errors often made by less experienced users.
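The reduce-then-gaps logic described in (2) can be sketched independently of any package. The following is a minimal Python illustration, not Bioconductor code: the function names `reduce_ranges` and `gaps`, the coordinates, and the sequence length are all made up for the example, and strand is ignored, whereas range operations on real annotation data need to decide how to treat it.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, closed [start, end], as in GFF/GTF


def reduce_ranges(ranges: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals (the 'flattening' step)."""
    merged: List[Interval] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def gaps(ranges: List[Interval], seq_length: int) -> List[Interval]:
    """Return the parts of 1..seq_length not covered by any input interval."""
    result: List[Interval] = []
    cursor = 1  # first position not yet accounted for
    for start, end in reduce_ranges(ranges):
        if start > cursor:
            result.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    if cursor <= seq_length:
        result.append((cursor, seq_length))
    return result


# Hypothetical gene ranges on a single sequence of length 2000.
genes = [(100, 500), (400, 800), (1200, 1500)]
intergenic = gaps(genes, seq_length=2000)
print(intergenic)
```

Forgetting the merge step, or leaving out the sequence length so that the trailing gap is missed, are the kinds of mistakes a dedicated wrapper function would prevent.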

Thanks,

Thomas

Tags: GenomicFeatures, rtracklayer

@herve-pages-1542

Hi Thomas,

This is now implemented in BioC 3.1. TxDb objects have a new column (tx_type) that you can request through the columns argument of the transcripts() extractor. This column is populated when you make a TxDb object from Ensembl (makeTxDbFromBiomart) or from a GFF3/GTF file (makeTxDbFromGFF), but not yet when you make it from a UCSC track (makeTxDbFromUCSC), in which case it is set to NA. However, it seems that UCSC also provides that information for some tracks, so we're planning to have makeTxDbFromUCSC get it from these tracks at some point (after the BioC 3.1 release, though).

See here

A: Does BSgenome.Dmelanogaster.UCSC.dm2 contain non-coding RNAs?

for an example using the new tx_type column.
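To illustrate what subsetting ranges by feature type amounts to, here is a minimal Python sketch that groups ranges by the type field of GFF3-style lines. It is only an analogy under stated assumptions: the records are invented, the type is assumed to sit in the third tab-separated column as in GFF3, the helper name `ranges_by_type` is hypothetical, and none of this reflects how makeTxDbFromGFF populates tx_type internally.

```python
from collections import defaultdict

# Hypothetical GFF3-style records (9 tab-separated columns), made up for illustration.
GFF_LINES = [
    "chr1\texample\tmRNA\t100\t900\t.\t+\t.\tID=tx1",
    "chr1\texample\tmiRNA\t1500\t1580\t.\t-\t.\tID=tx2",
    "chr1\texample\ttRNA\t2000\t2072\t.\t+\t.\tID=tx3",
    "chr1\texample\tmiRNA\t3000\t3090\t.\t+\t.\tID=tx4",
]


def ranges_by_type(lines):
    """Group (seqid, start, end, strand) tuples by the feature type column."""
    grouped = defaultdict(list)
    for line in lines:
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        seqid, ftype, start, end, strand = (
            fields[0], fields[2], fields[3], fields[4], fields[6]
        )
        grouped[ftype].append((seqid, int(start), int(end), strand))
    return dict(grouped)


by_type = ranges_by_type(GFF_LINES)
print(by_type["miRNA"])
```

With a type column attached to every transcript, pulling out the miRNAs or tRNAs is a plain filter on that column and needs no lookup in an external database.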

H.

Marc Carlson ★ 7.2k
@marc-carlson-2264

Hi Thomas,

Thanks for your feedback. We currently don't store the feature type field that you are referring to (the one that comes from GTF and GFF files), so supporting this will require a small schema change on our end. This is because (as you already know) GTF and GFF files were not the first type of file that we made into TxDb objects. But we can change that so that this kind of data comes back in a metadata column (for those resources where it's available). That change alone would allow straightforward subsetting whenever someone has that kind of data.

As for your second suggestion, I think that's also a good point, and it would not be too hard to support an 'intergenic' wrapper function.

Marc

Thomas Girke ★ 1.7k
@thomas-girke-993

Excellent! Thanks for making this available. 

Best,

Thomas
