Dear all,
May I ask a simple question:
in BioC, what is the simplest (or most direct) way to retrieve the coordinates of exons for RefSeq or Ensembl genes?
thanks a lot,
-- bogdan
You can certainly use Bioc core packages to do so, but the getInBuiltAnnotation function in Rsubread offers you a simple way to get this information for human and mouse RefSeq genes.
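A minimal sketch of the Rsubread route, assuming the package is installed and that the human hg38 build is the one wanted (the choice of "hg38" and of the Entrez ID used for subsetting are illustrative, not from the original answer). The function returns a data frame in SAF format, with one row per exon and gene IDs given as NCBI Entrez identifiers:

```r
library(Rsubread)

# Built-in RefSeq annotation; "hg38" is an assumed choice here,
# other builds such as "hg19" or "mm10" are also offered by the package.
ann <- getInBuiltAnnotation(annotation = "hg38")

# Columns are GeneID, Chr, Start, End, Strand
head(ann)

# Exon coordinates for one gene; Entrez ID 672 (BRCA1) is only an example
ann[ann$GeneID == 672, ]
```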
For Ensembl, load the ensembldb and AnnotationHub packages and query the hub for EnsDb objects for Homo sapiens, release 97:
> library(ensembldb)
> library(AnnotationHub)
> hub = AnnotationHub()
snapshotDate(): 2019-07-10
> query(hub, c("EnsDb", "Homo sapiens", "97"))
This returns a single record with id AH73881. Retrieve it and use exons() to extract the information:
> exons(hub[["AH73881"]])
downloading 0 resources
loading from cache
GRanges object with 828532 ranges and 1 metadata column:
                  seqnames            ranges strand |         exon_id
                     <Rle>         <IRanges>  <Rle> |     <character>
  ENSE00002234944        1       11869-12227      + | ENSE00002234944
  ENSE00001948541        1       12010-12057      + | ENSE00001948541
  ENSE00001671638        1       12179-12227      + | ENSE00001671638
  ENSE00003582793        1       12613-12721      + | ENSE00003582793
  ENSE00001758273        1       12613-12697      + | ENSE00001758273
              ...      ...               ...    ... .             ...
  ENSE00001741452        Y 26628271-26628437      - | ENSE00001741452
  ENSE00001681574        Y 26630647-26630749      - | ENSE00001681574
  ENSE00001638296        Y 26633345-26633431      - | ENSE00001638296
  ENSE00001797328        Y 26634523-26634652      - | ENSE00001797328
  ENSE00001794473        Y 56855244-56855488      + | ENSE00001794473
  -------
  seqinfo: 424 sequences from GRCh38 genome
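If only some genes are of interest, the same EnsDb object can be filtered rather than returning every exon. The following is a sketch under assumptions: the gene name BRCA1 and the extra metadata columns are illustrative choices, not part of the answer above.

```r
library(ensembldb)
library(AnnotationHub)

hub <- AnnotationHub()
edb <- hub[["AH73881"]]  # EnsDb for Homo sapiens, Ensembl release 97

# Exons of a single gene; "BRCA1" is only an example
ex <- exons(edb,
            filter = GeneNameFilter("BRCA1"),
            columns = c("exon_id", "gene_id", "gene_name"))

# The same exons grouped by transcript, as a GRangesList
ex_by_tx <- exonsBy(edb, by = "tx", filter = GeneNameFilter("BRCA1"))

# Plain table of coordinates, e.g. for writing to a file
as.data.frame(ex)[, c("seqnames", "start", "end", "strand", "exon_id")]
```

Note that Ensembl seqnames are written without the "chr" prefix (1, 2, ..., Y), as in the output above.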
Use identical steps for UCSC-based TxDb annotations (knownGene, or refGene for RefSeq where the hub provides it), using
> query(hub, c("TxDb", "Homo sapiens"))
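The TxDb route might look like the sketch below. Which record to retrieve depends on what the query lists for your hub snapshot, so taking the first record here is purely an assumption for illustration; check the record titles and pick the genome build and track you need.

```r
library(AnnotationHub)
library(GenomicFeatures)

hub <- AnnotationHub()
txdbs <- query(hub, c("TxDb", "Homo sapiens"))
txdbs  # inspect the titles and choose the record you want

# Assumption: the first record is used only to show the calls
txdb <- txdbs[[1]]

# All exons as a GRanges, with the owning gene ID attached
ex <- exons(txdb, columns = c("exon_id", "gene_id"))

# Exons grouped by gene, as a GRangesList
ex_by_gene <- exonsBy(txdb, by = "gene")
```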
Thanks a lot, Wei! Looking at the example that you provided, it works great: