the coordinates of exons of RefSeq or ENSEMBL genes
Bogdan (@bogdan-2367)

Dear all,

May I ask a simple question:

In Bioconductor, what is the simplest (or most direct) way to retrieve the coordinates of exons for RefSeq or Ensembl genes?

thanks a lot,

-- bogdan

Wei Shi (@wei-shi-2183)

You can certainly use Bioconductor core packages for this, but the getInBuiltAnnotation() function in Rsubread offers a simple way to get exon coordinates for human and mouse RefSeq genes.


Thanks a lot, Wei! The example you provided works great:

 library(Rsubread)
 x <- getInBuiltAnnotation("hg38")  # one row per exon
 x[1:5, ]
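
If the exon table is needed as genomic ranges rather than a plain data frame, one possible follow-up is sketched below. It assumes the table returned by getInBuiltAnnotation() has the columns GeneID, Chr, Start, End and Strand, and it uses makeGRangesFromDataFrame() from GenomicRanges, which was not part of the original answer.

```r
library(Rsubread)
library(GenomicRanges)

# In-built RefSeq exon annotation for hg38 (one row per exon)
x <- getInBuiltAnnotation("hg38")

# Convert to GRanges; the Chr/Start/End/Strand columns are matched
# case-insensitively, and GeneID is kept as a metadata column
gr <- makeGRangesFromDataFrame(x, keep.extra.columns = TRUE)
gr[1:5]
```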
@martin-morgan-1513

For Ensembl, load the ensembldb and AnnotationHub packages and query the hub for EnsDb objects for Homo sapiens, release 97:

> library(ensembldb)
> library(AnnotationHub)
> hub = AnnotationHub()
snapshotDate(): 2019-07-10
> query(hub, c("EnsDb", "Homo sapiens", "97"))

This returns a single record with ID AH73881. Retrieve it and use exons() to extract the exon coordinates:

> exons(hub[["AH73881"]])
downloading 0 resources
loading from cache
GRanges object with 828532 ranges and 1 metadata column:
                  seqnames            ranges strand |         exon_id
                     <Rle>         <IRanges>  <Rle> |     <character>
  ENSE00002234944        1       11869-12227      + | ENSE00002234944
  ENSE00001948541        1       12010-12057      + | ENSE00001948541
  ENSE00001671638        1       12179-12227      + | ENSE00001671638
  ENSE00003582793        1       12613-12721      + | ENSE00003582793
  ENSE00001758273        1       12613-12697      + | ENSE00001758273
              ...      ...               ...    ... .             ...
  ENSE00001741452        Y 26628271-26628437      - | ENSE00001741452
  ENSE00001681574        Y 26630647-26630749      - | ENSE00001681574
  ENSE00001638296        Y 26633345-26633431      - | ENSE00001638296
  ENSE00001797328        Y 26634523-26634652      - | ENSE00001797328
  ENSE00001794473        Y 56855244-56855488      + | ENSE00001794473
  -------
  seqinfo: 424 sequences from GRCh38 genome
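
To restrict the result to a single gene instead of all 828532 exons, a filter can be passed to exons(). The following is only a sketch: the gene name BRCA1 and the chosen columns are illustrative assumptions, not part of the original answer.

```r
library(ensembldb)
library(AnnotationHub)

hub <- AnnotationHub()
edb <- hub[["AH73881"]]  # EnsDb for Homo sapiens, Ensembl release 97

# Exons of one gene only (BRCA1 is an arbitrary example),
# with gene-level columns attached as metadata
brca1_exons <- exons(
    edb,
    filter = ~ gene_name == "BRCA1",
    columns = c("exon_id", "gene_id", "gene_name")
)
brca1_exons
```

Note that Ensembl uses sequence names such as 1 and Y rather than chr1 and chrY, which matters when combining these ranges with UCSC-style annotations.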

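Use the same steps for UCSC-based TxDb annotations, starting from the query below. Note that knownGene is the UCSC Known Genes track; RefSeq transcripts correspond to the refGene track, so choose the record accordingly.

> query(hub, c("TxDb", "Homo sapiens"))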
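
As an alternative to going through AnnotationHub, the same kind of object is available from an installed TxDb annotation package. The sketch below assumes TxDb.Hsapiens.UCSC.hg38.knownGene is installed; it was not named in the original answer and is used here only as an example.

```r
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene

# All exons as a single GRanges object
ex <- exons(txdb)

# Exons grouped by gene; for this package the list names are Entrez gene IDs
ex_by_gene <- exonsBy(txdb, by = "gene")

# Plain data frame with seqnames, start, end, width, strand, exon_id
head(as.data.frame(ex))
```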

Thank you, Martin! Have a happy weekend!
