How to extract BOTH exons and their matching transcript IDs from the TxDb.Hsapiens.UCSC.hg19.knownGene database?
@shinhengchiou-15264

Hi, 

As the title suggests, I'd like to know whether I can extract exon information (e.g. through exonsBy()) that gives me each exon's genomic coordinates, its exon ID, the ID of the gene it belongs to (by = "gene"), and, on top of all that, the ID of each transcript the exon belongs to.

Thank you very much!

 

Shin

@james-w-macdonald-5106
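> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
> ## group exons by transcript (the default, by = "tx"); use.names = TRUE names each group by its TXNAME
> ex <- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, by = "tx", use.names = TRUE)

> ## flatten to one range per exon/transcript pair; each range keeps its transcript name
> exgr <- unlist(ex)

> mcols(exgr)$txid <- names(exgr)

> ## look up the (Entrez) gene ID for each transcript name
> mcols(exgr)$geneid <- mapIds(TxDb.Hsapiens.UCSC.hg19.knownGene, keys = names(exgr), column = "GENEID", keytype = "TXNAME")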
'select()' returned 1:1 mapping between keys and columns
> exgr
GRanges object with 742493 ranges and 5 metadata columns:
                   seqnames         ranges strand |   exon_id   exon_name
                      <Rle>      <IRanges>  <Rle> | <integer> <character>
  uc001aaa.3           chr1 [11874, 12227]      + |         1        <NA>
  uc001aaa.3           chr1 [12613, 12721]      + |         3        <NA>
  uc001aaa.3           chr1 [13221, 14409]      + |         5        <NA>
  uc010nxq.1           chr1 [11874, 12227]      + |         1        <NA>
  uc010nxq.1           chr1 [12595, 12721]      + |         2        <NA>
         ...            ...            ...    ... .       ...         ...
  uc011mgv.2 chrUn_gl000241 [22732, 22846]      - |    289961        <NA>
  uc011mgv.2 chrUn_gl000241 [20433, 20481]      - |    289960        <NA>
  uc011mgw.1 chrUn_gl000243 [11501, 11530]      + |    289967        <NA>
  uc022brq.1 chrUn_gl000243 [13608, 13637]      + |    289968        <NA>
  uc022brr.1 chrUn_gl000247 [ 5787,  5816]      - |    289969        <NA>
             exon_rank        txid      geneid
             <integer> <character> <character>
  uc001aaa.3         1  uc001aaa.3   100287102
  uc001aaa.3         2  uc001aaa.3   100287102
  uc001aaa.3         3  uc001aaa.3   100287102
  uc010nxq.1         1  uc010nxq.1   100287102
  uc010nxq.1         2  uc010nxq.1   100287102
         ...       ...         ...         ...
  uc011mgv.2         6  uc011mgv.2        <NA>
  uc011mgv.2         7  uc011mgv.2        <NA>
  uc011mgw.1         1  uc011mgw.1        <NA>
  uc022brq.1         1  uc022brq.1        <NA>
  uc022brr.1         1  uc022brr.1        <NA>
  -------
  seqinfo: 93 sequences (1 circular) from hg19 genome
>
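To make the flattening step concrete without needing Bioconductor installed, here is a minimal base-R sketch of the same idea. The two transcripts, their exon IDs and the gene ID are copied from the first rows of the output above; the objects ex_by_tx and tx2gene are hypothetical stand-ins for the GRangesList returned by exonsBy() and for the mapIds() lookup. Base unlist() would append suffixes to the names (e.g. "uc001aaa.31"), whereas unlist() on a GRangesList repeats each transcript name unchanged, so the sketch rebuilds the transcript column with rep() and lengths().

```r
# Hypothetical stand-in for exonsBy(txdb, by = "tx", use.names = TRUE):
# a list of exon IDs named by transcript (values taken from the output above)
ex_by_tx <- list(
  uc001aaa.3 = c(1L, 3L, 5L),
  uc010nxq.1 = c(1L, 2L)
)

# Hypothetical stand-in for mapIds(txdb, keys, column = "GENEID", keytype = "TXNAME")
tx2gene <- c(uc001aaa.3 = "100287102", uc010nxq.1 = "100287102")

# Repeat each transcript name once per exon, mirroring names(unlist(ex)) on a GRangesList
txid <- rep(names(ex_by_tx), lengths(ex_by_tx))

exon_table <- data.frame(
  exon_id = unlist(ex_by_tx, use.names = FALSE),
  txid = txid,
  geneid = unname(tx2gene[txid]),  # named-vector lookup, one gene ID per row
  stringsAsFactors = FALSE
)

print(exon_table)
```

Exon 1 appears twice, once for each transcript, and that is the point: an exon shared by several transcripts gets one row per transcript, just as in the GRanges output above.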