Question

Simple UCSC to transcript annotation

AntonS • 0

@antons-13533

Last seen 6.8 years ago

Hi,

I tried to find the solution on my own, but I think the whole TxDb, AnnotationDbi, ... machinery is circuitous.

I have chromosomal locations (chr1:48902063-48902085, chr2:202340342-202340364, ...) and I just want to know whether each region is an exon. Is there a simple command to do it?

Best regards and thank you in advance

annotation ucsc tdx transcripts • 1.0k views

ADD COMMENT • link 6.8 years ago AntonS • 0

score 1 · Answer 1 · 2017-07-19

> library(TxDb.Hsapiens.UCSC.hg19.knownGene)

> ex <- exons(TxDb.Hsapiens.UCSC.hg19.knownGene)

> gr <- GRanges(c("chr1","chr2"), IRanges(c(48902063,202340342), c(48902085, 202340364)))

> subsetByOverlaps(ex, gr)
GRanges object with 1 range and 1 metadata column:
      seqnames                 ranges strand |   exon_id
         <Rle>              <IRanges>  <Rle> | <integer>
  [1]     chr2 [202340342, 202340465]      + |     34935
  -------
  seqinfo: 93 sequences (1 circular) from hg19 genome
>

So the answers are: no for the chr1 region, and the chr2 region is part of an exon.
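If a plain yes/no per region is easier to work with than the subset of exons, a minimal sketch along the same lines could use overlapsAny() from GenomicRanges. The object names txdb, hits_any, hits_within and result are only illustrative, and type = "within" is an extra check (is the region entirely inside one exon?) that the original answer did not use.

```r
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
ex <- exons(txdb)

# The two query regions from the question
gr <- GRanges(c("chr1", "chr2"),
              IRanges(c(48902063, 202340342), c(48902085, 202340364)))

# One logical value per query region: does it overlap any exon at all?
hits_any <- overlapsAny(gr, ex)

# Stricter check: does the region lie entirely within a single exon?
hits_within <- overlapsAny(gr, ex, type = "within")

# Collect the answers in an ordinary data.frame, one row per region
result <- data.frame(region = as.character(gr),
                     overlaps_exon = hits_any,
                     within_exon = hits_within,
                     stringsAsFactors = FALSE)
result
```

The query ranges here have no strand, so exons on either strand count as a match.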

score 0 · Answer 2 · 2017-07-19

Thank you, but this object is again nested in at least 3 levels and horrible to handle. I have never had such a hard-to-handle package.

Now I just need a simple data.frame with these columns:

Chr ExonStart ExonEnd ExonNr Gene

ADD COMMENT • link 6.8 years ago AntonS • 0

If you want to add a comment, please use the ADD COMMENT link. The Add your answer box below is for people to add answers, not more questions or comments.

I have no idea what you mean by 'nested in at least 3 levels'. Are you complaining that it took three lines of code to generate an answer?

If so, there are always tradeoffs to be made - you can make things really simple and straightforward, but the cost to that is you force people to do what you think they should do, and make it difficult to do other things that they might want to do. The alternative is to make things very powerful, but the cost to that is complexity. All of the Bioconductor infrastructure for dealing with genomic position data is very powerful, but at the same time very complex. What people gain from the complexity is the ability to do lots of things that a simpler API would likely prevent.

There is extensive documentation for all of the objects that I have generated, so I would point you to the help pages, as well as the vignettes for the GenomicRanges package. As a hint towards what you want, do note that

> ex <- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, "gene")
> unlist(ex)
GRanges object with 272776 ranges and 2 metadata columns:
       seqnames               ranges strand |   exon_id   exon_name
          <Rle>            <IRanges>  <Rle> | <integer> <character>
     1    chr19 [58858172, 58858395]      - |    250809        <NA>
     1    chr19 [58858719, 58859006]      - |    250810        <NA>
     1    chr19 [58859832, 58860494]      - |    250811        <NA>
     1    chr19 [58860934, 58862017]      - |    250812        <NA>
     1    chr19 [58861736, 58862017]      - |    250813        <NA>
   ...      ...                  ...    ... .       ...         ...
  9997    chr22 [50961997, 50962853]      - |    266958        <NA>
  9997    chr22 [50963871, 50964033]      - |    266960        <NA>
  9997    chr22 [50963901, 50964034]      - |    266961        <NA>
  9997    chr22 [50964430, 50964570]      - |    266963        <NA>
  9997    chr22 [50964675, 50964905]      - |    266965        <NA>
  -------
  seqinfo: 93 sequences (1 circular) from hg19 genome

gives you a GRanges object whose names are the Entrez Gene IDs, which is, I presume, what you wanted for the Gene column.

ADD REPLY • link 6.8 years ago James W. MacDonald 65k

Also, as.data.frame() will give the simple data.frame that the user desires.

ADD REPLY • link 6.8 years ago Martin Morgan 25k
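Putting the two hints above together, a minimal sketch of the requested table might look like the following. The column names follow the layout asked for in the thread, and the variable names (txdb, gene_id, exon_df) are only illustrative. The gene IDs are saved and the names removed before conversion, on the assumption that the repeated gene IDs could otherwise clash with the unique row names a data.frame requires.

```r
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# Exons grouped by gene, flattened into a single GRanges object
ex <- unlist(exonsBy(txdb, by = "gene"))

# The names are Entrez Gene IDs; keep them as a column and drop them
# from the object so they are not used as (duplicated) row names
gene_id <- names(ex)
names(ex) <- NULL

exon_df <- as.data.frame(ex)

# Rename and reorder to the flat layout requested in the thread
exon_df <- data.frame(Chr = as.character(exon_df$seqnames),
                      ExonStart = exon_df$start,
                      ExonEnd = exon_df$end,
                      Strand = as.character(exon_df$strand),
                      Gene = gene_id,
                      stringsAsFactors = FALSE)
head(exon_df)
```

This sketch has no ExonNr column, because exons grouped by gene carry no rank. Exon numbering is defined per transcript, and exonsBy(txdb, by = "tx") returns an exon_rank metadata column that could serve that purpose.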