Simple UCSC to transcript annotation
AntonS • 0
@antons-13533

Hi,

I tried to find a solution on my own, but I find the whole TxDb, AnnotationDb, ... machinery circuitous.

I have chromosomal locations (chr1:48902063-48902085, chr2:202340342-202340364, ...) and I just want to know whether each region is in an exon. Is there a simple command to do it?

Best regards and thank you in advance

annotation ucsc tdx transcripts • 738 views
@james-w-macdonald-5106
> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
> ex <- exons(TxDb.Hsapiens.UCSC.hg19.knownGene)
> gr <- GRanges(c("chr1","chr2"), IRanges(c(48902063,202340342), c(48902085, 202340364)))
> subsetByOverlaps(ex, gr)
GRanges object with 1 range and 1 metadata column:
      seqnames                 ranges strand |   exon_id
         <Rle>              <IRanges>  <Rle> | <integer>
  [1]     chr2 [202340342, 202340465]      + |     34935
  -------
  seqinfo: 93 sequences (1 circular) from hg19 genome

So the answers are: no for the chr1 region, and yes for the chr2 region, which is part of an exon.
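If a plain yes/no per region is all that is needed, something along these lines might be easier to read. This is only a sketch: it assumes the same TxDb package as above, and `any_overlap` and `fully_within` are names made up for illustration. `overlapsAny()` returns one logical value per query range, and `type = "within"` restricts that to ranges lying entirely inside an exon.

```r
library(GenomicRanges)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

ex <- exons(TxDb.Hsapiens.UCSC.hg19.knownGene)
gr <- GRanges(c("chr1", "chr2"),
              IRanges(c(48902063, 202340342), c(48902085, 202340364)))

# TRUE where a query region shares at least one base with any exon
any_overlap <- overlapsAny(gr, ex)

# TRUE only where a query region lies entirely inside a single exon
fully_within <- overlapsAny(gr, ex, type = "within")

# One row per query region (illustrative layout, not a fixed format)
data.frame(region = paste0(as.character(seqnames(gr)), ":", start(gr), "-", end(gr)),
           any_overlap, fully_within)
```

With the annotation shown above I would expect FALSE for the chr1 region and TRUE for the chr2 region, but the result depends on the installed annotation version.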

AntonS • 0
@antons-13533

Thank you, but this object is again nested in at least 3 levels and horrible to handle. I have never had such a hard-to-handle package.

Now I just need a simple data.frame

Chr     ExonStart     ExonEnd     ExonNr     Gene

0
Entering edit mode

I have no idea what you mean by 'nested in at least 3 levels'. Are you complaining that it took three lines of code to generate an answer?

If so, there are always tradeoffs to be made. You can make things really simple and straightforward, but the cost is that you force people to do what you think they should do, and make it difficult to do anything else they might want. The alternative is to make things very powerful, at the cost of complexity. All of the Bioconductor infrastructure for dealing with genomic position data is very powerful, but also very complex. What people gain from that complexity is the ability to do lots of things that a simpler API would likely prevent.

There is extensive documentation for all of the objects that I have generated, so I would point you to the help pages, as well as the vignettes for the GenomicRanges package. As a hint towards what you want, do note that

> ex <- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, "gene")
> unlist(ex)
GRanges object with 272776 ranges and 2 metadata columns:
       seqnames               ranges strand |   exon_id   exon_name
          <Rle>            <IRanges>  <Rle> | <integer> <character>
     1    chr19 [58858172, 58858395]      - |    250809        <NA>
     1    chr19 [58858719, 58859006]      - |    250810        <NA>
     1    chr19 [58859832, 58860494]      - |    250811        <NA>
     1    chr19 [58860934, 58862017]      - |    250812        <NA>
     1    chr19 [58861736, 58862017]      - |    250813        <NA>
   ...      ...                  ...    ... .       ...         ...
  9997    chr22 [50961997, 50962853]      - |    266958        <NA>
  9997    chr22 [50963871, 50964033]      - |    266960        <NA>
  9997    chr22 [50963901, 50964034]      - |    266961        <NA>
  9997    chr22 [50964430, 50964570]      - |    266963        <NA>
  9997    chr22 [50964675, 50964905]      - |    266965        <NA>
  -------
  seqinfo: 93 sequences (1 circular) from hg19 genome

gives you a GRanges object whose names are the Entrez Gene IDs, which I presume is what you wanted for the Gene column.
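If gene symbols are preferred over Entrez Gene IDs, one possible route is a lookup in an organism annotation package. A minimal sketch, assuming org.Hs.eg.db is installed (it is not used elsewhere in this thread) and using a made-up metadata column name `symbol`:

```r
library(GenomicRanges)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(org.Hs.eg.db)  # assumption: not part of the original answer

ex <- unlist(exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, by = "gene"))

# Look up each distinct Entrez Gene ID once; IDs without a symbol come back as NA
gene_ids <- unique(names(ex))
symbols <- mapIds(org.Hs.eg.db, keys = gene_ids,
                  column = "SYMBOL", keytype = "ENTREZID")

# Attach the symbol to every exon by indexing the lookup with the exon's gene ID
mcols(ex)$symbol <- unname(symbols[names(ex)])
```

By default `mapIds()` keeps only the first match when an ID maps to several values, which is usually acceptable for a quick label but worth checking for your genes of interest.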

0
Entering edit mode

Also, as.data.frame() will convert the result to the simple data.frame that the user wants.
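For the exact layout requested, here is a rough sketch that builds the columns by hand from the accessor functions instead. The column names come from the question. `ExonNr` is just a running index within each gene, in the order `exonsBy()` returns the exons; that is my assumption and not an official exon number. `exon_df`, `hits`, and `result` are illustrative names.

```r
library(GenomicRanges)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
ex_by_gene <- exonsBy(txdb, by = "gene")  # GRangesList, named by Entrez Gene ID
ex <- unlist(ex_by_gene)                  # flat GRanges, gene ID repeated per exon

exon_df <- data.frame(
  Chr       = as.character(seqnames(ex)),
  ExonStart = start(ex),
  ExonEnd   = end(ex),
  # Assumption: running index 1..n within each gene, not a curated exon number
  ExonNr    = sequence(elementNROWS(ex_by_gene)),
  Gene      = names(ex),
  stringsAsFactors = FALSE
)

# Keep only the exons hit by the query regions from the question
gr <- GRanges(c("chr1", "chr2"),
              IRanges(c(48902063, 202340342), c(48902085, 202340364)))
hits <- findOverlaps(gr, ex)
query_label <- paste0(as.character(seqnames(gr)), ":", start(gr), "-", end(gr))
result <- data.frame(Query = query_label[queryHits(hits)],
                     exon_df[subjectHits(hits), ],
                     row.names = NULL)
result
```

An exon that belongs to more than one gene appears once per gene in `exon_df`, so a single query region can produce several rows.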