I'm trying to get the positions of the exons in the DMD gene using reference assembly hg19. Everything I've found indicates that DMD has 79 exons, yet I get 90 partially overlapping exons using GenomicFeatures:
library(GenomicRanges)
library(Homo.sapiens)
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
dmd_exbytx <- exonsBy(txdb, "gene")[["1756"]]
> dmd_exbytx
GRanges object with 90 ranges and 2 metadata columns:
seqnames ranges strand | exon_id exon_name
<Rle> <IRanges> <Rle> | <integer> <character>
[1] chrX [31137345, 31140047] - | 272767 <NA>
[2] chrX [31144759, 31144790] - | 272768 <NA>
[3] chrX [31152219, 31152311] - | 272769 <NA>
[4] chrX [31164408, 31164531] - | 272770 <NA>
[5] chrX [31165392, 31165635] - | 272771 <NA>
... ... ... ... . ... ...
[86] chrX [33038256, 33038317] - | 272852 <NA>
[87] chrX [33146180, 33146544] - | 272853 <NA>
[88] chrX [33146264, 33146545] - | 272854 <NA>
[89] chrX [33229399, 33229673] - | 272855 <NA>
[90] chrX [33357376, 33357726] - | 272856 <NA>
-------
seqinfo: 93 sequences (1 circular) from hg19 genome
What's going on here? Maybe I'm not understanding the biology, which would make sense since I'm not a biologist. I just want the start and end positions for the 79 exons in DMD, but hours of internet searching have gotten me nowhere.
Thanks, this really helped. I ended up choosing one of the 79-exon transcripts that corresponded to an additional RNAseq experiment that was run.
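For anyone who lands here later, this is a rough sketch of how picking a single transcript can look. exonsBy(txdb, "gene") returns every distinct exon from all annotated transcripts of the gene, which is why the count exceeds 79. Grouping by transcript instead gives one exon set per isoform. The variable names are my own, and taking the first 79-exon transcript is an arbitrary choice for illustration, not a recommendation. Which isoform is right depends on your data.

```r
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# All transcripts annotated for Entrez gene 1756 (DMD)
dmd_tx <- transcriptsBy(txdb, by = "gene")[["1756"]]

# Exons grouped by transcript; list names are internal tx_id values
ex_by_tx <- exonsBy(txdb, by = "tx")
dmd_ex_by_tx <- ex_by_tx[as.character(dmd_tx$tx_id)]

# Number of exons in each DMD transcript
n_exons <- elementNROWS(dmd_ex_by_tx)
print(n_exons)

# Transcripts with exactly 79 exons (assumes at least one exists in this TxDb)
tx79 <- names(n_exons)[n_exons == 79]
stopifnot(length(tx79) > 0)

# Arbitrary pick for illustration: the first 79-exon transcript
dmd_exons <- dmd_ex_by_tx[[tx79[1]]]
dmd_exons
```

The resulting GRanges should carry an exon_rank column, so the exons come back in transcript order. Since DMD is on the minus strand, that means descending genomic coordinates. To match a transcript against another annotation, dmd_tx$tx_name holds the UCSC transcript names for the same tx_id values.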