Hi all,
I am trying to construct a gene track using TxDb. Hsapiens.UCSC.hg19.knownGene in Gviz package.
I found a region that does not return any gene while when I look up in the TxDb there is actually transcripts.
Please see the following code that I used:
library(Gviz)
library(GenomicRanges)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(org.Hs.eg.db)
## Wired region
chrm <- "chr6"
chrstart <- 30244097
chrend <- 31444097
geneName <- "DDR1"
## One of the good ones
#chrm <- "chr2"
#chrstart <- 176357518
#chrend <- 177557518
#geneName <- "HOXD13"
## Generate gene track in a given region
grtrack <- GeneRegionTrack(TxDb.Hsapiens.UCSC.hg19.knownGene,
chromosome = chrm,start = chrstart, end =chrend, name = "\nGenes\n",showTitle=FALSE, collapseTranscripts=TRUE,
shape="arrow",fontsize.group=50,transcriptAnnotation = "symbol",fill="black",fontcolor.group="black",
fontsize=30,cex.title=1,col.title="black",background.title="white"
)
## Map the entrez ID with symbol for genes returned in grtrack
symbols <- unlist(mapIds(org.Hs.eg.db, gene(grtrack), "SYMBOL", "ENTREZID", multiVals = "first"))
When using the wired region, the grtrack actually return nothing:
> gene(grtrack)
character(0)
If I do a look up in the TxDb, there are genes in this wired region(the gene I am interested here is DDR1, entrezID 780):
> select(TxDb.Hsapiens.UCSC.hg19.knownGene, keys = "780", keytype = "GENEID", columns = c("CDSCHROM", "CDSEND", "CDSSTART"))
'select()' returned 1:many mapping between keys and columns
GENEID CDSCHROM CDSSTART CDSEND
1 780 chr6 30856507 30856591
2 780 chr6 30856685 30856787
3 780 chr6 30856979 30857207
4 780 chr6 30858750 30858897
5 780 chr6 30859157 30859256
6 780 chr6 30859779 30859965
7 780 chr6 30860073 30860319
8 780 chr6 30860845 30860940
This is the only exception that I found. Other regions work perfectly fine.
Anybody got a hint? Thanks for helping!
Thank you, James! Sorry for the incomplete output. I've noticed DDR1 has different transcripts on multi chromosomes.
What I thought previously is that Gviz would collect all the transcripts in a specified region and collapse them into a single gene. However, it turns out that it would instead look at all transcripts of a 'gene' if the 'gene' is found in that specific region and try to collapse all of them into a single gene. In this case, DDR1 has different positions and thus get dropped.