Gene missing in TxDb.Hsapiens.UCSC.hg19.knownGene
@drramkichop-8164
Last seen 6.5 years ago
United States

Hello group,

I am using TxDb.Hsapiens.UCSC.hg19.knownGene to annotate genomic intervals, and I came across a case where a legitimate disease gene, SYNGAP1, was missed. In this case,

g <- as.data.frame(genes(TxDb.Hsapiens.UCSC.hg19.knownGene))

This contains 23,056 genes, but SYNGAP1 (https://www.ncbi.nlm.nih.gov/gene/8831) is missing. This is an important gene and I cannot afford to miss it. I wonder if I am doing anything wrong here. If it is really missing, I am scared to use this database.

Thanks.

Robert Castelo ★ 3.3k
@rcastelo
Last seen 1 day ago
Barcelona/Universitat Pompeu Fabra

hi,

indeed, if you try to access the annotation of the gene with the 'genes()' method and its default parameters, you won't find it:

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
"8831" %in% names(genes(txdb))
[1] FALSE

however, the gene is part of this annotation package, and you can see it if you look up its key or use the 'exonsBy()' method:

"8831" %in% keys(txdb, keytype="GENEID")
[1] TRUE
allgenes <- exonsBy(txdb, by="gene")
allgenes["8831"]
GRangesList object of length 1:
$8831
GRanges object with 44 ranges and 2 metadata columns:
            seqnames               ranges strand |   exon_id   exon_name
               <Rle>            <IRanges>  <Rle> | <integer> <character>
  [1]           chr6 [33387847, 33388108]      + |     87817        <NA>
  [2]           chr6 [33391254, 33391375]      + |     87818        <NA>
  [3]           chr6 [33393575, 33393680]      + |     87819        <NA>
  [4]           chr6 [33399938, 33400029]      + |     87820        <NA>
  [5]           chr6 [33400462, 33400583]      + |     87821        <NA>
  ...            ...                  ...    ... .       ...         ...
 [40] chr6_ssto_hap7   [4894596, 4894807]      + |    288563        <NA>
 [41] chr6_ssto_hap7   [4894602, 4894807]      + |    288564        <NA>
 [42] chr6_ssto_hap7   [4895851, 4895954]      + |    288565        <NA>
 [43] chr6_ssto_hap7   [4895864, 4895954]      + |    288566        <NA>
 [44] chr6_ssto_hap7   [4899781, 4901710]      + |    288567        <NA>
-------
seqinfo: 93 sequences (1 circular) from hg19 genome

notice that the gene is annotated on more than one sequence: the output above shows exons on chr6 and on the alternate haplotype chr6_ssto_hap7, which hg19 treats as separate chromosomes. if you check the help page of the 'genes()' method you will find that it has an argument 'single.strand.genes.only', TRUE by default, under which "genes that have exons located on both strands of the same chromosome or on two different chromosomes are dropped". so, if you set this argument to FALSE you'll find your gene with the 'genes()' method:

"8831" %in% names(genes(txdb, single.strand.genes.only=FALSE))
[1] TRUE
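
one thing to keep in mind: with 'single.strand.genes.only=FALSE' the 'genes()' method returns a GRangesList (one element per gene) rather than a single GRanges, so the 'as.data.frame()' call from the question gives a differently shaped table. a minimal sketch of how this could look, assuming a GenomicFeatures version that still has this argument (the object names 'allgenes', 'syngap1' and 'g' are just illustrative):

```r
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# With single.strand.genes.only=FALSE, genes() returns a GRangesList
# with one element per gene, instead of a single GRanges.
allgenes <- genes(txdb, single.strand.genes.only=FALSE)

# Each element holds one range per sequence/strand on which the gene
# is annotated, so a gene present on alternate haplotypes has several.
syngap1 <- allgenes[["8831"]]
syngap1

# Flattening the list to a data.frame adds 'group' and 'group_name'
# columns; 'group_name' carries the Entrez gene ID.
g <- as.data.frame(allgenes)
g[g$group_name == "8831", ]
```

in the resulting table a gene can occupy more than one row, so any downstream code that assumes one row per gene would need to be adjusted.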

cheers,

robert.