Question: TxDb.Hsapiens.UCSC.hg19.knownGene misannotation of a gene
0
gravatar for tangming2005
3.1 years ago by
tangming2005150
United States
tangming2005150 wrote:

Hi,

I think I found a bug, not sure I am right or not.

library(TxDb.Hsapiens.UCSC.hg19.knownGene)

UCSC.hg19<- TxDb.Hsapiens.UCSC.hg19.knownGene
hg19.genes<- genes(UCSC.hg19)

library("org.Hs.eg.db")

gene_symbol<- AnnotationDbi::select(org.Hs.eg.db, keys=hg19.genes$gene_id, 

                                    columns="SYMBOL", keytype="ENTREZID")

all.equal(hg19.genes$gene_id, gene_symbol$ENTREZID)

hg19.genes$gene_id<- gene_symbol$SYMBOL

hg19.genes[9349]
GRanges object with 1 range and 1 metadata column:
         seqnames               ranges strand |     gene_id
            <Rle>            <IRanges>  <Rle> | <character>
  286297     chr9 [42844370, 67032072]      - |   LOC286297
  -------
  seqinfo: 93 sequences (1 circular) from hg19 genome

I then went to IGV to check "LOC286297", the end coordinate is 42859085

The reason I found this gene is because I want to check how many genes in each chromosome arm and found this gene is spanning p and q arm.

Thanks,

Tommy

 

ADD COMMENTlink modified 3.1 years ago by James W. MacDonald51k • written 3.1 years ago by tangming2005150
Answer: TxDb.Hsapiens.UCSC.hg19.knownGene misannotation of a gene
1
gravatar for James W. MacDonald
3.1 years ago by
United States
James W. MacDonald51k wrote:

I'm not sure that this is an error per se. When you ask for the genes, what you get are the smallest start position for any transcript, and the largest end position for any transcript, which is consistent for this non-coding transcript:

> txs <- transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene)

> txs["286297"]
GRangesList object of length 1:
$286297
GRanges object with 2 ranges and 2 metadata columns:
      seqnames               ranges strand |     tx_id     tx_name
         <Rle>            <IRanges>  <Rle> | <integer> <character>
  [1]     chr9 [42844370, 42859085]      - |     36344  uc031tdv.1
  [2]     chr9 [67017375, 67032072]      - |     36381  uc031tdz.1

> gns <- genes(TxDb.Hsapiens.UCSC.hg19.knownGene)
> gns["286297"]
GRanges object with 1 range and 1 metadata column:
         seqnames               ranges strand |     gene_id
            <Rle>            <IRanges>  <Rle> | <character>
  286297     chr9 [42844370, 67032072]      - |      286297
  -------
  seqinfo: 93 sequences (1 circular) from hg19 genome

I am not sure there is a simple way to 'fix' this, particularly for non-coding transcripts, which may be found lots of places.

ADD COMMENTlink written 3.1 years ago by James W. MacDonald51k

That makes sense. It's less confusing than GENCODE Genes, which sometimes uses the same gene symbol but different ENSG identifiers for largely overlapping genes in a certain genomic region. I asked about it and one of the team replied that it would be fixed by GENCODE 26.

ADD REPLYlink written 3.1 years ago by Dario Strbenac1.5k

thanks. I did not know that a gene can have two transcripts that far apart.

ADD REPLYlink written 3.1 years ago by tangming2005150
1

There are actually quite a few genes that are on both the X and Y chromosomes, and they disappear when you do

gns <- genes(TxDb.Hsapiens.UCSC.hg19.knownGene)

because the smallest start and the largest end value tend to be on different chromosomes. By definition you can't have them in a GRanges object as one thing (e.g., the GRanges paradigm doesn't include regions that span between two chromosomes). So the simple notion that a 'gene' is simply the region encompassed by all transcripts for that gene sort of breaks down for some sex-linked genes, as well as some of the non-coding transcripts.


 

ADD REPLYlink written 3.1 years ago by James W. MacDonald51k

Thanks James, this is the details one need to pay attention to.

ADD REPLYlink written 3.1 years ago by tangming2005150

Is there an easy way to get gene numbers on each chromosome arms?  this is my solution   but number of genes on sex chromosomes maybe off.

ADD REPLYlink modified 3.1 years ago • written 3.1 years ago by tangming2005150
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 201 users visited in the last hour