Question

PDCD1 missing from TxDb.Hsapiens.UCSC.hg38.knownGene?

1

Entering edit mode

enricoferrero ▴ 680

@enricoferrero-6037

Last seen 3.4 years ago

Switzerland

Can somebody help me understand why PDCD1 is missing from TxDb.Hsapiens.UCSC.hg38.knownGene (but not from TxDb.Hsapiens.UCSC.hg19.knownGene)?

library(org.Hs.eg.db)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)

geneid <- mapIds(org.Hs.eg.db, "PDCD1", "ENTREZID", "SYMBOL")

hg19.genes <- genes(TxDb.Hsapiens.UCSC.hg19.knownGene)
geneid %in% hg19.genes$gene_id
# TRUE

hg38.genes <- genes(TxDb.Hsapiens.UCSC.hg38.knownGene)
geneid %in% hg38.genes$gene_id
# FALSE

The gene is clearly part of the GRCh38/hg38 genome assembly:

Thank you.

txdb txdb.hsapiens.ucsc.hg19.knowngene txdb.hsapiens.ucsc.hg38.knowngene org.hs.eg.db annotation • 4.0k views

ADD COMMENT • link 8.9 years ago enricoferrero ▴ 680

score 3 · Accepted Answer · 2016-04-20

It may have to do with the alternate location that exists in GRCh38

> txs <- transcriptsBy(TxDb.Hsapiens.UCSC.hg38.knownGene)
> txs["5133"]
GRangesList object of length 1:
$5133
GRanges object with 6 ranges and 2 metadata columns:
                 seqnames                 ranges strand |     tx_id     tx_name
                    <Rle>              <IRanges>  <Rle> | <integer> <character>
  [1]                chr2 [241849881, 241852980]      - |     15141  uc010fzs.3
  [2]                chr2 [241849881, 241858906]      - |     15142  uc002wcq.4
  [3]                chr2 [241849881, 241858906]      - |     15143  uc010fzt.3
  [4] chr2_KI270776v1_alt [    61979,     65078]      - |     92941  uc032rbb.1
  [5] chr2_KI270776v1_alt [    61979,     71004]      - |     92942  uc032rbc.1
  [6] chr2_KI270776v1_alt [    61979,     71004]      - |     92943  uc032rbd.1

-------
seqinfo: 455 sequences (1 circular) from hg38 genome

> txs2 <- transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene)
> txs2["5133"]
GRangesList object of length 1:
$5133
GRanges object with 3 ranges and 2 metadata columns:
      seqnames                 ranges strand |     tx_id     tx_name
         <Rle>              <IRanges>  <Rle> | <integer> <character>
  [1]     chr2 [242792033, 242795132]      - |     13057  uc010fzs.3
  [2]     chr2 [242792033, 242801058]      - |     13058  uc002wcq.4
  [3]     chr2 [242792033, 242801058]      - |     13059  uc010fzt.3

-------
seqinfo: 93 sequences (1 circular) from hg19 genome

I'm not sure you can give a single gene location for something that may occur in more than one place.

score 3 · Accepted Answer · 2016-04-20

3