Search
Question: PDCD1 missing from TxDb.Hsapiens.UCSC.hg38.knownGene?
0
gravatar for enricoferrero
19 months ago by
enricoferrero550
United Kingdom
enricoferrero550 wrote:

Can somebody help me understand why PDCD1 is missing from TxDb.Hsapiens.UCSC.hg38.knownGene (but not from TxDb.Hsapiens.UCSC.hg19.knownGene)?

library(org.Hs.eg.db)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)

geneid <- mapIds(org.Hs.eg.db, "PDCD1", "ENTREZID", "SYMBOL")

hg19.genes <- genes(TxDb.Hsapiens.UCSC.hg19.knownGene)
geneid %in% hg19.genes$gene_id
# TRUE

hg38.genes <- genes(TxDb.Hsapiens.UCSC.hg38.knownGene)
geneid %in% hg38.genes$gene_id
# FALSE

The gene is clearly part of the GRCh38/hg38 genome assembly:

Thank you.

3
gravatar for James W. MacDonald
19 months ago by
United States
James W. MacDonald45k wrote:

It may have to do with the alternate location that exists in GRCh38

> txs <- transcriptsBy(TxDb.Hsapiens.UCSC.hg38.knownGene)
> txs["5133"]
GRangesList object of length 1:
$5133
GRanges object with 6 ranges and 2 metadata columns:
                 seqnames                 ranges strand |     tx_id     tx_name
                    <Rle>              <IRanges>  <Rle> | <integer> <character>
  [1]                chr2 [241849881, 241852980]      - |     15141  uc010fzs.3
  [2]                chr2 [241849881, 241858906]      - |     15142  uc002wcq.4
  [3]                chr2 [241849881, 241858906]      - |     15143  uc010fzt.3
  [4] chr2_KI270776v1_alt [    61979,     65078]      - |     92941  uc032rbb.1
  [5] chr2_KI270776v1_alt [    61979,     71004]      - |     92942  uc032rbc.1
  [6] chr2_KI270776v1_alt [    61979,     71004]      - |     92943  uc032rbd.1

-------
seqinfo: 455 sequences (1 circular) from hg38 genome

> txs2 <- transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene)
> txs2["5133"]
GRangesList object of length 1:
$5133
GRanges object with 3 ranges and 2 metadata columns:
      seqnames                 ranges strand |     tx_id     tx_name
         <Rle>              <IRanges>  <Rle> | <integer> <character>
  [1]     chr2 [242792033, 242795132]      - |     13057  uc010fzs.3
  [2]     chr2 [242792033, 242801058]      - |     13058  uc002wcq.4
  [3]     chr2 [242792033, 242801058]      - |     13059  uc010fzt.3

-------
seqinfo: 93 sequences (1 circular) from hg19 genome

I'm not sure you can give a single gene location for something that may occur in more than one place.

ADD COMMENTlink written 19 months ago by James W. MacDonald45k
1

Yes, that might be the reason, thanks. 

Based on your answer, I was able to rectify this in this way:

library(org.Hs.eg.db)

library(TxDb.Hsapiens.UCSC.hg38.knownGene)

geneid <- mapIds(org.Hs.eg.db, "PDCD1", "ENTREZID", "SYMBOL")

hg38.genes <- genes(keepStandardChromosomes(TxDb.Hsapiens.UCSC.hg38.knownGene))
geneid %in% hg38.genes$gene_id
# TRUE

Btw, this doesn't strike me as a sane default behaviour.

ADD REPLYlink written 19 months ago by enricoferrero550
1

This is one of those things that reasonable people can argue about for, like, forever, and still not come up with a wholly acceptable compromise.

We sort of did something similar with the annotation packages in the past, where if there were a one-to-many match, we returned an NA, because we figured it wasn't our decision to say which returned value was 'the right one'. These days the select function returns everything, and we are all like 'you are on your own, dude', but mapIds by default just returns the first of any multiple match, hiding from those allergic to help pages the fact that there may be more matches. There is a cogent argument for either default, and I tend to want to err on the side of more information rather than less, but then you get people wanting to know what they should do with all this unwanted extra complexity.

ADD REPLYlink modified 19 months ago • written 19 months ago by James W. MacDonald45k
2
gravatar for enricoferrero
19 months ago by
enricoferrero550
United Kingdom
enricoferrero550 wrote:

Use keepStandardChromosomes():

library(org.Hs.eg.db)

library(TxDb.Hsapiens.UCSC.hg38.knownGene)

geneid <- mapIds(org.Hs.eg.db, "PDCD1", "ENTREZID", "SYMBOL")

hg38.genes <- genes(keepStandardChromosomes(TxDb.Hsapiens.UCSC.hg38.knownGene))
geneid %in% hg38.genes$gene_id
# TRUE
ADD COMMENTlink written 19 months ago by enricoferrero550
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 115 users visited in the last hour