Question

Annotation: Use of org.Hs.egCHR to map Gene Entrez to Chromosome

Bine ▴ 50

Dear all,

I want to find out which chromosome a gene is on. I am trying to adapt the code below, which uses Entrez Gene IDs to look up the chromosome. I already have my dataset d1 with one column containing the Entrez Gene IDs, but I can't work out how to adapt the code to it:

library(org.Hs.eg.db)
# select() interface:
## Objects in this package can be accessed using the select() interface
## from the AnnotationDbi package. See ?select for details.
## Bimap interface:
x <- org.Hs.egCHR
# Get the entrez gene identifiers that are mapped to a chromosome
mapped_genes <- mappedkeys(x)
# Convert to a list
xx <- as.list(x[mapped_genes])
if (length(xx) > 0) {
  # Get the CHR for the first five genes
  xx[1:5]
  # Get the first one
  xx[[1]]
}

Can anyone help me?

Thank you, Bine

org.Hs.egCHR org.Hs.eg.db • 1.0k views

4.0 years ago · Bine

score 2 · Accepted Answer · 2020-12-02

The OrgDb packages do contain genomic location data, but that is expected to change, so you shouldn't rely on it.

Instead, you can use select() on a TxDb package, keyed by Entrez Gene ID (GENEID).

> library(org.Hs.eg.db)
> library(TxDb.Hsapiens.UCSC.hg38.knownGene)
> z <- head(keys(org.Hs.eg.db))
> z
[1] "1"  "2"  "3"  "9"  "10" "11"

## CDS
> select(TxDb.Hsapiens.UCSC.hg38.knownGene, z, "CDSCHROM", "GENEID")
'select()' returned 1:1 mapping between keys and columns
  GENEID CDSCHROM
1      1    chr19
2      2    chr12
3      3     <NA>
4      9     chr8
5     10     chr8
6     11     <NA>

## Transcript
> select(TxDb.Hsapiens.UCSC.hg38.knownGene, z, "TXCHROM", "GENEID")
'select()' returned 1:1 mapping between keys and columns
  GENEID TXCHROM
1      1   chr19
2      2   chr12
3      3   chr12
4      9    chr8
5     10    chr8
6     11    <NA>

## Exons
> select(TxDb.Hsapiens.UCSC.hg38.knownGene, z, "EXONCHROM", "GENEID")
'select()' returned 1:1 mapping between keys and columns
  GENEID EXONCHROM
1      1     chr19
2      2     chr12
3      3     chr12
4      9      chr8
5     10      chr8
6     11      <NA>
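
To get from these lookups back to the original question (a data frame d1 with an Entrez column), one option is to join the select() result onto d1 with match(). The sketch below is an illustration rather than part of the answer itself: the column name entrez and the contents of d1 are assumptions, and the lookup table is typed in by hand from the TXCHROM output above so that the snippet runs without the annotation packages installed.

```r
# Lookup table in the shape select() returned above (values copied from the
# TXCHROM output in this answer). With the packages loaded it would come from:
#   lookup <- select(TxDb.Hsapiens.UCSC.hg38.knownGene,
#                    keys = as.character(d1$entrez),
#                    columns = "TXCHROM", keytype = "GENEID")
lookup <- data.frame(
  GENEID  = c("1", "2", "3", "9", "10", "11"),
  TXCHROM = c("chr19", "chr12", "chr12", "chr8", "chr8", NA),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the asker's d1; the column name `entrez` is assumed.
d1 <- data.frame(entrez = c(10, 3, 11, 1), stringsAsFactors = FALSE)

# select() expects character keys, so compare as character here as well.
idx <- match(as.character(d1$entrez), lookup$GENEID)
d1$chromosome <- lookup$TXCHROM[idx]
d1
```

match() keeps only the first hit per ID, so if select() ever reports a 1:many mapping, it is worth inspecting the duplicated IDs before relying on this join.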

Or, probably a more modern approach, use genes() to get the gene ranges directly:

> zz <- genes(TxDb.Hsapiens.UCSC.hg38.knownGene, single.strand.genes.only = FALSE)
> zz
GRangesList object of length 27363:
$`1`
GRanges object with 1 range and 0 metadata columns:
      seqnames            ranges strand
         <Rle>         <IRanges>  <Rle>
  [1]    chr19 58345178-58362751      -
  -------
  seqinfo: 595 sequences (1 circular) from hg38 genome

$`10`
GRanges object with 1 range and 0 metadata columns:
      seqnames            ranges strand
         <Rle>         <IRanges>  <Rle>
  [1]     chr8 18391282-18401218      +
  -------
  seqinfo: 595 sequences (1 circular) from hg38 genome

$`100`
GRanges object with 1 range and 0 metadata columns:
      seqnames            ranges strand
         <Rle>         <IRanges>  <Rle>
  [1]    chr20 44619522-44652233      -
  -------
  seqinfo: 595 sequences (1 circular) from hg38 genome

...
<27360 more elements>

## Subscript
> zz[z]
Error: subscript contains invalid names

## Expected: gene 11 had no location in the select() output above either.
## So filter to the IDs that are present first
> z <- z[z %in% names(zz)]
> unlist(zz[z])
GRanges object with 5 ranges and 0 metadata columns:
     seqnames            ranges strand
        <Rle>         <IRanges>  <Rle>
   1    chr19 58345178-58362751      -
   2    chr12   9067664-9116229      -
   3    chr12   9228533-9275817      -
   9     chr8 18170477-18223689      +
  10     chr8 18391282-18401218      +
  -------
  seqinfo: 595 sequences (1 circular) from hg38 genome
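
The filtering step above drops IDs without saying which ones went missing. A minimal sketch of how one might keep track of them, assuming nothing beyond base R: the vector available is a hand-typed stand-in for names(zz), built from the names visible in the output above, and the variable names are illustrative only.

```r
# IDs we asked for (the same six keys as `z` above).
wanted <- c("1", "2", "3", "9", "10", "11")

# Hand-typed stand-in for names(zz); with the TxDb loaded, use names(zz) itself.
available <- c("1", "10", "100", "2", "3", "9")

found   <- wanted[wanted %in% available]  # safe to use as zz[found]
dropped <- setdiff(wanted, available)     # IDs with no gene range

found
dropped
```

Keeping dropped around makes it easy to report which rows of a dataset will end up without a chromosome instead of losing them silently.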