Question

Is it possible to annotate mitochondrial genes with ENSEMBL ids and Homo.sapiens/Mus.musculus?

Shian Su ▴ 40

@shian-su-9869

Walter and Eliza Hall Institute of Medi…

I'm trying to use Homo.sapiens to annotate genes from my count matrix. However, I never seem to find any genes from chrM.

library(Homo.sapiens)

# The select() generic used with Homo.sapiens is defined in AnnotationDbi, not biomaRt
all_genes <- AnnotationDbi::select(
    Homo.sapiens,
    keys = keys(Homo.sapiens, keytype = "GENEID"),
    keytype = "GENEID",
    columns = c("CDSCHROM", "GENENAME")
)

sort(unique(all_genes$CDSCHROM))

This should list the chromosome of every gene that has a GENEID, but none of them appear to be on chrM.

I don't know my way around annotation packages very well, but the following code shows that Homo.sapiens does contain transcript positions on chrM; they just have no GENEID or SYMBOL assigned.

all_trans <- transcripts(Homo.sapiens, columns=c("GENEID","SYMBOL"))
all_trans[seqnames(all_trans) == "chrM", ]

Is this the intended behaviour?

Also, as a side question: what do people consider "mitochondrial genes"? Is it only the genes on chrM, or should I include anything with "mitochondrial" in the GENENAME?

homo.sapiens annotationdbi • 3.7k views

updated 6.4 years ago by Johannes Rainer ★ 2.0k • written 6.4 years ago by Shian Su ▴ 40

score 4 · Accepted Answer · 2017-11-21

Regarding annotation of mitochondrial genes: if you want (or already have) Ensembl gene IDs, you might give ensembldb and its EnsDb databases a try. You can fetch the mitochondrial genes from an EnsDb like this:

> library(EnsDb.Hsapiens.v75)
> gns <- genes(EnsDb.Hsapiens.v75, filter = ~ seq_name == "MT")
> gns
GRanges object with 37 ranges and 6 metadata columns:
                  seqnames         ranges strand |         gene_id   gene_name
                     <Rle>      <IRanges>  <Rle> |     <character> <character>
  ENSG00000210049       MT   [ 577,  647]      + | ENSG00000210049       MT-TF
  ENSG00000211459       MT   [ 648, 1601]      + | ENSG00000211459     MT-RNR1
  ENSG00000210077       MT   [1602, 1670]      + | ENSG00000210077       MT-TV
  ENSG00000210082       MT   [1671, 3229]      + | ENSG00000210082     MT-RNR2
  ENSG00000209082       MT   [3230, 3304]      + | ENSG00000209082      MT-TL1
              ...      ...            ...    ... .             ...         ...
  ENSG00000198695       MT [14149, 14673]      - | ENSG00000198695      MT-ND6
  ENSG00000210194       MT [14674, 14742]      - | ENSG00000210194       MT-TE
  ENSG00000198727       MT [14747, 15887]      + | ENSG00000198727      MT-CYB
  ENSG00000210195       MT [15888, 15953]      + | ENSG00000210195       MT-TT
  ENSG00000210196       MT [15956, 16023]      - | ENSG00000210196       MT-TP
                    gene_biotype seq_coord_system      symbol  entrezid
                     <character>      <character> <character>    <list>
  ENSG00000210049        Mt_tRNA       chromosome       MT-TF        NA
  ENSG00000211459        Mt_rRNA       chromosome     MT-RNR1        NA
  ENSG00000210077        Mt_tRNA       chromosome       MT-TV        NA
  ENSG00000210082        Mt_rRNA       chromosome     MT-RNR2 100616263
  ENSG00000209082        Mt_tRNA       chromosome      MT-TL1        NA
              ...            ...              ...         ...       ...
  ENSG00000198695 protein_coding       chromosome      MT-ND6      4541
  ENSG00000210194        Mt_tRNA       chromosome       MT-TE        NA
  ENSG00000198727 protein_coding       chromosome      MT-CYB      4519
  ENSG00000210195        Mt_tRNA       chromosome       MT-TT        NA
  ENSG00000210196        Mt_tRNA       chromosome       MT-TP        NA
  -------
  seqinfo: 1 sequence from GRCh37 genome
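
To tie this back to a count matrix, here is a minimal sketch of one way to flag mitochondrial rows, assuming the row names are Ensembl gene IDs. The `counts` matrix is a made-up toy example (not from the question), and ENSG00000141510 (TP53) is included only as a nuclear gene for contrast; the two MT IDs are taken from the output above.

```r
library(EnsDb.Hsapiens.v75)

# Hypothetical toy count matrix; row names are Ensembl gene IDs.
# The count values are arbitrary and only there to make the object complete.
counts <- matrix(
    c(10L, 5L, 3L, 8L, 2L, 7L), nrow = 3,
    dimnames = list(
        c("ENSG00000198727", "ENSG00000198695", "ENSG00000141510"),
        c("sample1", "sample2")
    )
)

# Look up name and chromosome for each row of the matrix
anno <- AnnotationDbi::select(
    EnsDb.Hsapiens.v75,
    keys = rownames(counts),
    keytype = "GENEID",
    columns = c("GENEID", "GENENAME", "SEQNAME")
)

# Put the annotation in the same order as the matrix rows, then flag MT genes.
# IDs missing from the database end up as NA and are treated as non-MT.
anno <- anno[match(rownames(counts), anno$GENEID), ]
is_mt <- !is.na(anno$SEQNAME) & anno$SEQNAME == "MT"
rownames(counts)[is_mt]
```

Note that Ensembl names the mitochondrial chromosome "MT", whereas the UCSC-based Homo.sapiens package uses "chrM".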

EnsDb.Hsapiens.v75 is based on the relatively old Ensembl release 75. If you want more recent annotations, I suggest you get them from AnnotationHub:

> library(AnnotationHub)
> query(AnnotationHub(), "EnsDb.Hsapiens.")
snapshotDate(): 2017-10-27
AnnotationHub with 4 records
# snapshotDate(): 2017-10-27
# $dataprovider: Ensembl
# $species: Homo Sapiens
# $rdataclass: EnsDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH53211"]]'

            title                            
  AH53211 | Ensembl 87 EnsDb for Homo Sapiens
  AH53715 | Ensembl 88 EnsDb for Homo Sapiens
  AH56681 | Ensembl 89 EnsDb for Homo Sapiens
  AH57757 | Ensembl 90 EnsDb for Homo Sapiens

> edb <- AnnotationHub()[["AH57757"]]
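
Once loaded, the AnnotationHub EnsDb can be queried the same way as the packaged one. The sketch below assumes the record AH57757 from the listing above is still available in your AnnotationHub snapshot, and that mouse EnsDb records can be found with an analogous query (the search terms are my assumption, not something shown in the listing).

```r
library(AnnotationHub)
library(ensembldb)

ah <- AnnotationHub()

# Ensembl 90 EnsDb for Homo Sapiens, as listed above
edb <- ah[["AH57757"]]

# Same filter as for the packaged EnsDb
mt_genes <- genes(edb, filter = ~ seq_name == "MT")

# Breakdown of the mitochondrial genes by biotype
table(mt_genes$gene_biotype)

# Assumption: mouse databases are listed under the species name "Mus musculus"
query(ah, c("EnsDb", "Mus musculus"))
```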

hope that helps,

cheers, jo