Question: Is it possible to annotate mitochondrial genes with ENSEMBL ids and Homo.sapiens/Mus.musculus?
Shian Su (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia) wrote, 8 months ago:


I'm trying to use Homo.sapiens to annotate genes from my count matrix. However, I never seem to find any genes from chrM.


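library(Homo.sapiens)

# select() for annotation objects comes from AnnotationDbi; the object is the first argument
all_genes <- AnnotationDbi::select(
    Homo.sapiens,
    keys = keys(Homo.sapiens, keytype = "GENEID"),
    keytype = "GENEID",
    columns = c("CDSCHROM", "GENENAME")
)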

This is meant to show all the genes with a GENEID, and none of them appear to be from chrM.

I don't know my way around annotation packages very well, but this other code reveals that there are positions encoded in Homo.sapiens for the chrM chromosome; however, no GENEID or SYMBOL is assigned to them.

all_trans <- transcripts(Homo.sapiens, columns=c("GENEID","SYMBOL"))
all_trans[seqnames(all_trans) == "chrM", ]

Is this the intended behaviour?

Also as a side question, what do people consider "mitochondrial genes"? Is it only the genes on the chrM or should I be considering anything where I can find "mitochondrial" in the GENENAME?

Johannes Rainer wrote, 8 months ago:

Regarding annotations of mitochondrial genes - if you're interested in Ensembl IDs (or have Ensembl gene IDs) you might give ensembldb and EnsDb databases a try. You can fetch mitochondrial genes from an EnsDb:

> library(EnsDb.Hsapiens.v75)
> gns <- genes(EnsDb.Hsapiens.v75, filter = ~ seq_name == "MT")
> gns
GRanges object with 37 ranges and 6 metadata columns:
                  seqnames         ranges strand |         gene_id   gene_name
                     <Rle>      <IRanges>  <Rle> |     <character> <character>
  ENSG00000210049       MT   [ 577,  647]      + | ENSG00000210049       MT-TF
  ENSG00000211459       MT   [ 648, 1601]      + | ENSG00000211459     MT-RNR1
  ENSG00000210077       MT   [1602, 1670]      + | ENSG00000210077       MT-TV
  ENSG00000210082       MT   [1671, 3229]      + | ENSG00000210082     MT-RNR2
  ENSG00000209082       MT   [3230, 3304]      + | ENSG00000209082      MT-TL1
              ...      ...            ...    ... .             ...         ...
  ENSG00000198695       MT [14149, 14673]      - | ENSG00000198695      MT-ND6
  ENSG00000210194       MT [14674, 14742]      - | ENSG00000210194       MT-TE
  ENSG00000198727       MT [14747, 15887]      + | ENSG00000198727      MT-CYB
  ENSG00000210195       MT [15888, 15953]      + | ENSG00000210195       MT-TT
  ENSG00000210196       MT [15956, 16023]      - | ENSG00000210196       MT-TP
                    gene_biotype seq_coord_system      symbol  entrezid
                     <character>      <character> <character>    <list>
  ENSG00000210049        Mt_tRNA       chromosome       MT-TF        NA
  ENSG00000211459        Mt_rRNA       chromosome     MT-RNR1        NA
  ENSG00000210077        Mt_tRNA       chromosome       MT-TV        NA
  ENSG00000210082        Mt_rRNA       chromosome     MT-RNR2 100616263
  ENSG00000209082        Mt_tRNA       chromosome      MT-TL1        NA
              ...            ...              ...         ...       ...
  ENSG00000198695 protein_coding       chromosome      MT-ND6      4541
  ENSG00000210194        Mt_tRNA       chromosome       MT-TE        NA
  ENSG00000198727 protein_coding       chromosome      MT-CYB      4519
  ENSG00000210195        Mt_tRNA       chromosome       MT-TT        NA
  ENSG00000210196        Mt_tRNA       chromosome       MT-TP        NA
  seqinfo: 1 sequence from GRCh37 genome
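
As a rough sketch of how the same database could be used to annotate the Ensembl IDs from a count matrix: the `ids` vector below is a hypothetical stand-in for `rownames(counts)` (two IDs are taken from the output above, the third is the nuclear gene TP53, added here only for contrast), and the `is_mito` column name is likewise just an illustration.

```r
library(ensembldb)
library(EnsDb.Hsapiens.v75)

edb <- EnsDb.Hsapiens.v75

# Hypothetical IDs standing in for rownames(counts):
# MT-CYB and MT-RNR2 (from the output above) plus TP53 (assumed nuclear control)
ids <- c("ENSG00000198727", "ENSG00000210082", "ENSG00000141510")

# Look up symbol, chromosome and biotype for each Ensembl gene ID
anno <- AnnotationDbi::select(
    edb,
    keys = ids,
    keytype = "GENEID",
    columns = c("GENEID", "SYMBOL", "SEQNAME", "GENEBIOTYPE")
)

# Flag genes encoded on the mitochondrial genome (named "MT" in Ensembl)
anno$is_mito <- anno$SEQNAME == "MT"
anno
```

This only flags genes located on the mitochondrial chromosome; nuclear genes whose products act in mitochondria would not be marked.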


EnsDb.Hsapiens.v75 is based on the relatively old Ensembl release 75. If you want a more recent release, I suggest you get it from AnnotationHub:

> library(AnnotationHub)
> query(AnnotationHub(), "EnsDb.Hsapiens.")
snapshotDate(): 2017-10-27
AnnotationHub with 4 records
# snapshotDate(): 2017-10-27
# $dataprovider: Ensembl
# $species: Homo Sapiens
# $rdataclass: EnsDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH53211"]]'

  AH53211 | Ensembl 87 EnsDb for Homo Sapiens
  AH53715 | Ensembl 88 EnsDb for Homo Sapiens
  AH56681 | Ensembl 89 EnsDb for Homo Sapiens
  AH57757 | Ensembl 90 EnsDb for Homo Sapiens

> edb <- AnnotationHub()[["AH57757"]]


hope that helps,

cheers, jo


Thanks Johannes, that's very helpful. Do you know if there are any major differences between the AH##### databases and Homo.sapiens that I should be aware of?

Reply written 8 months ago by Shian Su

Just to avoid confusion: AnnotationHub is a central repository for annotation resources. It contains many different resources, among them EnsDb databases, TxDb databases, genomic sequences, etc. I wouldn't call them AH#### databases; the AH#### is just the ID of the resource in AnnotationHub. In the example I was extracting an EnsDb database from AnnotationHub. For more information on these you might want to have a look at the ensembldb package (vignettes).

As far as I know, the Homo.sapiens database/resource contains a TxDb database providing the genomic coordinates of genes/transcripts/exons. These TxDb databases are usually based on annotations from UCSC. EnsDb databases are designed for and built from Ensembl annotations. Their versions match the Ensembl release on which they are built (e.g. EnsDb.Hsapiens.90 contains all human annotations for Ensembl release 90).
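
One practical consequence of the UCSC versus Ensembl origin is the chromosome naming: the question uses "chrM", while the EnsDb output uses "MT". A minimal sketch of switching an EnsDb to UCSC-style names with `seqlevelsStyle()`, assuming EnsDb.Hsapiens.v75 is installed (sequences without a UCSC equivalent may trigger warnings):

```r
library(ensembldb)
library(EnsDb.Hsapiens.v75)

# Work on a local reference so the style change is explicit
edb <- EnsDb.Hsapiens.v75

# Report the naming style currently in use
seqlevelsStyle(edb)

# Switch to UCSC-style names, so the mitochondrial genome becomes "chrM"
seqlevelsStyle(edb) <- "UCSC"

# Query mitochondrial genes using the UCSC name
mt_ucsc <- genes(edb, filter = ~ seq_name == "chrM")
mt_ucsc
```

With the style set this way, the returned ranges use the same chromosome names as UCSC-based TxDb objects, which makes it easier to compare or overlap the two.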


Reply written 8 months ago by Johannes Rainer