Question: Using Ensembl release 80 when release 79 is supported by Bioconductor
4.4 years ago, anthonycolombo60 wrote:



First, thank you for any advice.

I am using external software that uses Ensmbl_GRCh38.rel80 (Homo sapiens) for data processing.

When I bring the data into R, the annotation available for H. sapiens is release 79.

This is a discrepancy that I wish to clear up.

Should I only process data that matches the annotation libraries supported by Bioconductor? Or should I ignore these version differences (I think not)?

Suggestions welcome.

Anthony Colombo

Answer: Using Ensembl release 80 when release 79 is supported by Bioconductor
4.4 years ago, Johannes Rainer wrote:

Alternatively, you can use the ensembldb package. Annotation packages for Ensembl versions 75 and 79 are available through Bioconductor, but it is simple to generate an annotation package or database for any Ensembl version using ensembldb together with the AnnotationHub package (see the ensembldb package vignette for alternative options):

## load the packages used below
library(AnnotationHub)
library(ensembldb)

ah <- AnnotationHub()

## query AnnotationHub for available Ensembl gtf files for Ensembl release 80
query(ah, c("Homo sapiens", "release-80"))

## get the version 80 gtf:
gtf <- ah[["AH47066"]]

## generate the annotation database
DbFile <- ensDbFromGRanges(gtf, organism="Homo_sapiens", version=80, genomeVersion="GRCh38")

## we can either generate a database package using makeEnsembldbPackage,
## or directly load the data
Edb <- EnsDb(DbFile)

## you can then use e.g. genes() to get the annotations for all genes
genes(Edb)
GRanges object with 65217 ranges and 5 metadata columns:
                            seqnames                 ranges strand   |
                               <Rle>              <IRanges>  <Rle>   |
  ENSG00000000003                  X [100627109, 100639991]      -   |
  ENSG00000000005                  X [100584802, 100599885]      +   |
  ENSG00000000419                 20 [ 50934867,  50958555]      -   |
  ENSG00000000457                  1 [169849631, 169894267]      -   |
  ENSG00000000460                  1 [169662007, 169854080]      +   |
              ...                ...                    ...    ... ...
  ENSG00000281918                  1 [113079537, 113079847]      +   |
  ENSG00000281919  CHR_HSCHR5_6_CTG1 [ 33946602,  33956490]      -   |
  ENSG00000281920                  2 [ 65623272,  65628424]      +   |
  ENSG00000281921                  3 [134261776, 134261911]      +   |
  ENSG00000281922 CHR_HSCHR17_1_CTG5 [ 46784842,  46785913]      -   |
                          gene_id     gene_name  entrezid         gene_biotype
                      <character>   <character> <integer>          <character>
  ENSG00000000003 ENSG00000000003        TSPAN6      <NA>       protein_coding
  ENSG00000000005 ENSG00000000005          TNMD      <NA>       protein_coding
  ENSG00000000419 ENSG00000000419          DPM1      <NA>       protein_coding
  ENSG00000000457 ENSG00000000457         SCYL3      <NA>       protein_coding
  ENSG00000000460 ENSG00000000460      C1orf112      <NA>       protein_coding
              ...             ...           ...       ...                  ...
  ENSG00000281918 ENSG00000281918   Metazoa_SRP      <NA>             misc_RNA
  ENSG00000281919 ENSG00000281919       SLC45A2      <NA>       protein_coding
  ENSG00000281920 ENSG00000281920 RP11-418H16.1      <NA>              lincRNA
  ENSG00000281921 ENSG00000281921    AC096967.1      <NA>                miRNA
  ENSG00000281922 ENSG00000281922 RP11-1070B7.2      <NA> processed_pseudogene
  ENSG00000000003             <NA>
  ENSG00000000005             <NA>
  ENSG00000000419             <NA>
  ENSG00000000457             <NA>
  ENSG00000000460             <NA>
              ...              ...
  ENSG00000281918             <NA>
  ENSG00000281919             <NA>
  ENSG00000281920             <NA>
  ENSG00000281921             <NA>
  ENSG00000281922             <NA>
  seqinfo: 312 sequences from GRCh38 genome
## check the package vignette for additional information (e.g. filtering the results, getting sequences, etc.)
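
To get a feel for how much the version difference matters for a given data set, one option is to compare the gene identifiers in the processed data with those in the annotation being used. The following is only a minimal sketch in base R: `compare_gene_ids` is a hypothetical helper (not part of ensembldb), and the two input vectors are made-up toy data that say nothing about which identifiers actually differ between releases 79 and 80. In practice the annotation identifiers could presumably be taken from something like `names(genes(Edb))`.

```r
## Hypothetical helper: report which identifiers from the data are
## present in, or absent from, a given annotation.
compare_gene_ids <- function(data_ids, annotation_ids) {
  data_ids <- unique(data_ids)
  missing <- setdiff(data_ids, annotation_ids)
  list(
    shared = intersect(data_ids, annotation_ids),
    missing = missing,
    fraction_missing = length(missing) / length(data_ids)
  )
}

## Toy input, for illustration only (not real release contents)
data_ids <- c("ENSG00000000003", "ENSG00000000005", "ENSG00000281922")
annotation_ids <- c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419")

res <- compare_gene_ids(data_ids, annotation_ids)
res$missing
res$fraction_missing
```

If the fraction of missing identifiers is not negligible, building the database for the matching release as shown above is probably the safer route.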


cheers, jo


Answer: Using Ensembl release 80 when release 79 is supported by Bioconductor
4.4 years ago, Diego Diez wrote:

A possibility is to annotate with Ensembl 80 using the biomaRt package. Take a look at this relevant recent post: Ensembl release 80 is out!
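
As a rough sketch of what that could look like: the code below wraps the query in a hypothetical helper, `annotate_with_release`, which is not part of biomaRt. It assumes that the requested release is still served by the Ensembl archives, that the human gene dataset is named `hsapiens_gene_ensembl`, and that the three attributes listed exist in that release. It needs the biomaRt package and a network connection when called.

```r
## Hypothetical helper: fetch basic gene annotation for a set of Ensembl
## gene IDs from a specific Ensembl release via biomaRt.
annotate_with_release <- function(gene_ids, release = 80) {
  if (!requireNamespace("biomaRt", quietly = TRUE)) {
    stop("the biomaRt package is required")
  }
  ## connect to the archived release (assumes it is still available)
  mart <- biomaRt::useEnsembl(
    biomart = "ensembl",
    dataset = "hsapiens_gene_ensembl",
    version = release
  )
  ## one row per matching gene; IDs absent from the release are dropped
  biomaRt::getBM(
    attributes = c("ensembl_gene_id", "external_gene_name", "gene_biotype"),
    filters = "ensembl_gene_id",
    values = gene_ids,
    mart = mart
  )
}

## Example call (requires network access):
## annotate_with_release(c("ENSG00000000003", "ENSG00000000005"))
```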
