Question: Using Ensembl release 80 when only release 79 is supported by Bioconductor
3.2 years ago, anthonycolombo60 wrote:

Hi,

First, thank you for any advice.

I am using external software that processes my Homo sapiens data against Ensembl GRCh38 release 80.

When I bring the data into R, the annotation that is available for H. sapiens is release 79.

I would like to clear up this discrepancy.

Should I only process data against annotation releases that are supported by Bioconductor, or should I ignore the version difference (I think not)?

Suggestions welcome.


Anthony Colombo

3.2 years ago, Johannes Rainer (Italy) wrote:

Alternatively, you can use the ensembldb package. Annotation packages for Ensembl releases 75 and 79 are available through Bioconductor, but it is simple to generate annotation packages or databases for any Ensembl release using ensembldb together with the AnnotationHub package (see the ensembldb package vignette for alternative options):

library(AnnotationHub)
library(ensembldb)
ah <- AnnotationHub()

## query AnnotationHub for the Ensembl GTF files available for Ensembl release 80
query(ah, c("Homo sapiens", "release-80"))

## fetch the release 80 GTF (returned as a GRanges object):
gtf <- ah[["AH47066"]]

## generate the annotation database (an SQLite file) from the GTF
DbFile <- ensDbFromGRanges(gtf, organism="Homo_sapiens", version=80, genomeVersion="GRCh38")

## we can either build an annotation package with makeEnsembldbPackage,
## or load the database directly
Edb <- EnsDb(DbFile)

## you can then use e.g. genes() to get the annotations for all genes
genes(Edb)
GRanges object with 65217 ranges and 5 metadata columns:
                            seqnames                 ranges strand   |
                               <Rle>              <IRanges>  <Rle>   |
  ENSG00000000003                  X [100627109, 100639991]      -   |
  ENSG00000000005                  X [100584802, 100599885]      +   |
  ENSG00000000419                 20 [ 50934867,  50958555]      -   |
  ENSG00000000457                  1 [169849631, 169894267]      -   |
  ENSG00000000460                  1 [169662007, 169854080]      +   |
              ...                ...                    ...    ... ...
  ENSG00000281918                  1 [113079537, 113079847]      +   |
  ENSG00000281919  CHR_HSCHR5_6_CTG1 [ 33946602,  33956490]      -   |
  ENSG00000281920                  2 [ 65623272,  65628424]      +   |
  ENSG00000281921                  3 [134261776, 134261911]      +   |
  ENSG00000281922 CHR_HSCHR17_1_CTG5 [ 46784842,  46785913]      -   |
                          gene_id     gene_name  entrezid         gene_biotype
                      <character>   <character> <integer>          <character>
  ENSG00000000003 ENSG00000000003        TSPAN6      <NA>       protein_coding
  ENSG00000000005 ENSG00000000005          TNMD      <NA>       protein_coding
  ENSG00000000419 ENSG00000000419          DPM1      <NA>       protein_coding
  ENSG00000000457 ENSG00000000457         SCYL3      <NA>       protein_coding
  ENSG00000000460 ENSG00000000460      C1orf112      <NA>       protein_coding
              ...             ...           ...       ...                  ...
  ENSG00000281918 ENSG00000281918   Metazoa_SRP      <NA>             misc_RNA
  ENSG00000281919 ENSG00000281919       SLC45A2      <NA>       protein_coding
  ENSG00000281920 ENSG00000281920 RP11-418H16.1      <NA>              lincRNA
  ENSG00000281921 ENSG00000281921    AC096967.1      <NA>                miRNA
  ENSG00000281922 ENSG00000281922 RP11-1070B7.2      <NA> processed_pseudogene
                  seq_coord_system
                         <integer>
  ENSG00000000003             <NA>
  ENSG00000000005             <NA>
  ENSG00000000419             <NA>
  ENSG00000000457             <NA>
  ENSG00000000460             <NA>
              ...              ...
  ENSG00000281918             <NA>
  ENSG00000281919             <NA>
  ENSG00000281920             <NA>
  ENSG00000281921             <NA>
  ENSG00000281922             <NA>
  -------
  seqinfo: 312 sequences from GRCh38 genome
## see the package vignette for more information (e.g. filtering results, retrieving sequences)
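
To judge how much the release mismatch matters for a given data set, one option is to count how many identifiers coming out of the release 80 pipeline are missing from the annotation in use. This is only a minimal base R sketch, assuming unversioned Ensembl gene IDs. `pipeline_ids` and `annotated_ids` are hypothetical toy vectors (the last pipeline ID is made up), and in practice `annotated_ids` could presumably be taken from `names(genes(Edb))`:

```r
## Hypothetical toy data: gene IDs reported by the external pipeline (release 80);
## "ENSG00000999999" is a made-up ID used to illustrate a mismatch
pipeline_ids <- c("ENSG00000000003", "ENSG00000000005",
                  "ENSG00000281922", "ENSG00000999999")

## Hypothetical toy data: gene IDs present in the annotation loaded in R;
## with an EnsDb this could be names(genes(Edb)) (assumption)
annotated_ids <- c("ENSG00000000003", "ENSG00000000005",
                   "ENSG00000000419", "ENSG00000281922")

## IDs the pipeline reports that the annotation does not contain
missing_ids <- setdiff(pipeline_ids, annotated_ids)

## Fraction of pipeline IDs that can be annotated
matched_fraction <- mean(pipeline_ids %in% annotated_ids)

missing_ids
matched_fraction
```

If `matched_fraction` is clearly below 1 on real data, that suggests building the annotation from the matching release rather than ignoring the difference.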

 

cheers, jo

 

3.2 years ago, Diego Diez (Japan) wrote:

Another possibility is to annotate against Ensembl release 80 using the biomaRt package. Take a look at this recent, relevant post: "Ensembl release 80 is out!"
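
A minimal sketch of what the biomaRt route might look like, assuming the release 80 archive is still served by Ensembl (`listEnsemblArchives()` lists what is currently available) and that the attribute names below exist in that release. The helper name `fetch_rel80_annotation` is hypothetical and not part of biomaRt:

```r
## Hypothetical helper: look up gene names and biotypes for Ensembl gene IDs
## in a specific Ensembl release via biomaRt (needs network access).
fetch_rel80_annotation <- function(gene_ids, release = 80) {
    ## Connect to the archived Ensembl release
    mart <- biomaRt::useEnsembl(biomart = "ensembl",
                                dataset = "hsapiens_gene_ensembl",
                                version = release)
    ## Retrieve ID, name and biotype for the requested genes
    biomaRt::getBM(attributes = c("ensembl_gene_id", "external_gene_name",
                                  "gene_biotype"),
                   filters = "ensembl_gene_id",
                   values = gene_ids,
                   mart = mart)
}

## Example call; only run interactively because it contacts the Ensembl servers
if (interactive()) {
    fetch_rel80_annotation(c("ENSG00000000003", "ENSG00000000419"))
}
```

Pinning `version` keeps the annotation on the same release as the external pipeline instead of whatever the current Ensembl release happens to be.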
