Question: biomaRt: drerio_gene_ensembl dataset missing
0
António Miguel de Jesus Domingues (Germany) wrote 4 months ago:

Whilst running a `RIPSeeker` analysis, I noticed that the dataset `drerio_gene_ensembl`, which used to be available via `biomaRt`, is no longer listed or accessible. To test this I first upgraded my Bioconductor installation to make sure I am working with the latest version of `biomaRt` (2.34.0).


```r
source("https://bioconductor.org/biocLite.R")  # defines biocLite()
biocLite("BiocUpgrade")
```

I then followed the instructions in the [vignette](https://bioconductor.org/packages/release/bioc/vignettes/biomaRt/inst/doc/biomaRt.html) and connected to `ensembl`:


```r
library("biomaRt")
```

```
## Loading required package: methods
```

```r
ensembl <- useMart("ensembl")
dat <- listDatasets(ensembl)
str(dat)
```

```
## 'data.frame':    33 obs. of  3 variables:
##  $ dataset    :Class 'AsIs'  chr [1:33] "amelanoleuca_gene_ensembl" "dordii_gene_ensembl" "mpahari_gene_ensembl" "trubripes_gene_ensembl" ...
##  $ description:Class 'AsIs'  chr [1:33] "Panda genes (ailMel1)" "Kangaroo rat genes (Dord_2.0)" "Shrew mouse genes (PAHARI_EIJ_v1.1)" "Fugu genes (FUGU 4.0)" ...
##  $ version    :Class 'AsIs'  chr [1:33] "ailMel1" "Dord_2.0" "PAHARI_EIJ_v1.1" "FUGU 4.0" ...
```

```r
dim(dat)
```

```
## [1] 33  3
```

```r
dat[grepl("hsapiens", dat$dataset),]
```

```
## [1] dataset     description version    
## <0 rows> (or 0-length row.names)
```

```r
dat[grepl("drerio", dat$dataset),]
```

```
## [1] dataset     description version    
## <0 rows> (or 0-length row.names)
```

The tutorial lists 85 datasets, whereas the calls here retrieve at most 50 (33 in the run above). Weirdly, I noticed that the numbers changed when I repeated this, so I wrapped it in a loop and repeated the query several times:


```r
for (i in 1:10){
    print(paste("cycle:",i))
    ensembl <- useMart("ensembl")
    dat <- listDatasets(ensembl)
    print(dim(dat))
    print(paste("Is drerio present?", "drerio_gene_ensembl" %in% dat$dataset))
}
```

```
## [1] "cycle: 1"
## [1] 33  3
## [1] "Is drerio present? FALSE"
## [1] "cycle: 2"
## [1] 50  3
## [1] "Is drerio present? FALSE"
## [1] "cycle: 3"
## [1] 50  3
## [1] "Is drerio present? FALSE"
## [1] "cycle: 4"
## [1] 33  3
## [1] "Is drerio present? FALSE"
## [1] "cycle: 5"
## [1] 50  3
## [1] "Is drerio present? FALSE"
## [1] "cycle: 6"
## [1] 33  3
## [1] "Is drerio present? FALSE"
## [1] "cycle: 7"
## [1] 46  3
## [1] "Is drerio present? FALSE"
## [1] "cycle: 8"
## [1] 50  3
## [1] "Is drerio present? FALSE"
## [1] "cycle: 9"
## [1] 45  3
## [1] "Is drerio present? FALSE"
## [1] "cycle: 10"
## [1] 46  3
## [1] "Is drerio present? FALSE"
```

The number of datasets listed varies on almost every run. Importantly for me, `drerio_gene_ensembl` was missing in all the tests except one.

This instability leads to:

- errors when using packages that depend on a connection to Ensembl, for instance `RIPSeeker`.
- reproducibility errors for anyone not using one of the stable datasets (I did not test which ones were always present, but hsapiens appears to be always available).
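
One way to make a script fail loudly, rather than break somewhere downstream, is to query the dataset list a few times and stop if the dataset never appears. The following is only a sketch: `fetch_until_present()` and its arguments are hypothetical names, not part of `biomaRt`, and the fetch function is passed in as an argument so the retry logic can be tried without a live connection.

```r
# Hypothetical helper: call `fetch` (a function returning a data frame with a
# `dataset` column, like the result of listDatasets()) until `dataset` shows up.
fetch_until_present <- function(fetch, dataset, max_tries = 5, wait = 0) {
    for (i in seq_len(max_tries)) {
        dat <- fetch()
        if (dataset %in% dat$dataset) {
            return(dat)
        }
        message("try ", i, ": ", nrow(dat), " datasets listed, ",
                dataset, " missing")
        Sys.sleep(wait)  # pause before asking again
    }
    stop(dataset, " not listed after ", max_tries, " tries")
}

# Usage with biomaRt (needs a network connection, so left commented out):
# dat <- fetch_until_present(
#     function() biomaRt::listDatasets(biomaRt::useMart("ensembl")),
#     "drerio_gene_ensembl", wait = 10)
```

Retrying does not repair an incomplete listing; it only turns a silent gap into an explicit error at the start of the script.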

I "solved" the issue by using an archive host:


```r
host <- "oct2016.archive.ensembl.org"  # Ensembl archive from October 2016
ensembl <- useMart("ensembl", host = host)
dat <- listDatasets(ensembl)
str(dat)
```

```
## 'data.frame':    69 obs. of  3 variables:
##  $ dataset    :Class 'AsIs'  chr [1:69] "oanatinus_gene_ensembl" "cporcellus_gene_ensembl" "gaculeatus_gene_ensembl" "itridecemlineatus_gene_ensembl" ...
##  $ description:Class 'AsIs'  chr [1:69] "Ornithorhynchus anatinus genes (OANA5)" "Cavia porcellus genes (cavPor3)" "Gasterosteus aculeatus genes (BROADS1)" "Ictidomys tridecemlineatus genes (spetri2)" ...
##  $ version    :Class 'AsIs'  chr [1:69] "OANA5" "cavPor3" "BROADS1" "spetri2" ...
```

```r
dim(dat)
```

```
## [1] 69  3
```

```r
dat[grepl("drerio", dat$dataset),]
```

```
##                dataset                description version
## 40 drerio_gene_ensembl Danio rerio genes (GRCz10)  GRCz10
```

but using older annotations is a bit of a hack. Has anything changed recently in Ensembl or `biomaRt` that explains the missing dataset and this instability?


```r
sessionInfo()
```

```
## R version 3.4.2 (2017-09-28)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
##
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
##
## attached base packages:
## [1] methods   stats     graphics  grDevices utils     datasets  base     
##
## other attached packages:
## [1] biomaRt_2.34.0 knitr_1.17    
##
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.14         AnnotationDbi_1.40.0 magrittr_1.5        
##  [4] BiocGenerics_0.24.0  progress_1.1.2       IRanges_2.12.0      
##  [7] bit_1.1-12           R6_2.2.2             rlang_0.1.4         
## [10] stringr_1.2.0        blob_1.1.0           tools_3.4.2         
## [13] parallel_3.4.2       Biobase_2.38.0       DBI_0.7             
## [16] assertthat_0.2.0     bit64_0.9-7          digest_0.6.12       
## [19] tibble_1.3.4         S4Vectors_0.16.0     bitops_1.0-6        
## [22] RCurl_1.95-4.8       memoise_1.1.0        RSQLite_2.0         
## [25] evaluate_0.10.1      stringi_1.1.6        compiler_3.4.2      
## [28] prettyunits_1.0.2    stats4_3.4.2         XML_3.98-1.9
```

 

modified 4 months ago by Mike Smith • written 4 months ago by António Miguel de Jesus Domingues

@moderators: I could not format the post properly due to an error:

Language "fr" is not one of the supported languages ['en']!

Post was copy-pasted from a markdown document generated via knitr, so no idea.

 

modified 4 months ago • written 4 months ago by António Miguel de Jesus Domingues

Instead of using an old archived version, you could go for version 90 from August (http://Aug2017.archive.ensembl.org), which in many cases will differ only slightly from the very latest release.
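
A minimal sketch of that suggestion, assuming the August 2017 archive accepts the same `useMart()` call as the October 2016 one used in the question. `archive_host()` and `connect_archive()` are hypothetical helper names, not part of `biomaRt`; the host is written in lower case to match the question, which should be equivalent since host names are case-insensitive. The live call is left commented out because it needs a network connection.

```r
# Hypothetical helper: build an Ensembl archive host name such as
# "aug2017.archive.ensembl.org" from a month abbreviation and a year.
archive_host <- function(month, year) {
    paste0(tolower(month), year, ".archive.ensembl.org")
}

# Hypothetical helper: connect to the gene mart on a given archive and
# stop early if the requested dataset is not listed there.
connect_archive <- function(host, dataset) {
    mart <- biomaRt::useMart("ensembl", host = host)
    listed <- biomaRt::listDatasets(mart)$dataset
    if (!dataset %in% listed) {
        stop(dataset, " is not listed on ", host)
    }
    biomaRt::useDataset(dataset, mart = mart)
}

host <- archive_host("Aug", 2017)
print(host)

# Needs a network connection:
# ensembl <- connect_archive(host, "drerio_gene_ensembl")
```

Pinning the host also keeps results reproducible across Ensembl releases, at the cost of not picking up newer annotations.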

written 4 months ago by thokall
1

Good tip. In my case it was a little lazy because I am also using the script to run some C. elegans data analysis and this will work for both - RIPSeeker needs biomaRt/ensembl so I need an archive version before the move to Wormbase. Another hack.

modified 4 months ago • written 4 months ago by António Miguel de Jesus Domingues
3
Mike Smith (EMBL Heidelberg / de.NBI) wrote 4 months ago:

There was an issue with one of the new primate datasets having an apostrophe in its description, which was causing listDatasets() to fail. I have patched this in version 2.35.1 and pushed it to the Bioconductor devel branch. This will take a few days to propagate, so the fastest way to get hold of it is via GitHub using:

```r
BiocInstaller::biocLite('grimbough/biomaRt')
```

If people could report back if that works or not that would be very helpful, and assuming it works I will also patch the release version of biomaRt.
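
To see whether an installation already carries the patch, the installed version can be compared against 2.35.1. This is only a sketch: `has_devel_fix()` is a hypothetical helper, and the threshold reflects the devel version number mentioned above, so it would not recognise a patched release version.

```r
# Hypothetical helper: TRUE if a biomaRt version string is at least 2.35.1,
# the devel version that contains the fix described above.
has_devel_fix <- function(version) {
    package_version(version) >= "2.35.1"
}

has_devel_fix("2.34.0")  # release version from the question
has_devel_fix("2.35.2")  # a later devel version

# Against the installed package (requires biomaRt to be installed):
# has_devel_fix(as.character(packageVersion("biomaRt")))
```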


```r
library(biomaRt)
packageVersion("biomaRt")
```

```
## [1] ‘2.35.1’
```

```r
ensembl_mart <- useMart("ensembl")
dim(listDatasets(ensembl_mart))
```

```
## [1] 97  3
```

written 4 months ago by Mike Smith

```
> library(biomaRt)
> packageVersion("biomaRt")
[1] ‘2.35.2’

## [1] 97  3
## [1] "Is drerio present? TRUE"
```

All systems are go :) Thank you for the fix.

written 4 months ago by António Miguel de Jesus Domingues
2
Mike Smith (EMBL Heidelberg / de.NBI) wrote 4 months ago:

Thanks for the report, I can confirm that I'm seeing the behaviour too. I suspect this isn't a problem with the biomaRt package, but is related to the latest release of Ensembl, which is happening today (http://www.ensembl.info/blog/2017/12/12/ensembl-91-has-been-released/), and that it will be back to normal in a few hours. I'll keep an eye on it to make sure.

written 4 months ago by Mike Smith

The same issue was also happening yesterday. It is still likely due to the update at Ensembl, but it is suboptimal that useMart() by default connects to an instance that is not yet complete and, in addition, does not generate any warnings.

 

written 4 months ago by thokall
2
Thomas Maurel (United Kingdom) wrote 4 months ago:

There seems to be an issue with the new Ensembl gene mart 91 and the BiomaRt module. We are investigating with Mike Smith.

Apologies for any inconvenience caused.

written 4 months ago by Thomas Maurel