Question: biomaRt: drerio_gene_ensembl dataset missing
Asked 11 months ago by António Miguel de Jesus Domingues (Germany)

Whilst running a `RIPSeeker` analysis, I noticed that the dataset `drerio_gene_ensembl`, which used to be available via `biomaRt`, is no longer listed or accessible. To test this, I first upgraded my Bioconductor installation to make sure I was working with the latest version of `biomaRt` (2.34.0).


```r
source("https://bioconductor.org/biocLite.R")  # defines biocLite()
biocLite("BiocUpgrade")                        # upgrade to the latest Bioconductor release
```

I then followed the instructions in the [vignette](https://bioconductor.org/packages/release/bioc/vignettes/biomaRt/inst/doc/biomaRt.html) and connected to `ensembl`:


```r
library("biomaRt")
```

```
## Loading required package: methods
```

```r
ensembl <- useMart("ensembl")
dat <- listDatasets(ensembl)
str(dat)
```

```
## 'data.frame':    33 obs. of  3 variables:
##  $ dataset    :Class 'AsIs'  chr [1:33] "amelanoleuca_gene_ensembl" "dordii_gene_ensembl" "mpahari_gene_ensembl" "trubripes_gene_ensembl" ...
##  $ description:Class 'AsIs'  chr [1:33] "Panda genes (ailMel1)" "Kangaroo rat genes (Dord_2.0)" "Shrew mouse genes (PAHARI_EIJ_v1.1)" "Fugu genes (FUGU 4.0)" ...
##  $ version    :Class 'AsIs'  chr [1:33] "ailMel1" "Dord_2.0" "PAHARI_EIJ_v1.1" "FUGU 4.0" ...
```

```r
dim(dat)
```

```
## [1] 33  3
```

```r
dat[grepl("hsapiens", dat$dataset),]
```

```
## [1] dataset     description version    
## <0 rows> (or 0-length row.names)
```

```r
dat[grepl("drerio", dat$dataset),]
```

```
## [1] dataset     description version    
## <0 rows> (or 0-length row.names)
```

The tutorial lists 85 datasets, whereas it now retrieves only 50 or fewer. Weirdly, I noticed that the numbers changed when I repeated this, so I wrapped it in a loop and repeated the analysis several times:


```r
for (i in 1:10){
    print(paste("cycle:",i))
    ensembl <- useMart("ensembl")
    dat <- listDatasets(ensembl)
    print(dim(dat))
    print(paste("Is drerio present?", "drerio_gene_ensembl" %in% dat$dataset))
}
```

```
## [1] "cycle: 1"
## [1] 33  3
## [1] "Is drerio present? FALSE"
## [1] "cycle: 2"
## [1] 50  3
## [1] "Is drerio present? FALSE"
## [1] "cycle: 3"
## [1] 50  3
## [1] "Is drerio present? FALSE"
## [1] "cycle: 4"
## [1] 33  3
## [1] "Is drerio present? FALSE"
## [1] "cycle: 5"
## [1] 50  3
## [1] "Is drerio present? FALSE"
## [1] "cycle: 6"
## [1] 33  3
## [1] "Is drerio present? FALSE"
## [1] "cycle: 7"
## [1] 46  3
## [1] "Is drerio present? FALSE"
## [1] "cycle: 8"
## [1] 50  3
## [1] "Is drerio present? FALSE"
## [1] "cycle: 9"
## [1] 45  3
## [1] "Is drerio present? FALSE"
## [1] "cycle: 10"
## [1] 46  3
## [1] "Is drerio present? FALSE"
```

The number of datasets listed changes on almost every run. Importantly for me, `drerio_gene_ensembl` was missing in all the tests except one.

This instability leads to:

- errors when using packages that depend on a connection to Ensembl, for instance `RIPSeeker`.
- reproducibility errors for anyone not using one of the stable datasets (I did not test which ones were always present, but hsapiens appears to be always available).
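
One way to make this kind of failure visible early is to check the listing before any downstream package uses it. The following is only a minimal sketch: `assert_dataset_present()` is a hypothetical helper, not part of `biomaRt`, and the small table is hand-made to mimic the shape of `listDatasets()` output so the example runs offline.

```r
# Hypothetical helper (not part of biomaRt): stop with a clear message when a
# required dataset is absent from a listDatasets()-style data frame.
assert_dataset_present <- function(datasets, name) {
  available <- as.character(datasets$dataset)
  if (!(name %in% available)) {
    stop(sprintf("dataset '%s' not found among %d listed datasets",
                 name, length(available)), call. = FALSE)
  }
  invisible(TRUE)
}

# Offline illustration: a hand-made table shaped like listDatasets() output.
listed <- data.frame(
  dataset = c("amelanoleuca_gene_ensembl", "dordii_gene_ensembl"),
  description = c("Panda genes (ailMel1)", "Kangaroo rat genes (Dord_2.0)"),
  version = c("ailMel1", "Dord_2.0"),
  stringsAsFactors = FALSE
)

# Capture the error message instead of aborting the script.
result <- tryCatch(
  assert_dataset_present(listed, "drerio_gene_ensembl"),
  error = function(e) conditionMessage(e)
)
print(result)
```

With a live connection, the same check would presumably be called as `assert_dataset_present(listDatasets(useMart("ensembl")), "drerio_gene_ensembl")`.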

I "solved" the issue by using an archive host:


```r
# Archive host name without the protocol prefix, reused in the call below
host <- "oct2016.archive.ensembl.org"
ensembl <- useMart("ensembl", host = host)
dat <- listDatasets(ensembl)
str(dat)
```

```
## 'data.frame':    69 obs. of  3 variables:
##  $ dataset    :Class 'AsIs'  chr [1:69] "oanatinus_gene_ensembl" "cporcellus_gene_ensembl" "gaculeatus_gene_ensembl" "itridecemlineatus_gene_ensembl" ...
##  $ description:Class 'AsIs'  chr [1:69] "Ornithorhynchus anatinus genes (OANA5)" "Cavia porcellus genes (cavPor3)" "Gasterosteus aculeatus genes (BROADS1)" "Ictidomys tridecemlineatus genes (spetri2)" ...
##  $ version    :Class 'AsIs'  chr [1:69] "OANA5" "cavPor3" "BROADS1" "spetri2" ...
```

```r
dim(dat)
```

```
## [1] 69  3
```

```r
dat[grepl("drerio", dat$dataset),]
```

```
##                dataset                description version
## 40 drerio_gene_ensembl Danio rerio genes (GRCz10)  GRCz10
```

but using older annotations is a bit of a hack. Has anything changed recently in Ensembl or `biomaRt` that explains the missing dataset and this instability?


```r
sessionInfo()
```

```
## R version 3.4.2 (2017-09-28)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
##
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
##
## attached base packages:
## [1] methods   stats     graphics  grDevices utils     datasets  base     
##
## other attached packages:
## [1] biomaRt_2.34.0 knitr_1.17    
##
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.14         AnnotationDbi_1.40.0 magrittr_1.5        
##  [4] BiocGenerics_0.24.0  progress_1.1.2       IRanges_2.12.0      
##  [7] bit_1.1-12           R6_2.2.2             rlang_0.1.4         
## [10] stringr_1.2.0        blob_1.1.0           tools_3.4.2         
## [13] parallel_3.4.2       Biobase_2.38.0       DBI_0.7             
## [16] assertthat_0.2.0     bit64_0.9-7          digest_0.6.12       
## [19] tibble_1.3.4         S4Vectors_0.16.0     bitops_1.0-6        
## [22] RCurl_1.95-4.8       memoise_1.1.0        RSQLite_2.0         
## [25] evaluate_0.10.1      stringi_1.1.6        compiler_3.4.2      
## [28] prettyunits_1.0.2    stats4_3.4.2         XML_3.98-1.9
```

 

Modified 11 months ago by Mike Smith • written 11 months ago by António Miguel de Jesus Domingues

@moderators: I could not format the post properly due to an error:

Language "fr" is not one of the supported languages ['en']!

Post was copy-pasted from a markdown document generated via knitr, so no idea.

 

Modified 11 months ago • written 11 months ago by António Miguel de Jesus Domingues

Instead of using an old archived version, you could use version 90 from August (http://Aug2017.archive.ensembl.org), which in many cases will differ only slightly from the very latest release.
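
A rough sketch of how such a fallback might be automated: try the main site first, then the archives mentioned in this thread, and keep the first host that actually lists the dataset. `first_host_with_dataset()` and `stub_list()` are hypothetical names introduced here for illustration; the default `list_fn` assumes `biomaRt` is installed and that archive hosts are passed without the `http://` prefix. The stub replaces the network call so the example runs offline.

```r
# Hypothetical helper: return the first host whose dataset listing contains
# `dataset`, or NA if none does. `list_fn` maps a host name to a
# listDatasets()-style data frame; the default queries Ensembl via biomaRt.
first_host_with_dataset <- function(
    hosts, dataset,
    list_fn = function(h) biomaRt::listDatasets(biomaRt::useMart("ensembl", host = h))) {
  for (h in hosts) {
    # A host that errors out is treated the same as one lacking the dataset.
    listed <- tryCatch(list_fn(h), error = function(e) NULL)
    if (!is.null(listed) && dataset %in% as.character(listed$dataset)) {
      return(h)
    }
  }
  NA_character_
}

# Offline illustration: the stub pretends the main site fails to list anything.
stub_list <- function(h) {
  if (h == "www.ensembl.org") stop("listing failed")
  data.frame(dataset = "drerio_gene_ensembl", stringsAsFactors = FALSE)
}

hosts <- c("www.ensembl.org",
           "aug2017.archive.ensembl.org",
           "oct2016.archive.ensembl.org")
chosen <- first_host_with_dataset(hosts, "drerio_gene_ensembl", stub_list)
print(chosen)
```

Ordering the hosts from newest to oldest keeps the annotations as recent as possible, at the cost of results that may depend on which host happened to respond, so it is probably worth logging the chosen host alongside the analysis.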

Written 11 months ago by thokall

Good tip. In my case it was a little lazy because I am also using the script to run some C. elegans data analysis and this will work for both - RIPSeeker needs biomaRt/ensembl so I need an archive version before the move to Wormbase. Another hack.

Modified 11 months ago • written 11 months ago by António Miguel de Jesus Domingues

Mike Smith (EMBL Heidelberg / de.NBI) wrote 11 months ago:

There was an issue with one of the new primate datasets having an apostrophe in its description, which was causing listDatasets() to fail. I have patched this in version 2.35.1 and pushed it to the Bioconductor devel branch. This will take a few days to propagate, so the fastest way to get hold of it is via GitHub using

BiocInstaller::biocLite('grimbough/biomaRt')

If people could report back if that works or not that would be very helpful, and assuming it works I will also patch the release version of biomaRt.


library(biomaRt)
packageVersion("biomaRt")
[1] ‘2.35.1’
ensembl_mart <- useMart("ensembl")
dim( listDatasets(ensembl_mart) )
[1] 97  3
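
For scripts that should refuse to run against an unpatched devel installation, a version guard along these lines might help. This is only a sketch: `has_listing_fix()` is a hypothetical helper, and the 2.35.1 threshold is taken from the answer above. It applies to the devel branch only, since the version number of any patched release build is not stated in this thread.

```r
# Hypothetical helper: TRUE when a version is at least 2.35.1, the devel
# version reported above as carrying the listDatasets() fix.
has_listing_fix <- function(version) {
  # as.character() lets the helper accept both strings and the object
  # returned by packageVersion().
  numeric_version(as.character(version)) >= "2.35.1"
}

print(has_listing_fix("2.34.0"))  # version used in the original question
print(has_listing_fix("2.35.2"))  # a later devel version
```

In a real session the argument would be `packageVersion("biomaRt")`.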
Written 11 months ago by Mike Smith

> library(biomaRt)

> packageVersion("biomaRt")
[1] ‘2.35.2’

## [1] 97  3
## [1] "Is drerio present? TRUE"

All systems are go :) Thank you for the fix.

Written 11 months ago by António Miguel de Jesus Domingues

Mike Smith (EMBL Heidelberg / de.NBI) wrote 11 months ago:

Thanks for the report, I can confirm that I'm seeing the behaviour too.  I suspect this isn't a problem with the biomaRt package, but is related to the latest release of Ensembl, which is happening today (http://www.ensembl.info/blog/2017/12/12/ensembl-91-has-been-released/) and it will be back to normal in a few hours.  I'll keep an eye on it to make sure.

Written 11 months ago by Mike Smith

The same issue was also happening yesterday. It is still likely due to the update at Ensembl, but it is suboptimal that useMart() by default connects to an instance that is not yet complete and, in addition, does not generate any warnings.

 

Written 11 months ago by thokall

Thomas Maurel (United Kingdom) wrote 11 months ago:

There seems to be an issue with the new Ensembl gene mart 91 and the BiomaRt module. We are investigating with Mike Smith.

Apologies for any inconvenience caused.

Written 11 months ago by Thomas Maurel