How to use biomaRt with Ensembl build 37 (GRCh37)
@pablo-marin-garcia-6030
Hello, 

I usually query BioMart from my own Perl scripts via the RESTful service, pointing them at "http://grch37.ensembl.org/biomart/martservice".

This week I have been trying out biomaRt, and it seems that BioMart has switched its martservice from GRCh37 to GRCh38.

I read the documentation for my biomaRt version (2.18.0, R 3.1.0) and found that I can change the host and path, but I am having trouble doing so.

First I checked that biomaRt was working:

> ensembl <- useMart("ensembl", dataset="hsapiens_gene_ensembl")

But when I tried to set a host and path, I got an error:
# change host and path => ERROR
> ensembl <- useMart("ensembl", dataset="hsapiens_gene_ensembl", host="http://grch37.ensembl.org", path="/biomart/martservice")
Request to BioMart web service failed. Verify if you are still connected to the internet.  Alternatively the BioMart web service is temporarily down.  Check http://www.biomart.org and verify if this website is available.
Error: XML content does not seem to be XML:

# To check whether the URL itself was the problem, I used the same URL as the default @host
> ensembl <- useMart("ensembl", dataset="hsapiens_gene_ensembl", host="http://www.biomart.org", path="/biomart/martservice")
Request to BioMart web service failed. Verify if you are still connected to the internet.  Alternatively the BioMart web service is temporarily down.  Check http://www.biomart.org and verify if this website is available.
Error: XML content does not seem to be XML: 

Am I missing something?

Do the host and path options need something extra I am missing? 

Why does it expect XML at initialization?

Regards

biomaRt ensembl GRCh37
Thomas Maurel
@thomas-maurel-5295

Dear Pablo,

I am really sorry, there was a typo in my previous answer; the command should be:

ensembl = useMart(biomart="ENSEMBL_MART_ENSEMBL", host="grch37.ensembl.org", path="/biomart/martservice", dataset="hsapiens_gene_ensembl")

 

The error you got came from the mart name "ensembl" in useMart(): on grch37.ensembl.org the Ensembl genes mart is called "ENSEMBL_MART_ENSEMBL".
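To catch this kind of name mismatch before calling useMart(), one option is to compare the requested name with what listMarts() reports for the host. A minimal sketch follows; `check_mart()` is a hypothetical helper (not part of biomaRt), and the mart names are copied from the listMarts() output for grch37.ensembl.org shown in this thread. Against a live server you would pass the result of `biomaRt::listMarts(host = "grch37.ensembl.org", path = "/biomart/martservice")` instead of the hard-coded table.

```r
# Mart names that listMarts() reports for grch37.ensembl.org in this thread.
grch37_marts <- data.frame(
  biomart = c("ENSEMBL_MART_ENSEMBL", "ENSEMBL_MART_SNP", "ENSEMBL_MART_FUNCGEN",
              "ENSEMBL_MART_VEGA", "pride"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: stop early, listing the valid choices,
# when the requested mart is not offered by this host.
check_mart <- function(name, marts) {
  if (!name %in% marts$biomart) {
    stop(sprintf("Mart '%s' not found; available: %s",
                 name, paste(marts$biomart, collapse = ", ")))
  }
  invisible(name)
}

check_mart("ENSEMBL_MART_ENSEMBL", grch37_marts)   # passes silently
# check_mart("ensembl", grch37_marts)              # error: "ensembl" is the biomart.org name
```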

Hope this helps,

Regards,

Thomas


Thanks Thomas. By the way, I am a bit confused by the circularity in biomaRt: useMart needs a BioMart database; if you don't know the database names you can list them with listMarts, which needs the biomaRt object created by useMart, which needs a BioMart database... Doh! Where can I get a list of the BioMart databases for a host BEFORE creating a biomaRt object?


You can get a list of marts without creating an object, but only for the mart databases hosted on biomart.org (biomaRt connects to this server by default):

library("biomaRt")
listMarts()

              biomart                             version
1             ensembl        ENSEMBL GENES 77 (SANGER UK)
2                 snp    ENSEMBL VARIATION 77 (SANGER UK)
3 functional_genomics   ENSEMBL REGULATION 77 (SANGER UK)
4                vega                VEGA 57  (SANGER UK)
5       fungi_mart_23           ENSEMBL FUNGI 23 (EBI UK)

To list the mart databases on an external server, you can do the following with biomaRt 2.21.4 or later:

> grch37 = useMart(biomart="ENSEMBL_MART_ENSEMBL", host="grch37.ensembl.org", path="/biomart/martservice")
> listMarts(grch37)
               biomart               version
1 ENSEMBL_MART_ENSEMBL      Ensembl Genes 75
2     ENSEMBL_MART_SNP  Ensembl Variation 75
3 ENSEMBL_MART_FUNCGEN Ensembl Regulation 75
4    ENSEMBL_MART_VEGA               Vega 55
5                pride        PRIDE (EBI UK)

Then you can run the following command to get a list of all the datasets:

> listDatasets(grch37)
                          dataset                                 description         version
1          oanatinus_gene_ensembl      Ornithorhynchus anatinus genes (OANA5)           OANA5
2         cporcellus_gene_ensembl             Cavia porcellus genes (cavPor3)         cavPor3
3         gaculeatus_gene_ensembl      Gasterosteus aculeatus genes (BROADS1)         BROADS1
4          lafricana_gene_ensembl          Loxodonta africana genes (loxAfr3)         loxAfr3
5  itridecemlineatus_gene_ensembl  Ictidomys tridecemlineatus genes (spetri2)         spetri2
6         choffmanni_gene_ensembl         Choloepus hoffmanni genes (choHof1)         choHof1
7          csavignyi_gene_ensembl              Ciona savignyi genes (CSAV2.0)         CSAV2.0
8             fcatus_gene_ensembl         Felis catus genes (Felis_catus_6.2) Felis_catus_6.2
9        rnorvegicus_gene_ensembl          Rattus norvegicus genes (Rnor_5.0)        Rnor_5.0
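The listDatasets() output is an ordinary data frame, so it can be filtered in plain R to find the dataset you need. A small sketch, assuming the `dataset` and `description` columns shown above; `find_dataset()` is a hypothetical helper, and the three rows are copied from the output above rather than fetched.

```r
# Three rows copied from the listDatasets(grch37) output above.
grch37_datasets <- data.frame(
  dataset = c("oanatinus_gene_ensembl", "cporcellus_gene_ensembl",
              "rnorvegicus_gene_ensembl"),
  description = c("Ornithorhynchus anatinus genes (OANA5)",
                  "Cavia porcellus genes (cavPor3)",
                  "Rattus norvegicus genes (Rnor_5.0)"),
  version = c("OANA5", "cavPor3", "Rnor_5.0"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose dataset name or description
# matches `pattern` (case-insensitive regular expression).
find_dataset <- function(datasets, pattern) {
  hit <- grepl(pattern, datasets$dataset, ignore.case = TRUE) |
         grepl(pattern, datasets$description, ignore.case = TRUE)
  datasets[hit, , drop = FALSE]
}

find_dataset(grch37_datasets, "rattus")
```

On a live connection the same call would be `find_dataset(listDatasets(grch37), "sapiens")`.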

Hope this helps,

Regards,

Thomas

Hi Thomas,

As you can see below, I've tried your solution but it keeps loading version 77 ...

> library(biomaRt)
> grch37 = useMart(biomart="ENSEMBL_MART_ENSEMBL", host="grch37.ensembl.org", path="/biomart/martservice")
> head(listMarts(grch37))
              biomart                             version
1             ensembl        ENSEMBL GENES 77 (SANGER UK)
2                 snp    ENSEMBL VARIATION 77 (SANGER UK)
3 functional_genomics   ENSEMBL REGULATION 77 (SANGER UK)
4                vega                VEGA 57  (SANGER UK)
5       fungi_mart_23           ENSEMBL FUNGI 23 (EBI UK)
6 fungi_variations_23 ENSEMBL FUNGI VARIATION 23 (EBI UK)

 


Dear Mathieu,

The listMarts function was updated in biomaRt 2.21.4 to improve compatibility with external hosts, so update your Bioconductor installation to the latest version by running the following:

source("http://bioconductor.org/biocLite.R")
biocLite("BiocUpgrade")

 

Then you should be able to get the following:

> library(biomaRt)
> grch37 = useMart(biomart="ENSEMBL_MART_ENSEMBL", host="grch37.ensembl.org", path="/biomart/martservice")
> head(listMarts(grch37))
               biomart               version
1 ENSEMBL_MART_ENSEMBL      Ensembl Genes 75
2     ENSEMBL_MART_SNP  Ensembl Variation 75
3 ENSEMBL_MART_FUNCGEN Ensembl Regulation 75
4    ENSEMBL_MART_VEGA               Vega 55
5                pride        PRIDE (EBI UK)
> 

Please note that biomaRt did manage to connect to "grch37.ensembl.org"; it's just that the listMarts function in older versions returns the list of marts from biomart.org instead of grch37.ensembl.org.
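If you are not sure which side of that change your installation is on, the installed version can be compared with 2.21.4 in base R. A minimal sketch; `needs_upgrade()` is a hypothetical helper, and 2.21.4 is the threshold quoted in this reply. (On current Bioconductor releases biocLite() has been retired in favour of `BiocManager::install()`.)

```r
# Hypothetical helper: TRUE when the installed biomaRt predates the release
# in which, per this thread, listMarts() started honouring an external host.
needs_upgrade <- function(installed, required = "2.21.4") {
  package_version(installed) < package_version(required)
}

needs_upgrade("2.18.0")   # TRUE for the version in the original question
# On a real installation:
# needs_upgrade(as.character(packageVersion("biomaRt")))
```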

Hope this helps,

Regards,

Thomas


Hi Pablo,

Thanks for your question. I ran into the same problem: biomaRt now gives me quite different results from those I got about a month ago.

Regarding the circularity: listMarts accepts 'host' and 'path' arguments, just like useMart.

listMarts(host="grch37.ensembl.org", path="/biomart/martservice")
               biomart               version
1 ENSEMBL_MART_ENSEMBL      Ensembl Genes 75
2     ENSEMBL_MART_SNP  Ensembl Variation 75
3 ENSEMBL_MART_FUNCGEN Ensembl Regulation 75
4    ENSEMBL_MART_VEGA               Vega 55
5                pride        PRIDE (EBI UK)
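Putting the two steps together, the discovery step needs no Mart object at all. The sketch below wraps it in a hypothetical function `grch37_mart()` (not part of biomaRt), assuming the host, path and mart name used in this thread; it only calls biomaRt (via `biomaRt::`) when invoked, and then needs network access.

```r
# Hypothetical wrapper: discover the marts on a host first, then connect.
grch37_mart <- function(dataset = "hsapiens_gene_ensembl",
                        host = "grch37.ensembl.org",
                        path = "/biomart/martservice",
                        biomart = "ENSEMBL_MART_ENSEMBL") {
  # Step 1: no Mart object needed, listMarts() takes host and path directly.
  marts <- biomaRt::listMarts(host = host, path = path)
  if (!biomart %in% marts$biomart) {
    stop("Mart '", biomart, "' is not offered by ", host)
  }
  # Step 2: connect to the chosen mart and dataset on the same host.
  biomaRt::useMart(biomart = biomart, dataset = dataset, host = host, path = path)
}

# Usage (requires biomaRt and a network connection):
# ensembl37 <- grch37_mart()
```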

Hope this helps. 

Sincerely,

Ting

Thomas Maurel
@thomas-maurel-5295

Dear Pablo,

If you want to connect to the Ensembl GRCh37 archive, your host should be "grch37.ensembl.org" instead of "http://grch37.ensembl.org":

ensembl <- useMart("ensembl", dataset="hsapiens_gene_ensembl", host="grch37.ensembl.org", path="/biomart/martservice")
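One plausible reading of the original "XML content does not seem to be XML" error (for both the grch37 and the biomart.org attempts): older biomaRt releases assembled the request URL themselves as "http://" + host + ":" + port + path, so a host that already starts with "http://" ends up doubled and the server does not answer with XML. The sketch below only illustrates that assumption; `registry_url()` and `strip_scheme()` are hypothetical helpers, not biomaRt functions, and the exact URL biomaRt builds may differ between versions (`?type=registry` is the BioMart REST registry request).

```r
# Hypothetical helper mimicking how an older biomaRt might build its request URL.
registry_url <- function(host, path = "/biomart/martservice", port = 80) {
  paste0("http://", host, ":", port, path, "?type=registry")
}

# Hypothetical helper: drop a leading "http://" or "https://" from a host.
strip_scheme <- function(host) {
  sub("^https?://", "", host)
}

registry_url("http://grch37.ensembl.org")
# -> "http://http://grch37.ensembl.org:80/biomart/martservice?type=registry"
registry_url(strip_scheme("http://grch37.ensembl.org"))
# -> "http://grch37.ensembl.org:80/biomart/martservice?type=registry"
```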

By default, biomaRt connects to biomart.org (which is now on GRCh38, Ensembl release 77), so for GRCh38 you don't need to specify any host or path and can simply run:

ensembl=useMart("ensembl")

Hope this helps,

Best Regards,

Thomas

 

 

 


Thanks Thomas,

Now it connects but throws an error:

> ensembl <- useMart("ensembl", dataset="hsapiens_gene_ensembl", host="grch37.ensembl.org", path="/biomart/martservice")
Error in useMart("ensembl", dataset = "hsapiens_gene_ensembl", host = "grch37.ensembl.org",  :
  Incorrect BioMart name, use the listMarts function to see which BioMart databases are available

From the error, it seems that the BioMart databases on the Ensembl GRCh37 host have different names. Is that right?

I tried to create the biomaRt object without a dataset, so that I could call listDatasets later, but that does not work either:

> ensembl <- useMart("ensembl", host="grch37.ensembl.org", path="/biomart/martservice")
Error in useMart("ensembl", host = "grch37.ensembl.org", path = "/biomart/martservice") : 
  Incorrect BioMart name, use the listMarts function to see which BioMart databases are available

Do you know how I can find the list of datasets for a given BioMart host?

 
