biomaRt does not return entrezgene id
@danielcarbajo-12758

Hello, I am trying to retrieve ENTREZGENE IDs using ENSEMBL IDs as queries with biomaRt in R, but it does not retrieve them properly. Instead, the entrezgene column comes back filled with the HGNC symbols.

See this MWE:

library(biomaRt)
ensembl <- useMart("ENSEMBL_MART_ENSEMBL", dataset="hsapiens_gene_ensembl", host="www.ensembl.org")

genes <- c("ENSG00000121671", "ENSG00000142208", "ENSG00000171051", "ENSG00000115271", "ENSG00000143537")

getBM(attributes=c('ensembl_gene_id','entrezgene','hgnc_id','hgnc_symbol'), filters='ensembl_gene_id', values=genes, mart=ensembl)

It returns

  ensembl_gene_id entrezgene    hgnc_id hgnc_symbol
1 ENSG00000115271        GCA HGNC:15990         GCA
2 ENSG00000121671       CRY2  HGNC:2385        CRY2
3 ENSG00000142208       AKT1   HGNC:391        AKT1
5 ENSG00000171051       FPR1  HGNC:3826        FPR1

How can I retrieve the ENTREZGENE IDs correctly? Thanks.
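A quick way to spot this kind of problem is to check whether the entrezgene column contains only digits, since Entrez Gene IDs are numeric. The following is a minimal base-R sketch; the helper name `looks_like_entrez` is hypothetical and not part of biomaRt, and the numeric values in the second call are purely illustrative.

```r
# Hypothetical helper (not part of biomaRt): returns TRUE when every
# non-missing, non-empty value consists only of digits.
looks_like_entrez <- function(x) {
  x <- as.character(x)
  x <- x[!is.na(x) & nzchar(x)]
  length(x) > 0 && all(grepl("^[0-9]+$", x))
}

# Values copied from the entrezgene column in the output above.
returned <- c("GCA", "CRY2", "AKT1", "FPR1")
looks_like_entrez(returned)        # FALSE

# Illustrative numeric values, as a working query would return.
looks_like_entrez(c(207, 1408))    # TRUE
```

Running a check like this on the result of getBM() before using the IDs downstream would flag the symptom described in this post.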



Can you update your post to include the output from sessionInfo()? I'd like to check what version of biomaRt you're using as I get the entrezgene IDs returned correctly.


I get the same as the OP:

> mart2 <- useMart("ENSEMBL_MART_ENSEMBL","celegans_gene_ensembl", host = "www.ensembl.org")

> huh2 <- getBM(c("ensembl_gene_id", "entrezgene", "wormbase_gene"),mart = mart2)

  ensembl_gene_id entrezgene  wormbase_gene
1  WBGene00000001      aap-1 WBGene00000001
2  WBGene00000002      aat-1 WBGene00000002
3  WBGene00000003      aat-2 WBGene00000003
4  WBGene00000004      aat-3 WBGene00000004
5  WBGene00000005      aat-4 WBGene00000005
6  WBGene00000006      aat-5 WBGene00000006

But this is what I get from the BioMart mirror at useast.ensembl.org, so it seems the mirror hasn't been updated yet. If I try to hack things to use ensembl.org directly, I get this:

> mart2@host <- "http://ensembl.org:80/biomart/martservice"
> huh2 <- getBM(c("ensembl_gene_id", "entrezgene", "wormbase_gene"),mart = mart2)
[1] ensembl_gene_id entrezgene      wormbase_gene
<0 rows> (or 0-length row.names)

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 14393)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] biomaRt_2.30.0       oligo_1.38.0         Biostrings_2.42.1
[4] XVector_0.14.1       oligoClasses_1.36.0  annotate_1.52.1
[7] XML_3.98-1.5         limma_3.30.13        org.Hs.eg.db_3.4.0
[13] org.Dm.eg.db_3.4.0   AnnotationDbi_1.36.2 IRanges_2.8.2
[16] S4Vectors_0.12.2     Biobase_2.34.0       BiocGenerics_0.20.0

Thanks for testing the code and reporting your findings. I suspect this is related to the issues reported (and fixed) in the thread "A: Ensembl 88 is out!"

It has made me question the effectiveness of the host argument to useMart(), since I always seem to end up on the main Ensembl site, presumably due to some geolocation redirection. I'll see if I can override this in the biomaRt code.


Hi Mike,

I had actually never noticed this behaviour before. If it can be overridden in the biomaRt code, that would be great. To override the automatic redirect to an Ensembl mirror, you can add the following flag to the URL: "?redirect=no". E.g.:

http://uswest.ensembl.org/index.html?redirect=no

This should bring you straight to the uswest ensembl mirror.
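As a small illustration of the flag, here is a base-R sketch that appends it to a URL. The helper name `no_redirect_url` is hypothetical, and using "&" when the URL already carries a query string is an assumption based on ordinary URL conventions rather than something confirmed in this thread.

```r
# Hypothetical helper: append redirect=no to a URL, choosing "?" or "&"
# depending on whether the URL already has a query string.
no_redirect_url <- function(url) {
  sep <- if (grepl("?", url, fixed = TRUE)) "&" else "?"
  paste0(url, sep, "redirect=no")
}

no_redirect_url("http://uswest.ensembl.org/index.html")
# "http://uswest.ensembl.org/index.html?redirect=no"
```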

Cheers,

Thomas


Thanks for the hint.  This might be frustrating for all involved, but it has exposed an interesting 'feature'!

I'll take a look in the next few days - there's not much point in the argument if it silently doesn't work.


Sorry, I'm only just seeing the messages because of the time difference. Do you still need to see sessionInfo()? I guess the answer here is to just wait, right?


sessionInfo() is always useful, and as a general rule you should include it in any post you make here, as someone will inevitably ask you for it. But in this case it looks like the issue is with your local Ensembl mirror rather than the biomaRt package, so you'll have to wait either for the mirror to be updated or for me to figure out how to force biomaRt to query the main site.

I suspect the Ensembl fix will come first; they're normally very good at sorting out issues like this.

Thomas Maurel ▴ 790
@thomas-maurel-5295
United Kingdom

Dear all,

Just to let you know that the Ensembl mirrors have now been fixed.

Thanks a lot for your patience.

Kind Regards,

Thomas


It works properly now!!


Thanks for this.

There is now an argument ensemblRedirect = FALSE to useMart() which will force off the redirection to a local mirror.  This is available from biomaRt version 2.21.6
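For illustration, a connection using this argument might look like the sketch below. It assumes a biomaRt version that accepts ensemblRedirect (other releases may not) and needs network access, so the actual query is left commented out. The wrapper name `connect_main_site` is hypothetical.

```r
# Sketch only: requires the biomaRt package and network access when called.
# `ensemblRedirect` is the argument described in the post above; whether it
# is accepted depends on the installed biomaRt version.
connect_main_site <- function(dataset = "hsapiens_gene_ensembl") {
  biomaRt::useMart("ENSEMBL_MART_ENSEMBL",
                   dataset = dataset,
                   host = "www.ensembl.org",
                   ensemblRedirect = FALSE)
}

# Example call, reusing the query from the original post (not run here):
# ensembl <- connect_main_site()
# biomaRt::getBM(attributes = c("ensembl_gene_id", "entrezgene"),
#                filters = "ensembl_gene_id",
#                values = "ENSG00000142208",
#                mart = ensembl)
```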

Thomas Maurel ▴ 790
@thomas-maurel-5295
United Kingdom

Dear All,

I am afraid that all our Ensembl mirrors reverted this afternoon to the state they were in a few days ago.

We are looking into this, please use www.ensembl.org in the meantime.

I will let you know once everything is working again.

Apologies for any inconvenience caused.

Kind Regards,

Thomas


So I guess the answer here is to just wait, right? The code I posted used to work perfectly before... Could I get an estimate of how long it will take for it to be working again? Thanks a lot!