Question: biomaRt bug report
9 months ago by
luoyuan19840 wrote:

Hello everyone,

I am learning how to use "biomaRt" following Bioconductor 2018 Workshops.

When I ran the query example, I got a result that is different from the tutorial.

The expected result was:

mart <- useMart("ENSEMBL_MART_ENSEMBL","hsapiens_gene_ensembl")
afyids <- c("1000_at","1001_at","1002_f_at","1007_s_at")
getBM(c("affy_hg_u95av2", "hgnc_symbol"), c("affy_hg_u95av2"), afyids, mart)
#>   affy_hg_u95av2 hgnc_symbol
#> 1        1000_at       MAPK3
#> 2      1007_s_at        DDR1
#> 3      1002_f_at            
#> 4      1002_f_at     CYP2C19
#> 5        1001_at        TIE1

But I got:

affy_hg_u95av2  hgnc_symbol
1000_at         GDPD3
1007_s_at       DDR1
1000_at         MAPK3
1001_at         TIE1
1002_f_at       CYP2C19

The probe "1000_at" now maps to both GDPD3 and MAPK3, but according to the Ensembl website only MAPK3 should be correct.

Would you please help me to figure out the problem? Thank you very much.

Here is the system information:

Microsoft Windows [Version 10.0.14393]
R: 3.5.2
RStudio: Version 1.1.456
biomaRt: 2.38.0

Answer: biomaRt bug report
9 months ago by
Mike Smith4.0k
EMBL Heidelberg / de.NBI
Mike Smith4.0k wrote:

There are a couple of points to address here. The first is that because the biomaRt package accesses Ensembl's BioMart, which gets updated every 3 months, it's quite likely that the results of examples you find in tutorials or slides will no longer match. This is because the underlying database has changed.

Just to demonstrate this, let's run your query against Ensembl version 94 (the current version is 95).

library(biomaRt)

# Connect to the archived Ensembl release 94 rather than the current release
mart <- useEnsembl("ensembl","hsapiens_gene_ensembl", version = "94")
afyids <- c("1000_at","1001_at","1002_f_at","1007_s_at")
getBM(attributes = c("affy_hg_u95av2", "hgnc_symbol"), 
      filters = "affy_hg_u95av2", 
      values = afyids, 
      mart = mart)
#>   affy_hg_u95av2 hgnc_symbol
#> 1      1007_s_at        DDR1
#> 2        1000_at       MAPK3
#> 3        1001_at        TIE1
#> 4      1002_f_at            
#> 5      1002_f_at     CYP2C19

This returns the same 5 rows as the tutorial code, with only a single hit for 1000_at. So this isn't a problem with your code, or a bug in biomaRt; it just reflects that the data stored at Ensembl has changed. Specifying an Ensembl version in this manner can be useful for ensuring you're always working with the same annotation if a project you're part of takes longer than the Ensembl release cycle.
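
To see exactly which rows differ between two releases, the two result tables can be compared as sets of probe/symbol pairs. The sketch below is only illustrative: it retypes the two tables shown in this thread by hand instead of querying Ensembl, and the names `res_v94`, `res_current` and `row_key` are made up for this example. It uses base R only.

```r
# Results copied by hand from this thread; nothing here is fetched from Ensembl.
# res_v94: the version 94 output shown above.
res_v94 <- data.frame(
  affy_hg_u95av2 = c("1007_s_at", "1000_at", "1001_at", "1002_f_at", "1002_f_at"),
  hgnc_symbol    = c("DDR1", "MAPK3", "TIE1", "", "CYP2C19"),
  stringsAsFactors = FALSE
)
# res_current: the output reported in the question.
res_current <- data.frame(
  affy_hg_u95av2 = c("1000_at", "1007_s_at", "1000_at", "1001_at", "1002_f_at"),
  hgnc_symbol    = c("GDPD3", "DDR1", "MAPK3", "TIE1", "CYP2C19"),
  stringsAsFactors = FALSE
)

# Build one key per row so probe/symbol pairs can be compared as a set.
row_key <- function(df) paste(df$affy_hg_u95av2, df$hgnc_symbol, sep = "|")

# Rows present in one table but not in the other.
only_in_current <- res_current[!(row_key(res_current) %in% row_key(res_v94)), ]
only_in_v94     <- res_v94[!(row_key(res_v94) %in% row_key(res_current)), ]

print(only_in_current)
print(only_in_v94)
```

With the tables from this thread, the only new pair is 1000_at / GDPD3, and the only pair that disappears is the 1002_f_at row with an empty symbol.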

That said, I'm not actually sure why the annotation has changed to indicate that 1000_at targets both genes. MAPK3 and GDPD3 are neighbours, but looking at the transcripts plot it doesn't seem like GDPD3 overlaps the probeset. It may be worth contacting Ensembl to find out the source of this change.
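
If you want to spot this kind of one-probe-to-many-genes mapping in a larger result, one possible approach is to count the distinct gene symbols per probe. This is a minimal base R sketch, assuming the table has the same two columns as the getBM() output in this thread. The data frame is retyped by hand from the question, and the variable names (`query_result`, `multi_probes`, `ambiguous`) are hypothetical.

```r
# Result reported in the question, copied by hand (not fetched from Ensembl).
query_result <- data.frame(
  affy_hg_u95av2 = c("1000_at", "1007_s_at", "1000_at", "1001_at", "1002_f_at"),
  hgnc_symbol    = c("GDPD3", "DDR1", "MAPK3", "TIE1", "CYP2C19"),
  stringsAsFactors = FALSE
)

# Ignore rows with no HGNC symbol so they are not counted as a gene.
annotated <- query_result[query_result$hgnc_symbol != "", ]

# Number of distinct gene symbols attached to each probe.
genes_per_probe <- tapply(
  annotated$hgnc_symbol,
  annotated$affy_hg_u95av2,
  function(symbols) length(unique(symbols))
)

# Probes that map to more than one gene, and the rows involved.
multi_probes <- names(genes_per_probe)[as.vector(genes_per_probe > 1)]
ambiguous <- annotated[annotated$affy_hg_u95av2 %in% multi_probes, ]

print(ambiguous)
```

For the table in the question this flags only 1000_at, with the two symbols GDPD3 and MAPK3. Whether both mappings are biologically correct is a separate question that this check cannot answer.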



Thank you very much.

written 9 months ago by luoyuan1984