biomaRt bug report
@luoyuan1984-13788

Hello everyone,

I am learning how to use "biomaRt" following Bioconductor 2018 Workshops.

When I ran the query example, I got a result that is different from the tutorial.

The expected result was:

mart <- useMart("ENSEMBL_MART_ENSEMBL","hsapiens_gene_ensembl")
afyids <- c("1000_at","1001_at","1002_f_at","1007_s_at")
getBM(c("affy_hg_u95av2", "hgnc_symbol"), c("affy_hg_u95av2"), afyids, mart)
#>   affy_hg_u95av2 hgnc_symbol
#> 1        1000_at       MAPK3
#> 2      1007_s_at        DDR1
#> 3      1002_f_at            
#> 4      1002_f_at     CYP2C19
#> 5        1001_at        TIE1

But I got:

affy_hg_u95av2  hgnc_symbol
**1000_at       GDPD3**
1007_s_at       DDR1
**1000_at       MAPK3**
1001_at         TIE1
1002_f_at
1002_f_at       CYP2C19

The probe "1000_at" now maps to both GDPD3 and MAPK3, of which MAPK3 appears to be the correct one according to the Ensembl website.

Would you please help me to figure out the problem? Thank you very much.

Here is the system information:

Microsoft Windows [Version 10.0.14393]
R: 3.5.2
RStudio: Version 1.1.456
biomaRt: 2.38.0

Mike Smith ★ 6.5k
@mike-smith
EMBL Heidelberg

There are a couple of points to address here. The first is that the biomaRt package accesses Ensembl's BioMart, which gets updated every 3 months, so it's quite likely that the results of examples you find in tutorials or slides will no longer match, because the underlying database has changed.

Just to demonstrate this, let's run your query against Ensembl version 94 (the current version is 95).

library(biomaRt)
mart <- useEnsembl("ensembl","hsapiens_gene_ensembl", version = "94")
afyids <- c("1000_at","1001_at","1002_f_at","1007_s_at")
getBM(attributes = c("affy_hg_u95av2", "hgnc_symbol"), 
      filters = "affy_hg_u95av2", 
      values = afyids, 
      mart = mart)
#>   affy_hg_u95av2 hgnc_symbol
#> 1      1007_s_at        DDR1
#> 2        1000_at       MAPK3
#> 3        1001_at        TIE1
#> 4      1002_f_at            
#> 5      1002_f_at     CYP2C19

Doing this, we get the 5 results shown in the tutorial, with only a single hit for 1000_at. So this isn't a problem with your code or a bug in biomaRt; it just reflects that the data stored at Ensembl has changed. Specifying an Ensembl version in this manner can be useful for ensuring you're always working with the same annotation if a project you're part of takes longer than the Ensembl release cycle.
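If you want to see exactly which mappings changed between two releases, something along these lines might help. It is only a rough sketch: `diff_mappings()` is a hypothetical helper (not part of biomaRt), and the two data frames are typed in by hand from the outputs in this thread, assuming the result in the question came from release 95.

```r
# Hypothetical helper (not a biomaRt function): report probe/gene pairs
# that are present in one getBM() result but not in the other.
diff_mappings <- function(old, new) {
  # Build one comparable key per row from the two columns used in this thread.
  key <- function(df) paste(df$affy_hg_u95av2, df$hgnc_symbol, sep = "\t")
  list(
    added   = new[!key(new) %in% key(old), , drop = FALSE],
    removed = old[!key(old) %in% key(new), , drop = FALSE]
  )
}

# Ensembl 94 result, copied from the output above.
v94 <- data.frame(
  affy_hg_u95av2 = c("1007_s_at", "1000_at", "1001_at", "1002_f_at", "1002_f_at"),
  hgnc_symbol    = c("DDR1", "MAPK3", "TIE1", "", "CYP2C19"),
  stringsAsFactors = FALSE
)

# Result reported in the question (assumed here to be release 95).
v95 <- data.frame(
  affy_hg_u95av2 = c("1000_at", "1007_s_at", "1000_at", "1001_at", "1002_f_at", "1002_f_at"),
  hgnc_symbol    = c("GDPD3", "DDR1", "MAPK3", "TIE1", "", "CYP2C19"),
  stringsAsFactors = FALSE
)

changes <- diff_mappings(v94, v95)
print(changes$added)    # pairs only in the newer result
print(changes$removed)  # pairs only in the older result
```

In practice you would pass in the data frames returned by two getBM() calls, each made against a mart created with useEnsembl() and a different version argument, rather than typing the tables by hand.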


That said, I'm not actually sure why the annotation has changed to indicate that 1000_at targets both genes. MAPK3 and GDPD3 are neighbours, but looking at the transcripts plot it doesn't seem like GDPD3 overlaps the probeset. It may be worth contacting Ensembl to find out the source of this change.

[Image: transcripts plot for 1000_at]

Thank you very much.
