Problem with apostrophe in biomaRt query
@analeonpalacio-18199
Spain

I am trying to get the SNPs associated with the phenotype Crohn's disease using biomaRt:

getBM(attributes = c('refsnp_id','allele','chrom_start','chrom_strand'), 
      filters = "phenotype_description", 
      values = "Crohn's disease", 
      mart = ensembl)

But I have this error:

Error in getBM(attributes = c("refsnp_id", "allele", "chrom_start", "chrom_strand"),  : 
  Query ERROR: caught BioMart::Exception: non-BioMart die(): 
not well-formed (invalid token) at line 1, column 394, byte 394 at /nfs/services/ensweb-software/sharedsw/2018_08_22/linuxbrew/Cellar/perl/5.26.2_2/lib/perl5/site_perl/5.26.2/x86_64-linux-thread-multi/XML/Parser.pm line 187.
XML::Simple called at /nfs/public/release/ensweb/latest/live/mart/www_94/biomart-perl/lib/BioMart/Query.pm line 1935.

 

I think it is a problem with the apostrophe, because when I use "Crohn disease" it works. Any idea how to fix it?

Mike Smith ★ 6.6k
@mike-smith
EMBL Heidelberg

Thanks for reporting this. There are actually a couple of different things going on here.

First, you're absolutely right that the apostrophe is breaking the query. biomaRt uses apostrophes to separate the values when it submits the query, so sticking an extra one in leaves a mismatch and causes the error you see.
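
To illustrate the mismatch, here is a minimal base R sketch. The helper `make_filter_xml()` is hypothetical and only a simplified stand-in for how a filter value might be written into the XML query; it is not biomaRt's actual internal code. It shows that an apostrophe inside a value delimited by apostrophes leaves an odd number of delimiters, whereas delimiting with double quotes does not.

```r
## Hypothetical helper: a simplified stand-in for writing a filter into
## an XML query string. This is NOT biomaRt's actual internal code.
make_filter_xml <- function(name, value, quote = "'") {
  paste0("<Filter name = ", quote, name, quote,
         " value = ", quote, value, quote, " />")
}

## Count how many times a literal character occurs in a string
count_char <- function(x, ch) {
  lengths(regmatches(x, gregexpr(ch, x, fixed = TRUE)))
}

## Same filter, delimited with apostrophes and with double quotes
single <- make_filter_xml("phenotype_description", "Crohn's disease")
double <- make_filter_xml("phenotype_description", "Crohn's disease",
                          quote = "\"")

## An odd count means one attribute value is never closed,
## so an XML parser would reject the query as not well-formed.
count_char(single, "'")
count_char(double, "\"")
```

Printing `single` and `double` side by side makes the problem easy to see: in the first string the apostrophe in "Crohn's" closes the value early.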

I've fixed this in the devel version of biomaRt (version 2.39.1), so it uses double quotes internally & your query will be submitted successfully.


However, if you look at the Ensembl BioMart web interface (https://www.ensembl.org/biomart) the phenotype filter isn't a free text search, but actually has a preset list of values, presumably populated by phenotype annotation that is actually used in the database.

You can view the list in an R session with the code below, but be warned: it returns a single huge string that significantly slows down my R session.

library(dplyr)
listFilters(mart = ensembl, what = c("name", "options")) %>%
    filter(name == "phenotype_description")

At the moment there's no elegant way to search the list of options. I'll have to think about how to improve searchFilters() so you can find appropriate values more easily.
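
As a rough workaround, plain `grep()` can search the options once they are in a character vector. This sketch assumes the option string has already been split into one element per value (how to split it is not shown here, since the exact format of the string is not confirmed); the four descriptions are copied from this thread and `phenotype_options` is just an illustrative name.

```r
## Phenotype descriptions copied from this thread, assumed to be already
## split into a character vector (one element per filter value).
phenotype_options <- c(
  "Crohn disease",
  "Crohn's disease",
  "INFLAMMATORY BOWEL DISEASE 1 (CROHN DISEASE) SUSCEPTIBILITY TO",
  "Paneth cell defects in Crohn's disease"
)

## A case-sensitive search misses the upper-case entry
case_sensitive <- grep("Crohn", phenotype_options, value = TRUE)

## A case-insensitive search also picks up the upper-case entry
case_insensitive <- grep("crohn", phenotype_options,
                         ignore.case = TRUE, value = TRUE)
```

Searching case-insensitively matters here because some phenotype descriptions are stored entirely in upper case.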


I think for now, the values you want for Crohn's disease are probably those below. I don't know how the annotation is assigned, but using this complete set will return a different set of SNPs compared to just using one.

values = c("Crohn disease",
           "Crohn's disease",
           "Crohn's disease (need for surgery)",
           "Crohn's disease (time to surgery)",
           "Crohn's disease and celiac disease",
           "Crohn's disease and psoriasis",
           "Crohn's disease and sarcoidosis (combined)",
           "Crohn's disease-related phenotypes")

biomaRt version 2.39.2 now contains the functions listFilterValues and searchFilterValues. You can use these to determine the appropriate values to supply to a filter like phenotype_description. Here's an example of how they work:

## Use the Ensembl human SNP dataset
ensembl <- useEnsembl(biomart = "snp", dataset = "hsapiens_snp")

## we need to use the name of the filter in the next function
## if you don't know the exact name of the filter you need
## you can search for keywords we're interested in e.g. 'phenotype'
searchFilters(ensembl, pattern = "phenotype")

## now search the 'phenotype_description' filter for the term 'crohn'
searchFilterValues(mart = ensembl, 
                   filter = "phenotype_description", 
                   pattern = "crohn")
 [1] "Chronic inflammatory diseases (ankylosing spondylitis Crohn's disease psoriasis primary sclerosing cholangitis ulcerative colitis) (pleiotropy)"
 [2] "Crohn disease"                                                                                                                                  
 [3] "Crohn's disease"                                                                                                                                
 [4] "Crohn's disease (need for surgery)"                                                                                                             
 [5] "Crohn's disease (time to surgery)"                                                                                                              
 [6] "Crohn's disease and celiac disease"                                                                                                             
 [7] "Crohn's disease and psoriasis"                                                                                                                  
 [8] "Crohn's disease and sarcoidosis (combined)"                                                                                                     
 [9] "Crohn's disease-related phenotypes"                                                                                                             
[10] "INFLAMMATORY BOWEL DISEASE 1 (CROHN DISEASE) SUSCEPTIBILITY TO"                                                                                 
[11] "Paneth cell defects in Crohn's disease"                                                                                                         
[12] "Poor prognosis in Crohn's disease"                                                                                                              
[13] "Ulcerative colitis or Crohn's disease"
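
The two steps (find the matching filter values, then query with all of them) could be combined along these lines. This is only a sketch: `snps_for_phenotype()` is a hypothetical wrapper, not a biomaRt function, and it assumes biomaRt 2.39.2 or later is installed and that the Ensembl server is reachable when the function is called. The attribute names are the ones used in the original question.

```r
## Hypothetical wrapper (not part of biomaRt): look up every filter value
## matching `pattern`, then pass the whole set to getBM().
snps_for_phenotype <- function(mart, pattern,
                               filter = "phenotype_description") {
  ## Step 1: find the preset filter values that contain the pattern
  phenotypes <- biomaRt::searchFilterValues(mart = mart,
                                            filter = filter,
                                            pattern = pattern)

  ## Step 2: query the SNP mart using all matching values at once
  biomaRt::getBM(attributes = c("refsnp_id", "allele",
                                "chrom_start", "chrom_strand"),
                 filters = filter,
                 values = phenotypes,
                 mart = mart)
}
```

With the `ensembl` object created above, this would be called as `snps_for_phenotype(ensembl, "crohn")`. Note that the pattern also matches broader entries such as "Ulcerative colitis or Crohn's disease", so it is worth inspecting the output of searchFilterValues first and trimming the list if those are not wanted.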