biomartr query fail
Zsolt Gyüre ▴ 10
@zsolt-gyure-24120
Last seen 12 months ago

Hi,

I'm trying to find functional genomic annotation (exon, intron, UTR, enhancer, etc) of SNP by positions. Using biomartr package, my code is:

biomartr::biomart(genes = SNV_regions$region,
                  mart = "ENSEMBL_MART_FUNCGEN",
                  dataset = "hsapiens_regulatory_feature",
                  attributes = "feature_type_name",
                  filters = "encode_region")

Result:

Starting BioMart query ...
Ensembl site unresponsive, trying uswest mirror
Error in biomaRt::getBM(attributes = as.character(c(filters, attributes)), :
Invalid attribute(s): encode_region
Please use the function 'listAttributes' to get valid attribute names

I checked the attributes and filters, and both are valid according to the getAttributes() and getFilters() functions. Can anyone help me figure out what the problem is?

Thanks in advance!

Zsolt

annotation

Can you give an example of the values you have in SNV_regions$region?


Of course. For example: 1:1034206:1034206

Mike Smith ★ 5.1k
@mike-smith
Last seen 2 hours ago
EMBL Heidelberg / de.NBI

There's quite a lot going on here, but the first thing to note is that you're using biomartr, which is not a Bioconductor package, so it's unlikely that the maintainer of that package will be here to help. However, biomartr is a wrapper around biomaRt, which is a Bioconductor package, and the error messages you see are generated by biomaRt, so hopefully we can offer some guidance on that. First I'll address the error message, then what you're trying to do.

The first line about an unresponsive site is probably because the main Ensembl site seems to be very slow today, so biomaRt automatically tries to use a mirror site. You don't need to worry about that, and it probably won't happen all the time.

I think the Invalid attribute(s): encode_region message is correct, but I don't think it's a problem with your code. In your call you have filters = "encode_region" and the error message is about an attribute. The example below uses only biomaRt functions, but we can see that "encode_region" is not listed as an attribute.

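library(biomaRt)

# Connect to the Ensembl regulatory features dataset
ensembl <- useEnsembl(biomart = "ENSEMBL_MART_FUNCGEN",
                      dataset = "hsapiens_regulatory_feature")

# List the attribute names available for this dataset
listAttributes(ensembl, what = "name")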
#>  [1] "activity"                 "regulatory_stable_id"
#>  [3] "bound_seq_region_start"   "bound_seq_region_end"
#>  [5] "chromosome_name"          "chromosome_start"
#>  [7] "chromosome_end"           "feature_type_name"
#>  [9] "feature_type_description" "epigenome_name"
#> [11] "epigenome_description"    "so_accession"
#> [13] "so_name"                  "efo_id"

The Error in biomaRt::getBM(attributes = as.character(c(filters, attributes)) message gives a hint as to what might be the problem. We can see that biomartr is calling biomaRt::getBM(), and for some reason it's passing both its filters and attributes arguments to the attributes argument of getBM(). I have no idea why it's doing that, but it looks wrong to me. Filters and attributes are separate things in the world of BioMart. Sometimes you could get lucky, as some names are common to both, but it seems like you would encounter this error frequently with biomartr, so we'll have to use biomaRt.
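One way to catch this kind of mix-up before sending a query is to check the requested names against the lists that biomaRt reports. The sketch below is only an illustration: check_query_names() is a hypothetical helper (it is not part of biomaRt or biomartr), and the valid-name vectors are a small hand-copied subset based on the names mentioned in this thread, not the full lists.

```r
# Hypothetical helper (not part of biomaRt or biomartr): report which
# requested names are missing from the valid attribute and filter lists.
check_query_names <- function(attributes, filters,
                              valid_attributes, valid_filters) {
  list(
    bad_attributes = setdiff(attributes, valid_attributes),
    bad_filters    = setdiff(filters, valid_filters)
  )
}

# Subsets copied by hand for illustration (an assumption). With a live
# connection you would use listAttributes(ensembl, what = "name") and
# listFilters(ensembl, what = "name") instead.
valid_attributes <- c("chromosome_name", "chromosome_start",
                      "chromosome_end", "feature_type_name")
valid_filters <- c("chromosomal_region", "encode_region")

# What biomartr appears to send: the filter name mixed into the attributes.
mixed <- check_query_names(
  attributes = c("encode_region", "feature_type_name"),
  filters = "encode_region",
  valid_attributes = valid_attributes,
  valid_filters = valid_filters
)
mixed$bad_attributes

# Keeping filters and attributes apart passes the check.
separate <- check_query_names(
  attributes = "feature_type_name",
  filters = "encode_region",
  valid_attributes = valid_attributes,
  valid_filters = valid_filters
)
separate$bad_attributes
```

The first call flags "encode_region" as an invalid attribute, which mirrors the error message above; the second call flags nothing.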

I'm not sure you want the encode_region filter at all. That filter indicates you want to get data back from some specific, already defined regions studied in the ENCODE project. If you want to retrieve data for arbitrary loci, you probably want to use the chromosomal_region filter. Here's an example of how that might work:

ensembl <- useEnsembl(biomart = "ENSEMBL_MART_FUNCGEN",
                      dataset = "hsapiens_regulatory_feature")

# Query by position; the filter and its values are passed separately
# from the attributes we want returned
getBM(mart = ensembl,
      values = "1:1034206:1034206",
      attributes = c("chromosome_name", "chromosome_start",
                     "chromosome_end", "feature_type_name"),
      filters = "chromosomal_region")
#>   chromosome_name chromosome_start chromosome_end feature_type_name
#> 1               1          1032600        1034601          Promoter

If values is a vector of strings, getBM() will return a data.frame with an entry for each one it finds. biomaRt drops values that it doesn't find in the database and can shuffle the order of the results relative to the query, which is why we include the chromosomal positions in the attributes, so you can check which values you got back.
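If you are starting from a table of positions, the chromosome:start:end strings can be built with paste(). This is a minimal sketch: the snvs data frame and its column names are made up for illustration, and the second position is invented. The getBM() call is left commented out because it needs a live Ensembl connection.

```r
# Hypothetical input: one row per SNV. The column names are assumptions.
# Positions are stored as integers so that paste() does not render round
# numbers in scientific notation (e.g. "2e+06").
snvs <- data.frame(chr = c("1", "1"),
                   pos = c(1034206L, 2000000L),
                   stringsAsFactors = FALSE)

# A single-base region has the same start and end.
regions <- paste(snvs$chr, snvs$pos, snvs$pos, sep = ":")
regions

# With a Mart object such as `ensembl`, the whole vector can be passed at once:
# res <- getBM(mart = ensembl,
#              values = regions,
#              attributes = c("chromosome_name", "chromosome_start",
#                             "chromosome_end", "feature_type_name"),
#              filters = "chromosomal_region")
```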

However, note that the start and end values are for the whole promoter in the example above, not just the SNP we queried with, so you may still need to do some post-processing.
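One possible form of that post-processing is to match each queried position back to the features that contain it. The base R sketch below assumes results shaped like the getBM() output above; the snvs data frame is hypothetical, and its second position is invented to show a SNP with no matching feature.

```r
# Feature row copied from the getBM() output above.
features <- data.frame(chromosome_name = "1",
                       chromosome_start = 1032600L,
                       chromosome_end = 1034601L,
                       feature_type_name = "Promoter",
                       stringsAsFactors = FALSE)

# Hypothetical SNV table; the second position is invented.
snvs <- data.frame(chr = c("1", "1"),
                   pos = c(1034206L, 2000000L),
                   stringsAsFactors = FALSE)

# For each SNV, collect the feature types whose interval contains it.
# NA marks SNVs that fall inside none of the returned features.
snvs$feature <- vapply(seq_len(nrow(snvs)), function(i) {
  hit <- as.character(features$chromosome_name) == snvs$chr[i] &
    features$chromosome_start <= snvs$pos[i] &
    features$chromosome_end >= snvs$pos[i]
  if (any(hit)) {
    paste(unique(features$feature_type_name[hit]), collapse = ";")
  } else {
    NA_character_
  }
}, character(1))

snvs
```

For many thousands of positions a dedicated interval-overlap tool would likely scale better than this row-by-row loop, but the loop keeps the logic easy to follow.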


Dear Mike,

Thank you for the very detailed help and the example, it worked very well!

Zsolt