biomaRt fails to get flanking sequence
mlwright • 0
@mlwright-20631

Hi all,

I'm trying to use the biomaRt function getSequence() to fetch coding gene flanking sequences. It worked fine a few weeks ago, but lately it has been throwing an error about 50% of the time. For example, if I run this snippet of code:

promos <- getSequence(id = upreg$genes, type = "external_gene_name", seqType = "coding_gene_flank", upstream = 1000, mart = ensembl)

I get this error:

Error in getBM(c(seqType, type), filters = c(type, "upstream_flank"), : Query ERROR: caught BioMart::Exception::Usage: Filter upstream_flank NOT FOUND

But if I run the same piece of code again, it runs without issue. I have tried connecting with ensembl <- useEnsembl("ensembl", dataset = "hsapiens_gene_ensembl", mirror = "useast"), using both the useast and uswest mirrors, and got the same error with each.

Is this some issue with biomaRt accessing Ensembl, or possibly something related to the version of biomaRt I'm using?

Thanks!

Session info:

R version 3.5.3 (2019-03-11)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] biomaRt_2.38.0       org.Hs.eg.db_3.7.0   AnnotationDbi_1.44.0 IRanges_2.16.0      
[5] S4Vectors_0.20.1     Biobase_2.42.0       BiocGenerics_0.28.0  gage_2.32.1         
[9] ReactomePA_1.26.0   

loaded via a namespace (and not attached):
 [1] bitops_1.0-6        enrichplot_1.2.0    bit64_0.9-7         RColorBrewer_1.1-2 
 [5] progress_1.2.0      httr_1.4.0          UpSetR_1.3.3        tools_3.5.3        
 [9] backports_1.1.4     R6_2.4.0            DBI_1.0.0           lazyeval_0.2.2     
[13] colorspace_1.4-1    tidyselect_0.2.5    graphite_1.28.2     gridExtra_2.3      
[17] prettyunits_1.0.2   curl_3.3            bit_1.1-14          compiler_3.5.3     
[21] graph_1.60.0        xml2_1.2.0          labeling_0.3        triebeard_0.3.0    
[25] scales_1.0.0        checkmate_1.9.1     ggridges_0.5.1      rappdirs_0.3.1     
[29] stringr_1.4.0       digest_0.6.18       DOSE_3.8.2          XVector_0.22.0     
[33] pkgconfig_2.0.2     rlang_0.3.4         rstudioapi_0.10     RSQLite_2.1.1      
[37] gridGraphics_0.3-0  farver_1.1.0        jsonlite_1.6        BiocParallel_1.16.6
[41] GOSemSim_2.8.0      dplyr_0.8.0.1       RCurl_1.95-4.12     magrittr_1.5       
[45] ggplotify_0.0.3     GO.db_3.7.0         Matrix_1.2-17       Rcpp_1.0.1         
[49] munsell_0.5.0       viridis_0.5.1       stringi_1.4.3       yaml_2.2.0         
[53] ggraph_1.0.2        zlibbioc_1.28.0     MASS_7.3-51.1       plyr_1.8.4         
[57] qvalue_2.14.1       grid_3.5.3          blob_1.1.1          ggrepel_0.8.0      
[61] DO.db_2.9           crayon_1.3.4        lattice_0.20-38     Biostrings_2.50.2  
[65] cowplot_0.9.4       splines_3.5.3       hms_0.4.2           KEGGREST_1.22.0    
[69] knitr_1.22          pillar_1.3.1        fgsea_1.8.0         igraph_1.2.4       
[73] reshape2_1.4.3      fastmatch_1.1-0     XML_3.98-1.19       glue_1.3.1         
[77] data.table_1.12.2   png_0.1-7           tweenr_1.0.1        urltools_1.7.3     
[81] gtable_0.3.0        purrr_0.3.2         polyclip_1.10-0     assertthat_0.2.1   
[85] ggplot2_3.1.1       xfun_0.6            ggforce_0.2.1       europepmc_0.3      
[89] reactome.db_1.66.0  viridisLite_0.3.0   tibble_2.1.1        rvcheck_0.1.3      
[93] memoise_1.1.0 
Mike Smith ★ 6.6k
@mike-smith
EMBL Heidelberg

This seems to be a problem with the underlying BioMart instance. If you try the same query in the browser interface you get a similarly dramatic (but slightly different) error:

ERROR: caught BioMart::Exception: non-BioMart die(): Can't locate object method "setTable" via package "BioMart::Configuration::Attribute" at /nfs/public/release/ensweb/latest/live/mart/www_96/biomart-perl/lib/BioMart/Query.pm line 1321.

Maybe one time in 20 it will work successfully. I've emailed Ensembl to bring this to their attention and will update here when they respond.
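
Because the failure is intermittent and a rerun sometimes succeeds, one stopgap is to wrap the query in a retry loop. The following is only a minimal sketch: `with_retry`, `max_tries`, and `wait` are illustrative names and not part of biomaRt, and with a success rate as low as one in 20 repeated retries may be slow and add load on the server.

```r
# Generic retry helper: calls f() until it succeeds or max_tries is reached.
# `with_retry`, `max_tries`, and `wait` are illustrative names, not biomaRt API.
with_retry <- function(f, max_tries = 5, wait = 2) {
  stopifnot(max_tries >= 1)
  last_error <- NULL
  for (i in seq_len(max_tries)) {
    # Convert an error into a value so the loop can inspect it
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    last_error <- result
    message("Attempt ", i, " failed: ", conditionMessage(result))
    # Pause between attempts, but not after the final one
    if (i < max_tries) Sys.sleep(wait)
  }
  stop("All ", max_tries, " attempts failed; last error: ",
       conditionMessage(last_error))
}

# Usage with the query from the question (needs network access, so commented out):
# promos <- with_retry(function() {
#   getSequence(id = upreg$genes, type = "external_gene_name",
#               seqType = "coding_gene_flank", upstream = 1000, mart = ensembl)
# })
```

The helper takes a function rather than an expression so that the whole query is re-evaluated on each attempt.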


As a short-term fix, the most recent archive version appears to still work, and I suspect the sequences will not have changed much between versions. You can use the archive like so:

library(biomaRt)
ensembl <- useEnsembl( "ensembl", dataset = "hsapiens_gene_ensembl", version = "95" )
promos <- getSequence(id = c("CDC6"), 
                      type = "external_gene_name", 
                      seqType = "coding_gene_flank", 
                      upstream = 1000, 
                      mart = ensembl)

Here's the result (using as_tibble() for nicer formatting).

> dplyr::as_tibble(promos)
# A tibble: 1 x 2
  coding_gene_flank                          external_gene_name
  <chr>                                      <chr>            
1 CTTTTTTTTTTTTTTGAGACGGAGTCTCGCTCTTGTCGCCC… CDC6   
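
If a given archive release turns out to be affected as well, the same workaround can be tried against several versions in turn. This is a rough sketch under assumptions: `first_working_version`, `query_fun`, and `fetch_flank` are hypothetical names, and the candidate version numbers in the usage comment are only examples, not a statement about which releases work.

```r
# Try candidate Ensembl versions in order and return the first result that works.
# `first_working_version` and `query_fun` are hypothetical names for this sketch.
first_working_version <- function(versions, query_fun) {
  for (v in versions) {
    # Treat any error from this version as "does not work" and move on
    result <- tryCatch(query_fun(v), error = function(e) NULL)
    if (!is.null(result)) {
      return(list(version = v, result = result))
    }
  }
  stop("None of the versions worked: ", paste(versions, collapse = ", "))
}

# Usage with biomaRt (needs network access, so left commented out):
# library(biomaRt)
# fetch_flank <- function(v) {
#   mart <- useEnsembl("ensembl", dataset = "hsapiens_gene_ensembl", version = v)
#   getSequence(id = "CDC6", type = "external_gene_name",
#               seqType = "coding_gene_flank", upstream = 1000, mart = mart)
# }
# hit <- first_working_version(c("95", "94"), fetch_flank)
# hit$version   # the version that answered
```

Returning the version alongside the result makes it easy to record which release the sequences came from.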

Thanks! Update for 2024: the latest version as of today (112) suffers from this problem, but version "111" works.
