bioMart "The number of columns in the result table does not equal the number of attributes in the query."
noah.reid (@noahreid-13414), United States

I have some code for extracting annotation information from Ensembl that has stopped working with the usual hostname:


library(biomaRt)

# Ensembl gene IDs from Fundulus heteroclitus, Ensembl release 107
gids <- c("ENSFHEG00000014345", "ENSFHEG00000014326", "ENSFHEG00000014282", "ENSFHEG00000014227",
          "ENSFHEG00000014113", "ENSFHEG00000014098", "ENSFHEG00000014086", "ENSFHEG00000014072",
          "ENSFHEG00000014058", "ENSFHEG00000014032", "ENSFHEG00000013967", "ENSFHEG00000000334",
          "ENSFHEG00000013942", "ENSFHEG00000013847", "ENSFHEG00000013795", "ENSFHEG00000013774",
          "ENSFHEG00000013756", "ENSFHEG00000013549", "ENSFHEG00000013406", "ENSFHEG00000013399")

# Use the regular (current-release) host
ensemblhost <- "https://ensembl.org"

killi_mart <- useMart(biomart = "ENSEMBL_MART_ENSEMBL", host = ensemblhost,
                      dataset = "fheteroclitus_gene_ensembl")

# Full argument names (filters/values) rather than relying on partial matching
ann <- getBM(attributes = c("ensembl_gene_id", "description", "transcript_length"),
             filters = "ensembl_gene_id", values = gids, mart = killi_mart)

This fails with the following error:

Error in .processResults(postRes, mart = mart, hostURLsep = sep, fullXmlQuery = fullXmlQuery,  : 
  The query to the BioMart webservice returned an invalid result.
The number of columns in the result table does not equal the number of attributes in the query.
Please report this on the support site at http://support.bioconductor.org

If I use an archive hostname instead, the exact same code works, even though (I think) the July 2022 archive is still release 107:

# gids defined as above

# Use the July 2022 archive host (Ensembl release 107)
ensemblhost <- "https://jul2022.archive.ensembl.org"

killi_mart <- useMart(biomart = "ENSEMBL_MART_ENSEMBL", host = ensemblhost,
                      dataset = "fheteroclitus_gene_ensembl")

ann <- getBM(attributes = c("ensembl_gene_id", "description", "transcript_length"),
             filters = "ensembl_gene_id", values = gids, mart = killi_mart)

Am I doing something wrong here?

Here are the results of sessionInfo():

R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.2.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] biomaRt_2.52.0              goseq_1.48.0                geneLenDataBase_1.32.0     
 [4] BiasedUrn_1.07              ashr_2.2-54                 ggrepel_0.9.1              
 [7] forcats_0.5.2               stringr_1.4.1               dplyr_1.0.9                
[10] purrr_0.3.4                 readr_2.1.2                 tidyr_1.2.0                
[13] tibble_3.1.8                ggplot2_3.3.6               tidyverse_1.3.2            
[16] pheatmap_1.0.12             apeglm_1.18.0               DESeq2_1.36.0              
[19] SummarizedExperiment_1.26.1 Biobase_2.56.0              MatrixGenerics_1.8.1       
[22] matrixStats_0.62.0          GenomicRanges_1.48.0        GenomeInfoDb_1.32.3        
[25] IRanges_2.30.1              S4Vectors_0.34.0            BiocGenerics_0.42.0        

loaded via a namespace (and not attached):
  [1] googledrive_2.0.0        colorspace_2.0-3         rjson_0.2.21             ellipsis_0.3.2          
  [5] XVector_0.36.0           fs_1.5.2                 rstudioapi_0.13          farver_2.1.1            
  [9] bit64_4.0.5              AnnotationDbi_1.58.0     fansi_1.0.3              mvtnorm_1.1-3           
 [13] lubridate_1.8.0          xml2_1.3.3               codetools_0.2-18         splines_4.2.1           
 [17] cachem_1.0.6             geneplotter_1.74.0       jsonlite_1.8.0           Rsamtools_2.12.0        
 [21] broom_1.0.0              annotate_1.74.0          GO.db_3.15.0             dbplyr_2.2.1            
 [25] png_0.1-7                compiler_4.2.1           httr_1.4.4               backports_1.4.1         
 [29] assertthat_0.2.1         Matrix_1.4-1             fastmap_1.1.0            gargle_1.2.0            
 [33] cli_3.3.0                prettyunits_1.1.1        tools_4.2.1              coda_0.19-4             
 [37] gtable_0.3.0             glue_1.6.2               GenomeInfoDbData_1.2.8   rappdirs_0.3.3          
 [41] Rcpp_1.0.9               bbmle_1.0.25             cellranger_1.1.0         vctrs_0.4.1             
 [45] Biostrings_2.64.1        nlme_3.1-159             rtracklayer_1.56.1       rvest_1.0.3             
 [49] lifecycle_1.0.1          irlba_2.3.5              restfulr_0.0.15          XML_3.99-0.10           
 [53] googlesheets4_1.0.1      zlibbioc_1.42.0          MASS_7.3-58.1            scales_1.2.1            
 [57] hms_1.1.2                parallel_4.2.1           RColorBrewer_1.1-3       curl_4.3.2              
 [61] yaml_2.3.5               memoise_2.0.1            emdbook_1.3.12           bdsmatrix_1.3-6         
 [65] stringi_1.7.8            RSQLite_2.2.16           SQUAREM_2021.1           genefilter_1.78.0       
 [69] BiocIO_1.6.0             filelock_1.0.2           GenomicFeatures_1.48.3   BiocParallel_1.30.3     
 [73] truncnorm_1.0-8          rlang_1.0.4              pkgconfig_2.0.3          bitops_1.0-7            
 [77] lattice_0.20-45          invgamma_1.1             labeling_0.4.2           GenomicAlignments_1.32.1
 [81] bit_4.0.4                tidyselect_1.1.2         plyr_1.8.7               magrittr_2.0.3          
 [85] R6_2.5.1                 generics_0.1.3           DelayedArray_0.22.0      DBI_1.1.3               
 [89] mgcv_1.8-40              pillar_1.8.1             haven_2.5.0              withr_2.5.0             
 [93] survival_3.4-0           KEGGREST_1.36.3          RCurl_1.98-1.8           mixsqp_0.3-43           
 [97] modelr_0.1.9             crayon_1.5.1             utf8_1.2.2               BiocFileCache_2.4.0     
[101] tzdb_0.3.0               progress_1.2.2           locfit_1.5-9.6           grid_4.2.1              
[105] readxl_1.4.1             blob_1.2.3               digest_0.6.29            reprex_2.0.2            
[109] xtable_1.8-4             numDeriv_2016.8-1.1      munsell_0.5.0

Mike Smith (@mike-smith), EMBL Heidelberg / de.NBI

You are correct that, at the moment, the release 107 archive URL will give the same results as using the current version without any archive/version number.
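Since the archive and the current site both serve release 107 right now, one way to make the query independent of where ensembl.org happens to route you is to request the release explicitly. The sketch below assumes that useEnsembl()'s `version` argument (available in biomaRt 2.52) resolves the release number to the matching archive host, as listed by listEnsemblArchives(); the wrapper names killi_mart_release() and killi_annotation() are hypothetical. Both functions are only defined here, so nothing is contacted until you call them.

```r
# Hypothetical wrapper: build a mart pinned to a specific Ensembl release rather
# than whatever release (and mirror) "https://ensembl.org" currently resolves to.
# Calling it requires biomaRt to be installed and network access.
killi_mart_release <- function(release = 107) {
  # Assumption: useEnsembl() maps `version` to the archive host for that release
  biomaRt::useEnsembl(biomart = "ENSEMBL_MART_ENSEMBL",
                      dataset = "fheteroclitus_gene_ensembl",
                      version = release)
}

# Hypothetical wrapper: the same query as in the question, run against the
# release-pinned mart.
killi_annotation <- function(gids, release = 107) {
  biomaRt::getBM(attributes = c("ensembl_gene_id", "description", "transcript_length"),
                 filters = "ensembl_gene_id", values = gids,
                 mart = killi_mart_release(release))
}
```

Usage would be `ann <- killi_annotation(gids)`. Pinning the release also keeps the results reproducible once ensembl.org moves on to a newer release.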

What I suspect is happening is that when you specify www.ensembl.org, your query actually gets redirected to your local Ensembl mirror (useast or uswest, I'd guess), whereas the complete archive URL resolves to the main Ensembl site. If there's something wrong with the local mirror, you might see different behaviour from these otherwise very similar queries. That said, I'd expect a problem with the mirror to be more likely to produce a "page not found" type of error, although biomaRt may just be doing a bad job of conveying that.

You could try to prevent the redirection (if that's what's happening) by using useEnsembl() instead of useMart() and setting the mirror argument, e.g.:

killi_mart <- useEnsembl(biomart = "ENSEMBL_MART_ENSEMBL", dataset = "fheteroclitus_gene_ensembl", mirror = "www")
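If the redirect can't be avoided, a pragmatic stopgap is to try the hosts in order and keep the first one that answers without an error, recording which host was used. first_working_host() below is a hypothetical helper, not part of biomaRt; the commented usage simply wraps the question's own useMart()/getBM() calls. Only do this while both hosts serve the same release, otherwise you could silently mix annotation from different Ensembl versions.

```r
# Hypothetical helper: call query_fn(host) for each host in turn and return the
# first result that does not raise an error, together with the host that produced it.
first_working_host <- function(hosts, query_fn) {
  errors <- list()
  for (h in hosts) {
    res <- tryCatch(query_fn(h), error = function(e) e)
    if (!inherits(res, "error")) {
      return(list(host = h, result = res))
    }
    # Remember why this host failed before moving on to the next one
    errors[[h]] <- conditionMessage(res)
  }
  stop("All hosts failed:\n",
       paste(names(errors), unlist(errors), sep = ": ", collapse = "\n"))
}

# Example usage (requires biomaRt and network access; gids as in the question):
# killi_query <- function(host) {
#   mart <- biomaRt::useMart(biomart = "ENSEMBL_MART_ENSEMBL", host = host,
#                            dataset = "fheteroclitus_gene_ensembl")
#   biomaRt::getBM(attributes = c("ensembl_gene_id", "description", "transcript_length"),
#                  filters = "ensembl_gene_id", values = gids, mart = mart)
# }
# out <- first_working_host(c("https://www.ensembl.org",
#                             "https://jul2022.archive.ensembl.org"), killi_query)
# out$host  # which host actually answered
```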