Question

biomaRt asked to report: The number of columns in the result table does not equal the number of attributes in the query.

daniil.sarkisyan ▴ 10

@daniilsarkisyan-7626

Sweden

As requested by biomaRt, I am reporting the issue.

library(biomaRt)
ensembl_mart <- useEnsembl(biomart = "ensembl", 
                   dataset = "hsapiens_gene_ensembl")
attributes <- biomaRt::searchAttributes(mart = ensembl_mart, "coding|cds")
attributes <- attributes[attributes$page=="sequences","name"]
attributes
## [1] "coding_transcript_flank" "coding_gene_flank"       "coding"                  "cdna_coding_start"       "cdna_coding_end"        
## [6] "cds_length"              "cds_start"               "cds_end"                 "cdna_coding_start"       "cdna_coding_end"        
##[11] "genomic_coding_start"    "genomic_coding_end"  

result <- biomaRt::getBM(attributes = attributes, filters = c("ensembl_transcript_id"), values = "ENST00000217305", mart = ensembl_mart)
## NULL
## Error in .processResults(postRes, mart = mart, sep = sep, fullXmlQuery = fullXmlQuery,  : 
##  The query to the BioMart webservice returned an invalid result.
## The number of columns in the result table does not equal the number of attributes in the query.
## Please report this on the support site at http://support.bioconductor.org

sessionInfo()
# R version 4.0.0 (2020-04-24)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 7 x64 (build 7601) Service Pack 1
# 
# Matrix products: default
# 
# locale:
#   [1] LC_COLLATE=Swedish_Sweden.1252  LC_CTYPE=Swedish_Sweden.1252    LC_MONETARY=Swedish_Sweden.1252 LC_NUMERIC=C                    LC_TIME=Swedish_Sweden.1252    
# 
# attached base packages:
#   [1] stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
#   [1] biomaRt_2.44.0
# 
# loaded via a namespace (and not attached):
#  [1] Rcpp_1.0.4.6         compiler_4.0.0       pillar_1.4.4         dbplyr_1.4.4         prettyunits_1.1.1    tools_4.0.0          progress_1.2.2       digest_0.6.25        bit_1.1-15.2         RSQLite_2.2.0        memoise_1.1.0       
# [12] BiocFileCache_1.12.0 tibble_3.0.1         lifecycle_0.2.0      pkgconfig_2.0.3      rlang_0.4.6          DBI_1.1.0            curl_4.3             parallel_4.0.0       stringr_1.4.0        httr_1.4.1           dplyr_1.0.0         
# [23] rappdirs_0.3.1       generics_0.0.2       S4Vectors_0.26.1     vctrs_0.3.1          askpass_1.1          IRanges_2.22.2       hms_0.5.3            tidyselect_1.1.0     stats4_4.0.0         bit64_0.9-7          glue_1.4.1          
# [34] Biobase_2.48.0       R6_2.4.1             AnnotationDbi_1.50.0 XML_3.99-0.3         purrr_0.3.4          blob_1.2.1           magrittr_1.5         ellipsis_0.3.1       BiocGenerics_0.34.0  assertthat_0.2.1     stringi_1.4.6       
# [45] openssl_1.4.1        crayon_1.3.4

biomart • 1.2k views

updated 5.6 years ago by James W. MacDonald • written 5.6 years ago by daniil.sarkisyan

score 0 · Answer 1 · 2020-06-12

Usually when I get something like this, I try removing attributes to see if I can find the culprit, or just query them one at a time. Doing that here, I see

> biomaRt::getBM(attributes = attributes[1], filters = c("ensembl_transcript_id"), values = "ENST00000217305", mart = ensembl_mart)
                                                                                                                          coding_transcript_flank
1 Query ERROR: caught BioMart::Exception::Usage: Requests for flank sequence must be accompanied by an upstream_flank or downstream_flank request
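The one-at-a-time check above can be automated. Below is a minimal sketch, assuming a hypothetical helper `probe_attributes()` (not part of biomaRt) that takes any single-attribute query function. It flags both R-level errors and the case seen here, where BioMart puts its "Query ERROR" text inside the returned table.

```r
# Hypothetical helper (not a biomaRt function): run one query per attribute
# and record which attributes fail.
# `query_fn` is any function that takes a single attribute name and returns
# a data frame; the commented usage below wraps biomaRt::getBM().
probe_attributes <- function(attrs, query_fn) {
  status <- vapply(attrs, function(a) {
    res <- tryCatch(query_fn(a), error = function(e) e)
    if (inherits(res, "error")) {
      return(conditionMessage(res))
    }
    # BioMart may return its error message as a cell of the result table
    cells <- unlist(lapply(res, as.character))
    if (is.data.frame(res) && nrow(res) > 0 &&
        any(grepl("Query ERROR", cells, fixed = TRUE))) {
      return("Query ERROR in result")
    }
    "ok"
  }, character(1))
  data.frame(attribute = attrs, status = unname(status),
             stringsAsFactors = FALSE)
}

# Usage with the objects from the question (needs network access):
# probe_attributes(attributes, function(a)
#   biomaRt::getBM(attributes = a, filters = "ensembl_transcript_id",
#                  values = "ENST00000217305", mart = ensembl_mart))
```

Passing the query as a function keeps the helper independent of the mart, so it can be tried offline with a stub before pointing it at Ensembl.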

You can try to work out what that error means by going to the ensembl.org website and playing around with the sequences attributes page. Once you have set up a query there, the URL button at the top gives you the query URL. That helped me a bit, but ?getSequence and its source code were more informative, and led me to this:

> biomaRt::getBM(attributes = c("coding_transcript_flank","ensembl_transcript_id"), filters = c("ensembl_transcript_id", "upstream_flank"), values = list("ENST00000217305", 40), mart = ensembl_mart, checkFilters = FALSE)
                   coding_transcript_flank ensembl_transcript_id
1 CTTCTCTTTCTTCCTCCCCAGCAGGAATTGCTGAGACAGG       ENST00000217305

Note two things here. First, you should ALWAYS include the primary filter ID as an attribute as well: results come back in arbitrary order, so you need a way to map them to the IDs you submitted. Second, you specify upstream_flank (or downstream_flank, but not both) as one of the filters, and the flank length as the corresponding value. Playing around, this is what I can get:

> getBM(attributes[-(1:2)], c("ensembl_transcript_id"), list("ENST00000217305"), ensembl_mart)
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         coding
1 ATGGCCTGGCAGGGGCTGGTCCTGGCTGCCTGCCTCCTCATGTTCCCCTCCACCACAGCGGACTGCCTGTCGCGGTGCTCCTTGTGTGCTGTAAAGACCCAGGATGGTCCCAAACCTATCAATCCCCTGATTTGCTCCCTGCAATGCCAGGCTGCCCTGCTGCCCTCTGAGGAATGGGAGAGATGCCAGAGCTTTCTGTCTTTTTTCACCCCCTCCACCCTTGGGCTCAATGACAAGGAGGACTTGGGGAGCAAGTCGGTTGGGGAAGGGCCCTACAGTGAGCTGGCCAAGCTCTCTGGGTCATTCCTGAAGGAGCTGGAGAAAAGCAAGTTTCTCCCAAGTATCTCAACAAAGGAGAACACTCTGAGCAAGAGCCTGGAGGAGAAGCTCAGGGGTCTCTCTGACGGGTTTAGGGAGGGAGCAGAGTCTGAGCTGATGAGGGATGCCCAGCTGAACGATGGTGCCATGGAGACTGGCACACTCTATCTCGCTGAGGAGGACCCCAAGGAGCAGGTCAAACGCTATGGGGGCTTTTTGCGCAAATACCCCAAGAGGAGCTCAGAGGTGGCTGGGGAGGGGGACGGGGATAGCATGGGCCATGAGGACCTGTACAAACGCTATGGGGGCTTCTTGCGGCGCATTCGTCCCAAGCTCAAGTGGGACAACCAGAAGCGCTATGGCGGTTTTCTCCGGCGCCAGTTCAAGGTGGTGACTCGGTCTCAGGAAGATCCGAATGCTTACTCTGGAGAGCTTTTTGATGCATAA
  cdna_coding_start cdna_coding_end cds_length cds_start cds_end
1           357;228         992;356        765     130;1 765;129
  cdna_coding_start.1 cdna_coding_end.1 genomic_coding_start genomic_coding_end
1             357;228           992;356      1980323;1982956    1980958;1983084

> biomaRt::getBM(attributes = c("gene_flank","ensembl_transcript_id"), filters = c("ensembl_transcript_id", "upstream_flank"), values = list("ENST00000217305", 40), mart = ensembl_mart, checkFilters = FALSE)
                                gene_flank ensembl_transcript_id
1 GCTCTCGTCCATAAAAGGGGGGAAGAGGCACCAGAACTGC       ENST00000217305

> biomaRt::getBM(attributes = c("coding_transcript_flank","ensembl_transcript_id"), filters = c("ensembl_transcript_id", "upstream_flank"), values = list("ENST00000217305", 40), mart = ensembl_mart, checkFilters = FALSE)
                   coding_transcript_flank ensembl_transcript_id
1 CTTCTCTTTCTTCCTCCCCAGCAGGAATTGCTGAGACAGG       ENST00000217305

But each flank attribute requires its own call, and coding_gene_flank doesn't work.
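Since the flank results come from separate calls, they have to be stitched together afterwards, which is where keeping ensembl_transcript_id as an attribute pays off. Here is a minimal sketch, assuming a hypothetical helper `combine_by_id()` and exactly one result row per transcript ID; the two data frames reproduce the outputs shown above rather than querying Ensembl.

```r
# Hypothetical helper (not a biomaRt function): merge several result tables
# on a shared ID column, then restore the order of the queried IDs.
# Assumes each table has at most one row per ID.
combine_by_id <- function(results, ids, id_col = "ensembl_transcript_id") {
  merged <- Reduce(function(x, y) merge(x, y, by = id_col, all = TRUE),
                   results)
  merged[match(ids, merged[[id_col]]), , drop = FALSE]
}

# The two flank results from the separate getBM() calls above
transcript_flank <- data.frame(
  coding_transcript_flank = "CTTCTCTTTCTTCCTCCCCAGCAGGAATTGCTGAGACAGG",
  ensembl_transcript_id = "ENST00000217305",
  stringsAsFactors = FALSE
)
gene_flank <- data.frame(
  gene_flank = "GCTCTCGTCCATAAAAGGGGGGAAGAGGCACCAGAACTGC",
  ensembl_transcript_id = "ENST00000217305",
  stringsAsFactors = FALSE
)

combined <- combine_by_id(list(transcript_flank, gene_flank),
                          ids = "ENST00000217305")
```

With more than one transcript, `ids` would be the vector originally passed as `values`, so the combined table follows the input order regardless of how BioMart ordered each result.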