The query to the BioMart webservice returned an invalid result: biomaRt expected a character string of length 1
steppydeklin ▴ 10
@steppydeklin-9271
Last seen 5.0 years ago
European Union

Hello,

I am trying to download all the annotations from Ensembl using biomaRt, using the following code:

ensembl = useMart(biomart="ENSEMBL_MART_ENSEMBL", dataset = "mmusculus_gene_ensembl")

inputNames = read.table("/Volumes/project_svincent/raw_data/Ensembl_95_mm10_GRCm38p6.tsv",header=TRUE,sep="\t", fill=TRUE,quote="\"",stringsAsFactors = FALSE)$Gene.stable.ID

attributes_protein_coding = c("ensembl_gene_id",
           "ucsc",
           "external_gene_name",
           "chromosome_name",
           "strand",
           "start_position",
           "end_position",
           "ensembl_transcript_id",
           "transcription_start_site",
           "ensembl_exon_id",
           "refseq_mrna",
           "refseq_mrna_predicted")

ensembl_protein_coding = dplyr::tbl_df(getBM(attributes = attributes_protein_coding, filters = "ensembl_gene_id", values = inputNames, mart = ensembl))

I get the error message below. Does anyone have an idea what is wrong?

> ensembl_protein_coding = dplyr::tbl_df(getBM(attributes = attributes_protein_coding, filters = "ensembl_gene_id", values = inputNames, mart = ensembl))
Batch submitting query [==========>--------------------------------------------------------------------------------------------------------------------------------------------] 7% eta: 22m
Error in getBM(attributes = attributes_protein_coding, filters = "ensembl_gene_id", :
  The query to the BioMart webservice returned an invalid result: biomaRt expected a character string of length 1.
Please report this on the support site at http://support.bioconductor.org

Thanks,
Stéphane

@james-w-macdonald-5106
Last seen 3 hours ago
United States

Mike Smith will probably be along in a bit with a direct answer, but I wonder if you are sort of duplicating work that has already been done by Johannes Rainer when he builds the EnsDb packages:

> library(AnnotationHub)

> hub <- AnnotationHub()
> query(hub, c("mus musculus","ensdb"))
AnnotationHub with 8 records
# snapshotDate(): 2018-10-24 
# $dataprovider: Ensembl
# $species: Mus musculus
# $rdataclass: EnsDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH53222"]]' 

            title                            
  AH53222 | Ensembl 87 EnsDb for Mus Musculus
  AH53726 | Ensembl 88 EnsDb for Mus Musculus
  AH56691 | Ensembl 89 EnsDb for Mus Musculus
  AH57770 | Ensembl 90 EnsDb for Mus Musculus
  AH60788 | Ensembl 91 EnsDb for Mus Musculus
  AH60992 | Ensembl 92 EnsDb for Mus Musculus
  AH64461 | Ensembl 93 EnsDb for Mus Musculus
  AH64944 | Ensembl 94 EnsDb for Mus musculus
> musdb <- hub[["AH64944"]]
downloading 1 resources
retrieving 1 resource
  |======================================================================| 100%


> gns <- transcriptsBy(musdb)

> gns
GRangesList object of length 55341:
$ENSMUSG00000000001 
GRanges object with 1 range and 8 metadata columns:
      seqnames              ranges strand |              tx_id     tx_biotype
         <Rle>           <IRanges>  <Rle> |        <character>    <character>
  [1]        3 108107280-108146146      - | ENSMUST00000000001 protein_coding
      tx_cds_seq_start tx_cds_seq_end            gene_id tx_support_level
             <integer>      <integer>        <character>        <integer>
  [1]        108109422      108146005 ENSMUSG00000000001                1
             tx_id_version            tx_name
               <character>        <character>
  [1] ENSMUST00000000001.4 ENSMUST00000000001

$ENSMUSG00000000003 
GRanges object with 2 ranges and 8 metadata columns:
      seqnames            ranges strand |              tx_id     tx_biotype
  [1]        X 77837901-77853623      - | ENSMUST00000000003 protein_coding
  [2]        X 77837902-77853530      - | ENSMUST00000114041 protein_coding
      tx_cds_seq_start tx_cds_seq_end            gene_id tx_support_level
  [1]         77841883       77853483 ENSMUSG00000000003                1
  [2]         77841883       77853483 ENSMUSG00000000003                2
              tx_id_version            tx_name
  [1] ENSMUST00000000003.13 ENSMUST00000000003
  [2]  ENSMUST00000114041.2 ENSMUST00000114041

$ENSMUSG00000000028 
GRanges object with 4 ranges and 8 metadata columns:
      seqnames            ranges strand |              tx_id      tx_biotype
  [1]       16 18807356-18811987      - | ENSMUST00000115585  protein_coding
  [2]       16 18780447-18811972      - | ENSMUST00000000028  protein_coding
  [3]       16 18780453-18811626      - | ENSMUST00000096990  protein_coding
  [4]       16 18810108-18811591      - | ENSMUST00000231819 retained_intron
      tx_cds_seq_start tx_cds_seq_end            gene_id tx_support_level
  [1]         18807356       18811565 ENSMUSG00000000028                2
  [2]         18781898       18811565 ENSMUSG00000000028                1
  [3]         18781898       18811565 ENSMUSG00000000028                1
  [4]             <NA>           <NA> ENSMUSG00000000028             <NA>
              tx_id_version            tx_name
  [1]  ENSMUST00000115585.1 ENSMUST00000115585
  [2] ENSMUST00000000028.13 ENSMUST00000000028
  [3]  ENSMUST00000096990.9 ENSMUST00000096990
  [4]  ENSMUST00000231819.1 ENSMUST00000231819

...
<55338 more elements>
-------
seqinfo: 117 sequences from GRCm38 genome

This has pretty much everything except the mappings to NCBI IDs, which I would argue are non-trivial to produce, given the differences between NCBI and EBI/EMBL.

And if you want to do some tidyverse sorcery on the results, you can always unlist that GRangesList, or convert to a DataFrame or a data.frame or (shudders) a tibble.
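As a rough sketch of that flattening step, the snippet below builds a tiny hand-made GRangesList (the object name `toy` and its two genes are stand-ins copied from the output above, not the real `gns`) so it runs without downloading anything. It unlists with `use.names = FALSE`, which sidesteps repeated gene IDs becoming row names, and relies on the `gene_id` metadata column to keep track of the grouping.

```r
library(GenomicRanges)

# Toy stand-in for `gns`: two genes, three transcripts in total.
toy <- GRangesList(
  ENSMUSG00000000001 = GRanges(
    "3", IRanges(108107280, 108146146), strand = "-",
    tx_id = "ENSMUST00000000001",
    gene_id = "ENSMUSG00000000001"
  ),
  ENSMUSG00000000003 = GRanges(
    "X", IRanges(c(77837901, 77837902), c(77853623, 77853530)), strand = "-",
    tx_id = c("ENSMUST00000000003", "ENSMUST00000114041"),
    gene_id = rep("ENSMUSG00000000003", 2)
  )
)

# Flatten the list into one GRanges; drop the list names because the
# gene_id metadata column already records which gene each range belongs to.
flat <- unlist(toy, use.names = FALSE)

# One row per transcript: seqnames, start, end, width, strand, plus metadata.
tx_df <- as.data.frame(flat)
```

With the real object, the same two calls would be `as.data.frame(unlist(gns, use.names = FALSE))`, and `tibble::as_tibble()` on the result gives a tibble if that is what you are after.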

Or, if you just want a DB to query, you can always run SQL directly against the underlying SQLite DB:

> DBI::dbListTables(dbconn(musdb))
 [1] "chromosome"     "entrezgene"     "exon"           "gene"          
 [5] "metadata"       "protein"        "protein_domain" "tx"            
 [9] "tx2exon"        "uniprot" 
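A minimal sketch of such a direct query follows. The helper name `preview_table` is made up for illustration, and the in-memory SQLite database with a two-column `gene` table is only a toy stand-in (its column names are assumptions, not the real EnsDb schema) so that the snippet runs offline. With the real database you would pass `dbconn(musdb)` as the connection.

```r
library(DBI)

# Hypothetical helper: return the first n rows of a table over any DBI connection.
preview_table <- function(con, table, n = 5) {
  # Fail early if the table does not exist in this database.
  stopifnot(table %in% dbListTables(con))
  sql <- paste0(
    "SELECT * FROM ", dbQuoteIdentifier(con, table),
    " LIMIT ", as.integer(n)
  )
  dbGetQuery(con, sql)
}

# Toy stand-in for dbconn(musdb); column names here are assumed for illustration.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "gene", data.frame(
  gene_id  = c("ENSMUSG00000000001", "ENSMUSG00000000003", "ENSMUSG00000000028"),
  seq_name = c("3", "X", "16"),
  stringsAsFactors = FALSE
))

head_rows <- preview_table(con, "gene", n = 2)
dbDisconnect(con)

# With the EnsDb loaded above: preview_table(dbconn(musdb), "gene")
```

Running `DBI::dbListFields(dbconn(musdb), "gene")` first is a safe way to see the actual column names before writing a more specific SELECT.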
