biomaRt::getBM fails with "Error in scan"
eric.blanc • 0
@ericblanc-11613

Hi,

I'm using biomaRt to extract amino-acid sequences from Ensembl together with the APPRIS transcript status. I am using the getBM function because I couldn't find a way to add the APPRIS status using getSequence. However, I get an error when I request (besides the sequence) both the SwissProt ID and the APPRIS status, but not when I request only one of them.

I have an error with:

mart <- biomaRt::useMart(biomart="ENSEMBL_MART_ENSEMBL", 
                         dataset="mmusculus_gene_ensembl", 
                         host="www.ensembl.org")

biomaRt::getBM(values="Q80YE7", 
               filters="uniprot_gn", 
               attributes=c("uniprotswissprot", "peptide", "transcript_appris"), 
               mart=mart)
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
  line 2 did not have 3 elements

but not with (edited output):

> biomaRt::getBM(values="Q80YE7", 
                 filters="uniprot_gn", 
                 attributes=c("uniprotswissprot", "peptide"), 
                 mart=mart)
  uniprotswissprot                     peptide
1           Q80YE7  MTVFRQENVDDYYDTGEELGSGQ...
2                             XYENKTDVILILELR*
3           Q80YE7  MTVFRQENVDDYYDTGEELGSGQ...

> biomaRt::getBM(values="Q80YE7", 
                 filters="uniprot_gn", 
                 attributes=c("peptide", "transcript_appris"), 
                 mart=mart)
                      peptide  transcript_appris
1  MTVFRQENVDDYYDTGEELGSGQ...         principal1
2        Sequence unavailable
3            XYENKTDVILILELR*
4  MTVFRQENVDDYYDTGEELGSGQ...

> biomaRt::getBM(values="Q80YE7", 
                 filters="uniprot_gn", 
                 attributes=c("uniprotswissprot", "transcript_appris"), 
                 mart=mart)
  uniprotswissprot transcript_appris
1           Q80YE7                  
2           Q80YE7        principal1
3                                   

I am not quite sure what to do next, any help is appreciated.

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS

Matrix products: default
BLAS: /home/eblanc/R/R-3.5.1/lib/libRblas.so
LAPACK: /home/eblanc/R/R-3.5.1/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.18         AnnotationDbi_1.42.1 magrittr_1.5        
 [4] BiocGenerics_0.26.0  hms_0.4.2            progress_1.2.0      
 [7] IRanges_2.14.12      bit_1.1-14           R6_2.2.2            
[10] rlang_0.2.2          httr_1.3.1           stringr_1.3.1       
[13] blob_1.1.1           tools_3.5.1          parallel_3.5.1      
[16] Biobase_2.40.0       DBI_1.0.0            bit64_0.9-7         
[19] digest_0.6.17        assertthat_0.2.0     crayon_1.3.4        
[22] S4Vectors_0.18.3     bitops_1.0-6         curl_3.2            
[25] RCurl_1.95-4.11      biomaRt_2.36.1       memoise_1.1.0       
[28] RSQLite_2.1.1        stringi_1.2.4        compiler_3.5.1      
[31] prettyunits_1.0.2    stats4_3.5.1         XML_3.98-1.16       
[34] pkgconfig_2.0.2     

Mike Smith ★ 6.5k
@mike-smith
EMBL Heidelberg

You can't get the APPRIS status via getSequence() as it's on a different BioMart 'Page' to the sequence information. The web interface would prevent you from choosing these two options together, but biomaRt doesn't make this explicit. However, you can check the pages with the following:

> searchAttributes(mart, 'appris')
                name       description         page
20 transcript_appris APPRIS annotation feature_page
> searchAttributes(mart, '^peptide$')
        name description      page
2621 peptide     Peptide sequences
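
The same page check can be done programmatically before building a query. The sketch below is only an illustration: `attr_table` is a toy table copied from the output above, and `shared_pages()` is a hypothetical helper, not part of biomaRt. With a live connection, `listAttributes(mart)` returns a data.frame with the same `name`, `description` and `page` columns.

```r
# Toy attribute table copied from the searchAttributes() output above.
# With a live mart, listAttributes(mart) has the same three columns.
attr_table <- data.frame(
  name        = c("transcript_appris", "peptide"),
  description = c("APPRIS annotation", "Peptide"),
  page        = c("feature_page", "sequences"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a biomaRt function): return the pages that
# contain every requested attribute.
shared_pages <- function(attr_table, attrs) {
  pages_per_attr <- lapply(attrs, function(a) {
    attr_table$page[attr_table$name == a]
  })
  Reduce(intersect, pages_per_attr)
}

shared <- shared_pages(attr_table, c("peptide", "transcript_appris"))
if (length(shared) == 0) {
  message("No common page: split these attributes across separate queries.")
}
```

An empty result means the attributes do not share a page and should go in separate queries.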

The error in scan() you're experiencing is because biomaRt modifies the FASTA format Ensembl sends for sequence information, and tries to make it into a data.frame. However, at least one of the results is missing a Uniprot/Swissprot ID and biomaRt isn't handling this very well.
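
To see where a message like "line 2 did not have 3 elements" comes from, here is a toy reproduction of that class of error using base R only. It is not biomaRt's internal code, and the text being parsed is a made-up placeholder rather than real BioMart output; it just shows how strict table parsing reacts to a record with a missing field.

```r
# Toy tab-separated text: the second record lacks its third field.
# The values are placeholders, not real BioMart output.
toy_lines <- "Q80YE7\tMTVFRQ\tprincipal1\n\tXYENK*\n"

# Strict parsing (fill = FALSE is the default) stops at the short line.
strict_msg <- tryCatch(
  {
    read.table(text = toy_lines, sep = "\t", header = FALSE,
               stringsAsFactors = FALSE)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(strict_msg)

# fill = TRUE pads short lines with blank fields instead of failing.
padded <- read.table(text = toy_lines, sep = "\t", header = FALSE,
                     fill = TRUE, stringsAsFactors = FALSE)
print(dim(padded))
```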

My suggestion would be to submit two queries: one for the peptide sequence and one for the Uniprot & APPRIS information. In both queries we'll also ask for the Ensembl transcript ID. We're doing this because BioMart doesn't guarantee the order in which results are returned, so we want a unique value for each row that lets us match our two sets of results back to each other. Ensembl BioMart is transcript-centric, so it's really rare for a query result not to have a unique transcript ID, making this a good choice for merging the results tables.

q1 <- biomaRt::getBM(values="Q80YE7", 
                     filters="uniprot_gn", 
                     attributes=c("ensembl_transcript_id", "peptide"), 
                     mart=mart)

q2 <- biomaRt::getBM(values="Q80YE7", 
                     filters="uniprot_gn", 
                     attributes=c("ensembl_transcript_id", "uniprotswissprot", "transcript_appris"), 
                     mart=mart)

You can then join the two results using the common transcript ID column (and remove that column afterwards if you like).

library(dplyr)  # attaches %>%, which the dplyr:: prefix alone does not provide

dplyr::inner_join(q1, q2, by = "ensembl_transcript_id") %>% 
  dplyr::select(-ensembl_transcript_id) %>%
  dplyr::as_tibble()

# A tibble: 4 x 3
  peptide                                          uniprotswissprot transcript_appr…
  <chr>                                            <chr>            <chr>           
1 MTVFRQENVDDYYDTGEELGSGQFAVVKKCREKSTGLQYAAKFIKKR… Q80YE7           principal1      
2 MTVFRQENVDDYYDTGEELGSGQFAVVKKCREKSTGLQYAAKFIKKR… Q80YE7           principal1      
3 XYENKTDVILILELR*                                 ""               ""              
4 MTVFRQENVDDYYDTGEELGSGQFAVVKKCREKSTGLQYAAKFIKKR… Q80YE7           ""              
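
If you'd rather avoid the dplyr dependency, the same join can be sketched with base R's `merge()`. The example below runs offline on toy stand-ins for the two query results: `q1_toy`, `q2_toy` and the transcript IDs `T1` to `T3` are made-up placeholders, not real Ensembl identifiers. It also checks that the join key is unique in both tables, which is the assumption the two-query approach relies on.

```r
# Toy stand-ins for q1 and q2; IDs and sequences are made-up placeholders.
q1_toy <- data.frame(
  ensembl_transcript_id = c("T1", "T2", "T3"),
  peptide               = c("MTVF", "XYENK*", "MTVF"),
  stringsAsFactors = FALSE
)
q2_toy <- data.frame(
  ensembl_transcript_id = c("T3", "T1", "T2"),  # deliberately a different order
  uniprotswissprot      = c("Q80YE7", "Q80YE7", ""),
  transcript_appris     = c("", "principal1", ""),
  stringsAsFactors = FALSE
)

# Guard: the join key should be unique in both tables,
# otherwise the merge would duplicate rows.
stopifnot(!anyDuplicated(q1_toy$ensembl_transcript_id),
          !anyDuplicated(q2_toy$ensembl_transcript_id))

# Inner join on the shared transcript ID; row order in the inputs does not matter.
merged <- merge(q1_toy, q2_toy, by = "ensembl_transcript_id")

# Drop the key column once the rows are matched up.
result <- merged[, setdiff(names(merged), "ensembl_transcript_id")]
print(result)
```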