Question: getBM in loop
Tiphaine Martin (France) wrote, 21 months ago:

Hi,

Could you advise me how to run getBM in a loop so that, when the connection times out, the query is rerun until I get a result?

I see that there is an option called curl, but I have never used it, so I don't know how it works.

 

library(biomaRt)

# connect to Ensembl

mart <- useMart(biomart = "ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl")

# extract the hgnc_symbol and gene_biotype for each ensembl_gene_id in my list
hgnc <- getBM(attributes=c('ensembl_gene_id','hgnc_symbol','gene_biotype'),
                   filters = 'ensembl_gene_id', values = mat$ensembl_gene_id, mart = mart)
Batch submitting query [=====================================================--------]  88% eta: 11s
Error in curl::curl_fetch_memory(url, handle = handle) :
  Timeout was reached: Connection timed out after 10003 milliseconds
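One way to get the "rerun until I have a result" behaviour without touching the `curl` argument is to wrap the query in a retry loop. This is a minimal sketch, assuming that simply reissuing the same query after a short pause is acceptable; `retry_query()` is a hypothetical helper, not part of biomaRt. It calls a function, catches any error (such as the curl timeout above), waits, and tries again up to `max_attempts` times before giving up.

```r
# Hypothetical helper (not part of biomaRt): call `fun()` until it succeeds,
# catching any error and retrying up to `max_attempts` times in total.
retry_query <- function(fun, max_attempts = 5, wait_seconds = 10) {
  for (attempt in seq_len(max_attempts)) {
    result <- tryCatch(fun(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)  # success: hand back the query result
    }
    message(sprintf("Attempt %d/%d failed: %s",
                    attempt, max_attempts, conditionMessage(result)))
    if (attempt < max_attempts) Sys.sleep(wait_seconds)  # pause before retrying
  }
  stop("All ", max_attempts, " attempts failed")
}

# Usage with the query above (requires network access; `mat` and `mart`
# as defined in the question):
# hgnc <- retry_query(function() {
#   getBM(attributes = c('ensembl_gene_id', 'hgnc_symbol', 'gene_biotype'),
#         filters = 'ensembl_gene_id', values = mat$ensembl_gene_id, mart = mart)
# })
```

Passing the query as a function means it is re-evaluated on every attempt, rather than being evaluated once before `retry_query()` is even called.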

 

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.2

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] biomaRt_2.34.1 edgeR_3.20.3   limma_3.34.5

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.14         AnnotationDbi_1.40.0 magrittr_1.5        
 [4] BiocGenerics_0.24.0  progress_1.1.2       IRanges_2.12.0      
 [7] bit_1.1-12           lattice_0.20-35      R6_2.2.2            
[10] rlang_0.1.6          httr_1.3.1           stringr_1.2.0       
[13] blob_1.1.0           tools_3.4.3          parallel_3.4.3      
[16] grid_3.4.3           Biobase_2.38.0       DBI_0.7             
[19] assertthat_0.2.0     bit64_0.9-7          digest_0.6.13       
[22] tibble_1.4.1         S4Vectors_0.16.0     bitops_1.0-6        
[25] RCurl_1.95-4.9       memoise_1.1.0        RSQLite_2.0         
[28] stringi_1.1.6        compiler_3.4.3       pillar_1.0.1        
[31] prettyunits_1.0.2    stats4_3.4.3         XML_3.98-1.9        
[34] locfit_1.5-9.1      

 


I can't reproduce this error at the moment, but I suspect the cause was introduced a few weeks ago, when I made some modifications necessitated by changes to Ensembl. I'll take a closer look and try to work around it.

(Reply by Mike Smith, 20 months ago)
Answer: getBM in loop
James W. MacDonald (United States) wrote, 21 months ago:

I have never seen the "Batch submitting query" message, but you shouldn't need a loop anyway: I can map over 61,000 IDs in a single query in about 20 seconds.

> library(EnsDb.Hsapiens.v79)
> ensembl <- keys(EnsDb.Hsapiens.v79, keytype = "GENEID")
> head(ensembl)
[1] "ENSG00000000003" "ENSG00000000005" "ENSG00000000419" "ENSG00000000457"
[5] "ENSG00000000460" "ENSG00000000938"
> length(ensembl)
[1] 65774
> dat <- getBM(c('ensembl_gene_id','hgnc_symbol','gene_biotype'), "ensembl_gene_id", ensembl, mart)
> dim(dat)
[1] 61351     3
> head(dat)
  ensembl_gene_id hgnc_symbol   gene_biotype
1 ENSG00000000003      TSPAN6 protein_coding
2 ENSG00000000005        TNMD protein_coding
3 ENSG00000000419        DPM1 protein_coding
4 ENSG00000000457       SCYL3 protein_coding
5 ENSG00000000460    C1orf112 protein_coding
6 ENSG00000000938         FGR protein_coding

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 16299)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] EnsDb.Hsapiens.v79_2.99.0 ensembldb_2.2.0          
 [3] AnnotationFilter_1.2.0    GenomicFeatures_1.30.0   
 [5] GenomicRanges_1.30.0      GenomeInfoDb_1.14.0      
 [7] org.Hs.eg.db_3.5.0        AnnotationDbi_1.40.0     
 [9] IRanges_2.12.0            S4Vectors_0.16.0         
[11] Biobase_2.38.0            BiocGenerics_0.24.0      
[13] biomaRt_2.34.1           

 


Sorry, my question was not very clear: I request more than 20,000 genes per tissue, and I run this query for several tissues.

I don't loop over genes but over tissues. Some of these requests fail, and at the moment I rerun the failing tissue and all the following tissues manually. I would like this to happen automatically.

Tiphaine

(Reply by Tiphaine Martin, 21 months ago)
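A sketch of the per-tissue loop that reruns failures automatically, assuming the gene IDs are held in a named list with one element per tissue. The names `query_all_tissues`, `genes_by_tissue` and `run_query` are hypothetical, not biomaRt functions. Instead of restarting from the failing tissue by hand, it records which tissues failed in each round and reruns only those, up to `max_rounds` rounds; tissues that already succeeded are not queried again.

```r
# Hypothetical helper: run `run_query(ids)` for every tissue in the named list
# `genes_by_tissue`; tissues whose query throws an error are retried in later
# rounds, up to `max_rounds` rounds in total.
query_all_tissues <- function(genes_by_tissue, run_query,
                              max_rounds = 5, wait_seconds = 10) {
  results <- list()
  pending <- names(genes_by_tissue)
  for (round in seq_len(max_rounds)) {
    failed <- character(0)
    for (tissue in pending) {
      res <- tryCatch(run_query(genes_by_tissue[[tissue]]),
                      error = function(e) e)
      if (inherits(res, "error")) {
        message("Tissue '", tissue, "' failed in round ", round, ": ",
                conditionMessage(res))
        failed <- c(failed, tissue)
      } else {
        results[[tissue]] <- res  # keep successful results by tissue name
      }
    }
    pending <- failed
    if (length(pending) == 0) break  # everything succeeded
    if (round < max_rounds) Sys.sleep(wait_seconds)
  }
  if (length(pending) > 0) {
    warning("Still failing after ", max_rounds, " rounds: ",
            paste(pending, collapse = ", "))
  }
  results
}

# Usage with biomaRt (requires network access; `mart` as in the question):
# hgnc_by_tissue <- query_all_tissues(genes_by_tissue, function(ids) {
#   getBM(attributes = c('ensembl_gene_id', 'hgnc_symbol', 'gene_biotype'),
#         filters = 'ensembl_gene_id', values = ids, mart = mart)
# })
```

Tissues that still fail after the last round are reported in a warning rather than stopping the whole run, so the results collected so far are not lost.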

The batch query message is something I added in the latest release of biomaRt. Ensembl BioMart can have issues when the list of filter values is large, so biomaRt now internally splits them into batches of 500 and submits them sequentially. This can take a little longer, so it prints a progress bar to show that something is happening.

(Reply by Mike Smith, 20 months ago)
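To illustrate the batching idea described above, here is a minimal sketch of splitting a long vector of filter values into batches of 500 and submitting them one after another. It is not biomaRt's internal code; `query_in_batches` and `run_query` are hypothetical names, with `run_query` standing in for a single getBM() call on one batch.

```r
# Hypothetical helper: split `values` into consecutive batches of at most
# `batch_size` elements, run `run_query()` on each batch in order, and
# row-bind the per-batch results into one data frame.
query_in_batches <- function(values, run_query, batch_size = 500) {
  batch_id <- ceiling(seq_along(values) / batch_size)  # 1,1,...,2,2,...
  batches <- split(values, batch_id)
  results <- vector("list", length(batches))
  for (i in seq_along(batches)) {
    message(sprintf("Submitting batch %d of %d", i, length(batches)))
    results[[i]] <- run_query(batches[[i]])
  }
  do.call(rbind, results)
}

# Example (requires network access; `mat` and `mart` as in the question):
# hgnc <- query_in_batches(mat$ensembl_gene_id, function(ids) {
#   getBM(attributes = c('ensembl_gene_id', 'hgnc_symbol', 'gene_biotype'),
#         filters = 'ensembl_gene_id', values = ids, mart = mart)
# })
```

Because each batch is a separate request, a timeout only affects one batch; wrapping `run_query` in a retry would then rerun that batch alone rather than the whole query.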
Answer: getBM in loop
Mike Smith (EMBL Heidelberg / de.NBI) wrote, 20 months ago:

I've increased the default timeout from 10 to 30 seconds. Without being able to run the code on your exact setup it's hard to know whether this will be sufficient, but fingers crossed. Please report back here if it still fails; we may need a more extensive diagnosis.

This update is available in biomaRt version 2.35.5.  It'll appear in the devel branch shortly, but you can get the updated version immediately using:

BiocInstaller::biocLite('grimbough/biomaRt')