Dear BioC community,
I recently updated my R installation as well as all my BioC packages, including biomaRt.
Before the update, the code below (microarray probe annotation, about 40k probes) ran in less than one minute; it now takes more than half an hour.

library(biomaRt)

probes <- row.names(data_rma_ets1_f)

# Connexion to BioMart hg19 (aka GRCh37)
ensembl <- useMart(biomart="ENSEMBL_MART_ENSEMBL",
                   host="grch37.ensembl.org",
                   path="/biomart/martservice",
                   dataset="hsapiens_gene_ensembl")

# Querying Biomart to map probe names to various features
annotation_ensembl <- getBM(attributes = c("affy_hta_2_0", "ensembl_gene_id", "chromosome_name",
                                           "start_position", "end_position", "strand",
                                           "entrezgene", "hgnc_symbol"),
                            filters = "affy_hta_2_0",
                            values = gsub("\\.1", "", probes),
                            mart = ensembl)

Batch submitting query [==-----------------------------------------------------------------------------------------------] 2% eta: 40m

> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.5

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] fr_FR.UTF-8/fr_FR.UTF-8/fr_FR.UTF-8/C/fr_FR.UTF-8/fr_FR.UTF-8

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
 [1] rtracklayer_1.38.3 EnsDb.Hsapiens.v75_2.99.0 ensembldb_2.2.2
 [4] AnnotationFilter_1.2.0 TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 GenomicFeatures_1.30.3
 [7] AnnotationDbi_1.40.0 GenomicRanges_1.30.3 GenomeInfoDb_1.14.0
[10] biomaRt_2.36.1 pd.hta.2.0_3.12.2 DBI_1.0.0
[13] RSQLite_2.1.1 oligo_1.42.0 Biostrings_2.46.0
[16] XVector_0.18.0 IRanges_2.12.0 S4Vectors_0.16.0
[19] Biobase_2.38.0 oligoClasses_1.40.0 BiocGenerics_0.24.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.17 lattice_0.20-35 prettyunits_1.0.2 Rsamtools_1.30.0
 [5] assertthat_0.2.0 digest_0.6.15 foreach_1.4.4 mime_0.5
 [9] R6_2.2.2 httr_1.3.1 BiocInstaller_1.28.0 zlibbioc_1.24.0
[13] progress_1.1.2 curl_3.2 lazyeval_0.2.1 blob_1.1.1
[17] Matrix_1.2-14 preprocessCore_1.40.0 splines_3.4.4 RMySQL_0.10.15
[21] BiocParallel_1.12.0 AnnotationHub_2.10.1 stringr_1.3.1 ProtGenerics_1.10.0
[25] RCurl_1.95-4.10 bit_1.1-14 shiny_1.1.0 DelayedArray_0.4.1
[29] compiler_3.4.4 httpuv_1.4.3 pkgconfig_2.0.1 htmltools_0.3.6
[33] SummarizedExperiment_1.8.1 GenomeInfoDbData_1.0.0 interactiveDisplayBase_1.16.0 ff_2.2-14
[37] codetools_0.2-15 matrixStats_0.53.1 XML_3.98-1.11 later_0.7.2
[41] GenomicAlignments_1.14.2 bitops_1.0-6 grid_3.4.4 xtable_1.8-2
[45] magrittr_1.5 stringi_1.2.2 promises_1.0.1 affyio_1.48.0
[49] iterators_1.0.9 tools_3.4.4 bit64_0.9-7 yaml_2.1.19
[53] memoise_1.1.0 affxparser_1.50.0

Is there anything I can do to make the annotation process faster with the latest release of biomaRt?
Many thanks for your advice!
Cheers!
-Pef-
How old was your previous version? Did it used to print the 'batch submitting' message? The previous versions that didn't use the batch submission method were faster, but had a tendency to silently drop results when you queried for more than 500 values at a time.
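
To make the idea of batching concrete, here is a minimal sketch of how a long list of probe IDs could be split into chunks of at most 500 values and queried chunk by chunk. It is only an illustration of the principle, not biomaRt's internal implementation. The helper names `split_into_batches` and `query_in_batches` are hypothetical, and `query_fun` stands for whatever function you would use to query a single batch (for example a thin wrapper around the `getBM` call from the question).

```r
# Hypothetical helper (not part of biomaRt): split a vector of IDs into
# batches of at most `batch_size` values, after removing duplicates.
split_into_batches <- function(values, batch_size = 500) {
  values <- unique(values)
  if (length(values) == 0) return(list())
  unname(split(values, ceiling(seq_along(values) / batch_size)))
}

# Hypothetical wrapper: call `query_fun` once per batch and row-bind the
# per-batch results. `query_fun` must accept a vector of IDs and return a
# data.frame with the same columns for every batch.
query_in_batches <- function(values, query_fun, batch_size = 500) {
  batches <- split_into_batches(values, batch_size)
  results <- lapply(batches, query_fun)
  do.call(rbind, results)
}

# Possible usage with the objects from the question (not run here, since it
# needs a live connection to the Ensembl BioMart server):
# annotation_ensembl <- query_in_batches(
#   gsub("\\.1", "", probes),
#   function(ids) getBM(attributes = c("affy_hta_2_0", "ensembl_gene_id"),
#                       filters = "affy_hta_2_0", values = ids, mart = ensembl)
# )
```

Keeping each request at or below 500 values avoids the range where older versions were reported to drop results, at the cost of more round trips to the server, which is likely where the extra run time comes from.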