Hi,
I am currently doing a study where we would like to focus on the synonymous snps of a set of 35 genes. I have their ensembl gene IDs, but when I query them, the timeout is reached. This happens even if I query them one at a time.
I need gene ID, Chromsomome name, position, and consequence type (to get synonymous snps). refSNP ID is a helpful as well.
What can I do differently to get this information? Or what am I doing wrong?
Thank you,
-Damien
library("biomaRt")
grch37_snp = useEnsembl(biomart="ENSEMBL_MART_SNP",dataset="hsapiens_snp",host="https://grch37.ensembl.org", GRCh = 37)
snp_attributes <- c("ensembl_gene_stable_id","chr_name","chrom_start","chrom_end","consequence_type_tv","refsnp_id","ensembl_peptide_allele")
snp_annotate <- data.frame(matrix(ncol = 8, nrow = 0))
colnames(snp_annotate) <- snp_attributes
engids <- c("ENSG00000142208", "ENSG00000149311", "ENSG00000138376", "ENSG00000012048" ,"ENSG00000139618" ,"ENSG00000158019", "ENSG00000136492") #etc there are more, just a few here for example
for (i in engids) {
print(i)
vls <- getBM(attributes = snp_attributes, values = i,filters = "ensembl_gene" ,mart = grch37_snp)
snp_annotate <- rbind(snp_annotate,vls)
Sys.sleep(5)
}
[1] "ENSG00000142208"
[1] "ENSG00000149311"
[1] "ENSG00000138376"
Error in curl::curl_fetch_memory(url, handle = handle) :
Timeout was reached: [grch37.ensembl.org:443] Operation timed out after 300000 milliseconds with 311013 bytes received
sessionInfo( )
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=sv_SE.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=sv_SE.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=sv_SE.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=sv_SE.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] biomaRt_2.54.0
loaded via a namespace (and not attached):
[1] KEGGREST_1.38.0 progress_1.2.2 tidyselect_1.2.0 purrr_1.0.1 vctrs_0.5.2 generics_0.1.3
[7] stats4_4.2.1 BiocFileCache_2.6.1 utf8_1.2.3 blob_1.2.3 XML_3.99-0.13 rlang_1.0.6
[13] pillar_1.8.1 withr_2.5.0 glue_1.6.2 DBI_1.1.3 BiocParallel_1.32.5 rappdirs_0.3.3
[19] BiocGenerics_0.44.0 bit64_4.0.5 dbplyr_2.3.1 GenomeInfoDbData_1.2.9 lifecycle_1.0.3 stringr_1.5.0
[25] zlibbioc_1.44.0 Biostrings_2.66.0 codetools_0.2-19 memoise_2.0.1 Biobase_2.58.0 IRanges_2.32.0
[31] fastmap_1.1.1 GenomeInfoDb_1.34.9 parallel_4.2.1 curl_5.0.0 AnnotationDbi_1.60.2 fansi_1.0.4
[37] Rcpp_1.0.10 filelock_1.0.2 cachem_1.0.7 S4Vectors_0.36.2 XVector_0.38.0 bit_4.0.5
[43] hms_1.1.2 png_0.1-8 digest_0.6.31 stringi_1.7.12 dplyr_1.1.0 cli_3.6.0
[49] tools_4.2.1 bitops_1.0-7 magrittr_2.0.3 RCurl_1.98-1.10 RSQLite_2.3.0 tibble_3.2.0
[55] crayon_1.5.2 pkgconfig_2.0.3 ellipsis_0.3.2 xml2_1.3.3 prettyunits_1.1.1 httr_1.4.5
[61] rstudioapi_0.14 R6_2.5.1 compiler_4.2.1
Thank you for your response.
How can I request for biomaRt to retrieve only synonymous variants? It does not appear to be an option for filters.
-Damien