Hi,
I have to find the rsIDs of hundreds of thousands of SNPs. Since I am comfortable with R, I decided to try the biomaRt package.
It helped me find the rsIDs of hundreds of SNPs when running R locally. I had the following .txt file:
CHR chr_start chr_end
8 101592213 101592213
8 106973048 106973048
8 108690829 108690829
8 102569817 102569817
8 108580746 108580746
8 108681675 108681675
8 103044620 103044620
8 104152280 104152280
Then I ran the following commands:
library(biomaRt)
# create the Mart object (GRCh37 SNP mart)
snp_mart <- useMart(biomart = "ENSEMBL_MART_SNP", host = "grch37.ensembl.org",
                    path = "/biomart/martservice", dataset = "hsapiens_snp")
## combine the positions into a single vector
ds4$position <- apply(ds4, 1, paste, collapse = ":")
The previous command produces the following output:
"8:101592213:101592213" "8:106973048:106973048" "8:108690829:108690829" "8:102569817:102569817"
"8:108580746:108580746" "8:108681675:108681675" "8:103044620:103044620" "8:104152280:104152280"
This way, I could use chromosomal_region as the filter in the getBM() function. Then I ran the following commands:
rescue_rsid <- function(db) {
  temp <- getBM(attributes = c('refsnp_id', 'allele', 'chrom_start'),
                filters = 'chromosomal_region',
                values = db$position,
                mart = snp_mart)
  return(temp)
}
temp <- rescue_rsid(ds4)
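A side note on the position strings: as far as I understand, apply() first converts the data frame to a matrix, which can pad numbers with spaces once a column is non-numeric (for example chromosome X). A minimal sketch of a vectorised alternative, assuming ds4 has the three columns shown in the file above (the helper name fmt is mine, and only three rows are used here for brevity):

```r
# Example data with the same columns as the .txt file above
ds4 <- data.frame(
  CHR       = c(8, 8, 8),
  chr_start = c(101592213, 106973048, 108690829),
  chr_end   = c(101592213, 106973048, 108690829)
)

# Hypothetical helper: plain digits, no padding, no scientific notation
fmt <- function(x) format(x, scientific = FALSE, trim = TRUE)

# Build "chr:start:end" strings column-wise instead of row-wise with apply()
ds4$position <- paste(ds4$CHR, fmt(ds4$chr_start), fmt(ds4$chr_end), sep = ":")
ds4$position
```
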
I obtained rsIDs of these SNPs with my R session (sessionInfo() is the following):
R Under development (unstable) (2021-02-23 r80032)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252 LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C LC_TIME=Italian_Italy.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] biomaRt_2.47.4
loaded via a namespace (and not attached):
[1] Rcpp_1.0.6 pillar_1.5.0 dbplyr_2.1.0 compiler_4.1.0 XVector_0.31.1 prettyunits_1.1.1 tools_4.1.0
[8] zlibbioc_1.37.0 progress_1.2.2 bit_4.0.4 tibble_3.0.6 BiocFileCache_1.15.1 RSQLite_2.2.3 memoise_2.0.0
[15] lifecycle_1.0.0 pkgconfig_2.0.3 png_0.1-7 rlang_0.4.10 DBI_1.1.1 rstudioapi_0.13 filelock_1.0.2
[22] curl_4.3 parallel_4.1.0 fastmap_1.1.0 withr_2.4.1 dplyr_1.0.4 httr_1.4.2 stringr_1.4.0
[29] xml2_1.3.2 rappdirs_0.3.3 generics_0.1.0 askpass_1.1 Biostrings_2.59.2 S4Vectors_0.29.7 vctrs_0.3.6
[36] IRanges_2.25.6 hms_1.0.0 tidyselect_1.1.0 stats4_4.1.0 bit64_4.0.5 glue_1.4.2 Biobase_2.51.0
[43] R6_2.5.0 fansi_0.4.2 AnnotationDbi_1.53.1 XML_3.99-0.5 purrr_0.3.4 blob_1.2.1 magrittr_2.0.1
[50] ellipsis_0.3.1 BiocGenerics_0.37.1 assertthat_0.2.1 KEGGREST_1.31.1 utf8_1.1.4 stringi_1.5.3 openssl_1.4.3
[57] cachem_1.0.4 crayon_1.4.1
However, when I try to run the same script on a server to handle many more SNPs, the output in my PuTTY shell is the following:
Error in curl::curl_fetch_memory(url, handle = handle) :
Timeout was reached: [grch37.ensembl.org:80] Operation timed out after 300001 milliseconds with 0 bytes received
My uname -a output is:
Linux platonesrv1 5.4.0-66-generic #74-Ubuntu SMP Wed Jan 27 22:54:38 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
How can I solve this timeout problem with biomaRt?
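One idea I have not verified yet is to send the positions in smaller batches, so that each request stays small, and to retry a batch when it fails. A rough sketch, assuming the query is passed in as a function; split_batches, query_in_batches, the batch size of 500 and the retry settings are all my own placeholders, not part of biomaRt:

```r
# Split a vector of region strings into batches of at most `batch_size`
split_batches <- function(x, batch_size = 500) {
  unname(split(x, ceiling(seq_along(x) / batch_size)))
}

# Run `query_fun` on each batch, retrying a failed batch up to `max_tries`
# times, and row-bind the per-batch results into one data frame.
# `query_fun` stands in for the getBM() call.
query_in_batches <- function(positions, query_fun, batch_size = 500,
                             max_tries = 3, wait = 5) {
  results <- lapply(split_batches(positions, batch_size), function(batch) {
    for (i in seq_len(max_tries)) {
      res <- tryCatch(query_fun(batch), error = function(e) NULL)
      if (!is.null(res)) return(res)
      Sys.sleep(wait)  # pause before the next attempt
    }
    stop("batch failed after ", max_tries, " attempts")
  })
  do.call(rbind, results)
}

# Intended use with the objects defined earlier (not run here):
# temp <- query_in_batches(ds4$position, function(b) {
#   getBM(attributes = c('refsnp_id', 'allele', 'chrom_start'),
#         filters = 'chromosomal_region', values = b, mart = snp_mart)
# })
```

I do not know whether this avoids the timeout on the server side, but it would at least show which batch fails.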
Thank you!
Hi, Mike! I ran into an error when I tried to replicate your example, and I was wondering if you would take a look at it. I followed your instructions: I created the example data.frame, loaded the libraries, and set "genome" and "all_snps" correctly. The problem occurred when I used
which returned
I have no idea what "line 1" refers to or how to fix this problem. I'd appreciate it if you could look into it. Thank you!