Hi,
I've to find rsID from chromosome positions and chromosome names of several SNPs. I've found biomaRt package in R to solve my problem.
I've the following .txt file:
CHR chr_start chr_end
8 101592213 101592213
8 106973048 106973048
8 108690829 108690829
8 102569817 102569817
8 108580746 108580746
I've loaded the biomaRt package and then I've created the Mart object with the following command:
snp_mart <- useMart(biomart="ENSEMBL_MART_SNP", host="grch37.ensembl.org",
+ path="/biomart/martservice", dataset="hsapiens_snp")
Then, I created the following function to get rsIDs, alleles, and chromosome positions:
> rescue_rsid <- function(db) {
+
+ temp <- getBM(attributes = c('refsnp_id', 'allele', 'chrom_start'),
+ filters = c('chr_name', 'start', 'end'),
+ values = list(db[,which(colnames(db) == "CHR")], db[,which(colnames(db) == "chr_start")], db[,which(colnames(db) == "chr_end")]),
+ mart = snp_mart)
+ return(temp)
+ }
Then, I want to apply the previous function to all my SNPs with the following command:
apply(ds4, 1, rescue_rsid)
where ds4 is the dataframe with my filters, but I found the following results in Rconsole:
> rescue_rsid(ds4)
Error in curl::curl_fetch_memory(url, handle = handle) :
Timeout was reached: [grch37.ensembl.org:80] Operation timed out after 300012 milliseconds with 12109 bytes received
Called from: curl::curl_fetch_memory(url, handle = handle)
Browse[1]> rescue_rsid(ds4[1,])
refsnp_id allele chrom_start
1 rs62513865 C/T 101592213
Browse[1]>
However, when I run the following command:
> rescue_rsid(ds4[1,])
Rconsole reported the following result:
refsnp_id allele chrom_start
1 rs62513865 C/T 101592213
So, this would mean that the function searches rsID but I don't understand why it doesn't work for all my SNPs...
Could someone help me? Thank u!
My R session:
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)
Matrix products: default
locale:
[1] LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252 LC_MONETARY=Italian_Italy.1252
[4] LC_NUMERIC=C LC_TIME=Italian_Italy.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] biomaRt_2.44.4
loaded via a namespace (and not attached):
[1] Rcpp_1.0.6 pillar_1.4.7 compiler_4.0.3 dbplyr_2.1.0 prettyunits_1.1.1
[6] tools_4.0.3 progress_1.2.2 bit_4.0.4 tibble_3.0.6 RSQLite_2.2.3
[11] memoise_2.0.0 BiocFileCache_1.12.1 lifecycle_1.0.0 pkgconfig_2.0.3 rlang_0.4.10
[16] DBI_1.1.1 curl_4.3 parallel_4.0.3 fastmap_1.1.0 withr_2.4.1
[21] stringr_1.4.0 httr_1.4.2 dplyr_1.0.4 xml2_1.3.2 rappdirs_0.3.3
[26] generics_0.1.0 S4Vectors_0.26.1 vctrs_0.3.6 askpass_1.1 IRanges_2.22.2
[31] hms_1.0.0 tidyselect_1.1.0 stats4_4.0.3 bit64_4.0.5 glue_1.4.2
[36] Biobase_2.48.0 R6_2.5.0 AnnotationDbi_1.50.3 XML_3.99-0.5 purrr_0.3.4
[41] blob_1.2.1 magrittr_2.0.1 ellipsis_0.3.1 BiocGenerics_0.34.0 assertthat_0.2.1
[46] stringi_1.5.3 openssl_1.4.3 cachem_1.0.3 crayon_1.4.1