extract rsids of SNPs using their genomic positions
3
0
Entering edit mode
@99e71656
Last seen 9 months ago
Spain

I need to get the rsid of SNPs having their genome positions as columns in a text file under the names "chromosome", "start, "end". "start" and "end" has equal values corresponding to the genomic position of a SNP from each row. What package and functions should I use in my case?


# include your problematic code here with any corresponding output
BiocManager::install("SNPlocs.Hsapiens.dbSNP150.GRCh38")
library(SNPlocs.Hsapiens.dbSNP150.GRCh38)
data <- read.table("10_79854257_rsid.txt", header = TRUE)
gr <- GRanges(seqnames = Rle(data$chromosome),
              ranges = IRanges(start = data$start, end = data$end))
rsids <- findGRanges(gr, columns = "rsid")
data$rsid <- rsids
write.table(data, "10_79854257_with_rsids.txt", sep = "\t", quote = FALSE, row.names = FALSE)

#Error in findGRanges(gr, columns = "rsid") : 
  could not find function "findGRanges"

# please also include the results of running the following in an R session 

sessionInfo( )

R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22621)

Matrix products: default


locale:
[1] LC_COLLATE=English_United Kingdom.utf8  LC_CTYPE=English_United Kingdom.utf8   
[3] LC_MONETARY=English_United Kingdom.utf8 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.utf8    

time zone: Europe/Madrid
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] TxDb.Hsapiens.UCSC.hg38.knownGene_3.17.0 GenomicFeatures_1.52.1                  
 [3] AnnotationDbi_1.62.2                     VariantAnnotation_1.46.0                
 [5] Rsamtools_2.16.0                         SummarizedExperiment_1.30.2             
 [7] Biobase_2.60.0                           MatrixGenerics_1.12.2                   
 [9] matrixStats_1.0.0                        BSgenome.Hsapiens.UCSC.hg38_1.4.5       
[11] SNPlocs.Hsapiens.dbSNP150.GRCh38_0.99.20 BSgenome_1.68.0                         
[13] rtracklayer_1.60.0                       Biostrings_2.68.1                       
[15] XVector_0.40.0                           GenomicRanges_1.52.0                    
[17] GenomeInfoDb_1.36.1                      IRanges_2.34.1                          
[19] S4Vectors_0.38.1                         BiocGenerics_0.46.0                     

loaded via a namespace (and not attached):
  [1] rstudioapi_0.15.0        magrittr_2.0.3           TH.data_1.1-2           
  [4] rmarkdown_2.23           fs_1.6.2                 BiocIO_1.10.0           
  [7] zlibbioc_1.46.0          vctrs_0.6.3              memoise_2.0.1           
 [10] RCurl_1.98-1.12          base64enc_0.1-3          progress_1.2.2          
 [13] htmltools_0.5.5          S4Arrays_1.0.4           usethis_2.2.2           
 [16] polspline_1.1.22         curl_5.0.1               Formula_1.2-5           
 [19] htmlwidgets_1.6.2        plyr_1.8.8               sandwich_3.0-2          
 [22] zoo_1.8-12               SNPassoc_2.1-1           cachem_1.0.8            
 [25] GenomicAlignments_1.36.0 mime_0.12                lifecycle_1.0.3         
 [28] pkgconfig_2.0.3          Matrix_1.5-4.1           R6_2.5.1                
 [31] fastmap_1.1.1            GenomeInfoDbData_1.2.10  shiny_1.7.4.1           
 [34] digest_0.6.32            colorspace_2.1-0         ps_1.7.5                
 [37] pkgload_1.3.2.1          RSQLite_2.3.1            Hmisc_5.1-0             
 [40] filelock_1.0.2           fansi_1.0.4              httr_1.4.6              
 [43] compiler_4.3.1           remotes_2.4.2            bit64_4.0.5             
 [46] htmlTable_2.4.1          backports_1.4.1          BiocParallel_1.34.2     
 [49] DBI_1.1.3                pkgbuild_1.4.2           biomaRt_2.56.1          
 [52] MASS_7.3-60              quantreg_5.95            rappdirs_0.3.3          
 [55] DelayedArray_0.26.6      sessioninfo_1.2.2        rjson_0.2.21            
 [58] tools_4.3.1              foreign_0.8-84           httpuv_1.6.11           
 [61] nnet_7.3-19              glue_1.6.2               restfulr_0.0.15         
 [64] callr_3.7.3              nlme_3.1-162             promises_1.2.0.1        
 [67] grid_4.3.1               checkmate_2.2.0          cluster_2.1.4           
 [70] generics_0.1.3           gtable_0.3.3             poisbinom_1.0.1         
 [73] tidyr_1.3.0              hms_1.1.3                data.table_1.14.8       
 [76] xml2_1.3.4               utf8_1.2.3               pillar_1.9.0            
 [79] stringr_1.5.0            later_1.3.1              splines_4.3.1           
 [82] dplyr_1.1.2              BiocFileCache_2.8.0      lattice_0.21-8          
 [85] bit_4.0.5                survival_3.5-5           SparseM_1.81            
 [88] tidyselect_1.2.0         rms_6.7-0                miniUI_0.1.1.1          
 [91] knitr_1.43               gridExtra_2.3            xfun_0.39               
 [94] devtools_2.4.5           stringi_1.7.12           yaml_2.3.7              
 [97] evaluate_0.21            codetools_0.2-19         tibble_3.2.1            
[100] BiocManager_1.30.21      cli_3.6.1                rpart_4.1.19            
[103] xtable_1.8-4             munsell_0.5.0            processx_3.8.1          
[106] Rcpp_1.0.10              haplo.stats_1.9.3        dbplyr_2.3.3            
[109] png_0.1-8                XML_3.99-0.14            parallel_4.3.1          
[112] MatrixModels_0.5-1       ellipsis_0.3.2           blob_1.2.4              
[115] ggplot2_3.4.2            prettyunits_1.1.1        arsenal_3.6.3           
[118] profvis_0.3.8            urlchecker_1.0.1         bitops_1.0-7            
[121] mvtnorm_1.2-2            scales_1.2.1             purrr_1.0.1             
[124] crayon_1.5.2             rlang_1.1.1              KEGGREST_1.40.0         
[127] multcomp_1.4-25         
>
rsids SNPlocs.Hsapiens.dbSNP150.GRCh38 • 1.6k views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 18 hours ago
United States

See ?snpcounts. As an example,

> library(SNPlocs.Hsapiens.dbSNP144.GRCh38)
> snps <- SNPlocs.Hsapiens.dbSNP144.GRCh38
>  my_rsids <- c("rs10458597", "rs12565286", "rs7553394")
> snpsById(snps, my_rsids, ifnotfound = "drop")
UnstitchedGPos object with 2 positions and 2 metadata columns:
      seqnames       pos strand |   RefSNP_id alleles_as_ambig
         <Rle> <integer>  <Rle> | <character>      <character>
  [1]        1    629241      * |  rs10458597                Y
  [2]        1    785910      * |  rs12565286                S
  -------
  seqinfo: 25 sequences (1 circular) from GRCh38.p2 genome
0
Entering edit mode
@herve-pages-1542
Last seen 1 day ago
Seattle, WA, United States

Sorry for not seeing this earlier.

To map from position to rsid, use snpsByOverlaps(). First create a GRanges or GPos object my_snps that contains the genomic positions of your SNPs, then do:

library(GenomicRanges)

library(SNPlocs.Hsapiens.dbSNP155.GRCh38)
snps <- SNPlocs.Hsapiens.dbSNP155.GRCh38

known_snps <- snpsByOverlaps(snps, my_snps)
hits <- findOverlaps(my_snps, known_snps)

## A sanity check (unlikely to happen):
if (anyDuplicated(queryHits(hits)))
    warning("some SNPs are mapped to more than 1 known SNP")

## Integer vector that maps the SNPs in 'my_snps' to the SNPs in 'known_snps':
mapping <- selectHits(hits, select="first")

mcols(my_snps)$RefSNP_id <- mcols(known_snps)$RefSNP_id[mapping]

For this to work properly, you need to make sure that:

  1. The SNP positions in my_snps are with respect to reference genome GRCh38.
  2. my_snps uses the same chromosome naming conventions as GRCh38.

For example, with the following SNPs:

my_snps <- GPos(Rle(c("1", "2"), c(3, 2)), pos=c(785910, 900000, 629241, 50, 900047))
my_snps
# UnstitchedGPos object with 4 positions and 0 metadata columns:
#       seqnames       pos strand
#          <Rle> <integer>  <Rle>
#   [1]        1    785910      *
#   [2]        1    900000      *
#   [3]        1    629241      *
#   [4]        2        50      *
#   [5]        2    900047      *
#   -------
#   seqinfo: 2 sequences from an unspecified genome; no seqlengths

after running the code above, my_snps will become:

my_snps
# UnstitchedGPos object with 4 positions and 1 metadata column:
#       seqnames       pos strand |    RefSNP_id
#          <Rle> <integer>  <Rle> |  <character>
#   [1]        1    785910      * |   rs12565286
#   [2]        1    900000      * |         <NA>
#   [3]        1    629241      * |   rs10458597
#   [4]        2        50      * |         <NA>
#   [5]        2    900047      * | rs1199124244
#   -------
#   seqinfo: 2 sequences from an unspecified genome; no seqlengths

Hope this helps,

H.

ADD COMMENT

Login before adding your answer.

Traffic: 557 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6