Question: using snpid2grange() and/or rsidsToGRanges() for rsid lists with missing
0
gravatar for k.askland
5.2 years ago by
k.askland20
United States
k.askland20 wrote:

Hello,

This is a follow-up question to my previous post on rsID-toHGNC mapping. 

I am having trouble retrieving the positional information for a set of rsids in which some of those rsids do not have a match in the curated dbSNP database accessed via snpid2grange() and rsidsToGRanges() functions.

I have done test runs on smaller subsets of rsids and some return results while others produce an error, e.g.,:

#test1:

>rsIDsSUBSET <- rsidList[1:100,1]
>rsids <- stri_sub(rsIDsSUBSET,3)

>gr <- snpid2grange(SNPlocs.Hsapiens.dbSNP.20120608, rsids)

# Outuput: Error in .snpid2rowidx(x, snpid) : 
  SNP id(s) not found: 35418599, 35367238, 4438386, 28833602, 9730959, 28451502, 553358

 

#test 2

>rsIDsSUBSET <- rsidList[10000:10200,1]
>rsids <- stri_sub(rsIDsSUBSET,3)

>gr <- snpid2grange(SNPlocs.Hsapiens.dbSNP.20120608, rsids)

# Outuput: GRanges with 201 ranges and 2 metadata columns

It's difficult to tell if the error is generated when ANY rsid is missing from dbSNP or only if some threshold number are missing. In either case, since I have a list of 1.9M rsids for which I need to retrieve data, this will be prohibitive.

Is there any way to get either of these functions to simply omit the rsids for which an entry is missing in dbSNP?

Thank you,
Kathleen

snplocs • 865 views
ADD COMMENTlink modified 5.2 years ago by Vincent J. Carey, Jr.6.3k • written 5.2 years ago by k.askland20
Answer: using snpid2grange() and/or rsidsToGRanges() for rsid lists with missing
0
gravatar for Vincent J. Carey, Jr.
5.2 years ago by
United States
Vincent J. Carey, Jr.6.3k wrote:

one unfound ID is enough to tip the cart

> snpid2grange(SNPlocs.Hsapiens.dbSNP.20120608,35418599)

Error in .snpid2rowidx(x, snpid) : SNP id(s) not found: 35418599

I think it is known that we want a softer landing for this sort of thing.  However, you can

dig a little deeper and perhaps simplify the prefiltering step that you will need to avoid

this in the current situation.  You have to find .snpid2rowidx, the function that throws

the error.  If you dig around appropriately you will find a utility that can be called as

ALLR = BSgenome:::.get_SNPlocs_data(SNPlocs.Hsapiens.dbSNP.20120608, "all_rsids")

Browse[3]> 35418599 %in% ALLR

[1] FALSE

Browse[3]> 6060535 %in% ALLR

[1] TRUE

These are slow operations but will help you to avoid including "nonexistent" ids in your list.

 

ADD COMMENTlink written 5.2 years ago by Vincent J. Carey, Jr.6.3k
Answer: using snpid2grange() and/or rsidsToGRanges() for rsid lists with missing
0
gravatar for Vincent J. Carey, Jr.
5.2 years ago by
United States
Vincent J. Carey, Jr.6.3k wrote:

I should also mention that when we are calling functions via ::: and finding functions with

leading dots in their names, we are doing things that are not guaranteed to persist as packages

evolve ... so this advice may not be valid in the long term.  In fact, we should all include the sessionInfo

so that the context of failing/running code can be clear:

> sessionInfo()

R version 3.1.1 (2014-07-10)

Platform: x86_64-apple-darwin13.1.0 (64-bit)

 

locale:

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

 

attached base packages:

 [1] stats4    parallel  stats     graphics  grDevices datasets  utils     tools     methods   base     

 

other attached packages:

 [1] SNPlocs.Hsapiens.dbSNP.20120608_0.99.9 BSgenome_1.33.9                       

 [3] rtracklayer_1.25.16                    Biostrings_2.33.15                    

 [5] XVector_0.5.8                          GenomicRanges_1.17.42                 

 [7] GenomeInfoDb_1.1.26                    IRanges_1.99.32                       

 [9] S4Vectors_0.2.8                        BiocGenerics_0.11.5                   

[11] rmarkdown_0.3.3                        knitr_1.6                             

[13] weaver_1.31.0                          codetools_0.2-9                       

[15] digest_0.6.4                           BiocInstaller_1.15.5                  

 

loaded via a namespace (and not attached):

 [1] base64enc_0.1-2          BatchJobs_1.4            BBmisc_1.7               BiocParallel_0.99.25    

 [5] bitops_1.0-6             brew_1.0-6               checkmate_1.4            DBI_0.3.1               

 [9] evaluate_0.5.5           fail_1.2                 foreach_1.4.2            formatR_1.0             

[13] futile.logger_1.3.7      futile.options_1.0.0     GenomicAlignments_1.1.30 htmltools_0.2.6         

[17] iterators_1.0.7          lambda.r_1.1.6           RCurl_1.95-4.3           Rsamtools_1.17.36       

[21] RSQLite_0.11.4           sendmailR_1.2-1          stringr_0.6.2            XML_3.98-1.1            

[25] zlibbioc_1.11.1         

ADD COMMENTlink written 5.2 years ago by Vincent J. Carey, Jr.6.3k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 415 users visited in the last hour