using snpid2grange() and/or rsidsToGRanges() for rsid lists with missing
2
0
Entering edit mode
k.askland ▴ 20
@kaskland-6850
Last seen 8.5 years ago
United States

Hello,

This is a follow-up question to my previous post on rsID-toHGNC mapping. 

I am having trouble retrieving the positional information for a set of rsids in which some of those rsids do not have a match in the curated dbSNP database accessed via snpid2grange() and rsidsToGRanges() functions.

I have done test runs on smaller subsets of rsids and some return results while others produce an error, e.g.,:

#test1:

>rsIDsSUBSET <- rsidList[1:100,1]
>rsids <- stri_sub(rsIDsSUBSET,3)

>gr <- snpid2grange(SNPlocs.Hsapiens.dbSNP.20120608, rsids)

# Outuput: Error in .snpid2rowidx(x, snpid) : 
  SNP id(s) not found: 35418599, 35367238, 4438386, 28833602, 9730959, 28451502, 553358

 

#test 2

>rsIDsSUBSET <- rsidList[10000:10200,1]
>rsids <- stri_sub(rsIDsSUBSET,3)

>gr <- snpid2grange(SNPlocs.Hsapiens.dbSNP.20120608, rsids)

# Outuput: GRanges with 201 ranges and 2 metadata columns

It's difficult to tell if the error is generated when ANY rsid is missing from dbSNP or only if some threshold number are missing. In either case, since I have a list of 1.9M rsids for which I need to retrieve data, this will be prohibitive.

Is there any way to get either of these functions to simply omit the rsids for which an entry is missing in dbSNP?

Thank you,
Kathleen

SNPlocs • 1.6k views
ADD COMMENT
0
Entering edit mode
@vincent-j-carey-jr-4
Last seen 6 weeks ago
United States

one unfound ID is enough to tip the cart

> snpid2grange(SNPlocs.Hsapiens.dbSNP.20120608,35418599)

Error in .snpid2rowidx(x, snpid) : SNP id(s) not found: 35418599

I think it is known that we want a softer landing for this sort of thing.  However, you can

dig a little deeper and perhaps simplify the prefiltering step that you will need to avoid

this in the current situation.  You have to find .snpid2rowidx, the function that throws

the error.  If you dig around appropriately you will find a utility that can be called as

ALLR = BSgenome:::.get_SNPlocs_data(SNPlocs.Hsapiens.dbSNP.20120608, "all_rsids")

Browse[3]> 35418599 %in% ALLR

[1] FALSE

Browse[3]> 6060535 %in% ALLR

[1] TRUE

These are slow operations but will help you to avoid including "nonexistent" ids in your list.

 

0
Entering edit mode
@vincent-j-carey-jr-4
Last seen 6 weeks ago
United States

I should also mention that when we are calling functions via ::: and finding functions with

leading dots in their names, we are doing things that are not guaranteed to persist as packages

evolve ... so this advice may not be valid in the long term.  In fact, we should all include the sessionInfo

so that the context of failing/running code can be clear:

> sessionInfo()

R version 3.1.1 (2014-07-10)

Platform: x86_64-apple-darwin13.1.0 (64-bit)

 

locale:

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

 

attached base packages:

 [1] stats4    parallel  stats     graphics  grDevices datasets  utils     tools     methods   base     

 

other attached packages:

 [1] SNPlocs.Hsapiens.dbSNP.20120608_0.99.9 BSgenome_1.33.9                       

 [3] rtracklayer_1.25.16                    Biostrings_2.33.15                    

 [5] XVector_0.5.8                          GenomicRanges_1.17.42                 

 [7] GenomeInfoDb_1.1.26                    IRanges_1.99.32                       

 [9] S4Vectors_0.2.8                        BiocGenerics_0.11.5                   

[11] rmarkdown_0.3.3                        knitr_1.6                             

[13] weaver_1.31.0                          codetools_0.2-9                       

[15] digest_0.6.4                           BiocInstaller_1.15.5                  

 

loaded via a namespace (and not attached):

 [1] base64enc_0.1-2          BatchJobs_1.4            BBmisc_1.7               BiocParallel_0.99.25    

 [5] bitops_1.0-6             brew_1.0-6               checkmate_1.4            DBI_0.3.1               

 [9] evaluate_0.5.5           fail_1.2                 foreach_1.4.2            formatR_1.0             

[13] futile.logger_1.3.7      futile.options_1.0.0     GenomicAlignments_1.1.30 htmltools_0.2.6         

[17] iterators_1.0.7          lambda.r_1.1.6           RCurl_1.95-4.3           Rsamtools_1.17.36       

[21] RSQLite_0.11.4           sendmailR_1.2-1          stringr_0.6.2            XML_3.98-1.1            

[25] zlibbioc_1.11.1         

Login before adding your answer.

Traffic: 1070 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6