Question: How often are SNPlocs.Hsapiens.dbSNP-* packages released? and why that time scale?
Asked 2.2 years ago by Ramiro Magno (CBMR, Faro, Portugal):

I am interested in accessing SNP annotation from dbSNP using a package like SNPlocs.Hsapiens.dbSNP-*. It seems, however, that these packages are not updated for every dbSNP build.

1. Why are these pkgs not built more often?
2. If I wanted to build a SNPlocs.Hsapiens.dbSNP-* package for the most recent build of the NCBI SNP database, what would I need to do? Would it suffice to run those tools indicated in the package's SNPlocs.Hsapiens.dbSNP***.GRCh38/inst/tools/README.TXT?
3. If I do build a new SNPlocs.Hsapiens.dbSNP-* pkg, will it still work nicely with BSgenome? Namely, could I still inject SNPs and have them land at the correct locations?

Thank you very much in advance.

Answer: How often are SNPlocs.Hsapiens.dbSNP-* packages released? and why that time scale?
Answered 2.2 years ago by Hervé Pagès (United States):

Hi Ramiro,

FWIW I recently made a SNPlocs package for dbSNP Build 149 (the latest dbSNP build) but it's only available in BioC devel (i.e. BioC 3.5, requires R 3.4):

https://bioconductor.org/packages/SNPlocs.Hsapiens.dbSNP149.GRCh38

I actually highly recommend that you upgrade your installation to use BioC devel if you're planning to do any serious work with the SNPlocs package because these packages have been refactored to allow much faster data access.

There is currently no established schedule for updating these packages. They are big, making them is time-consuming, and they are not heavily used, so it's hard to keep up with every new dbSNP build, and it would also be hard to justify spending too many resources on doing that (we have to choose our priorities). However, if the community starts to express more interest in these packages, I will update them more often, e.g. make a new one for every other dbSNP build.

You could follow the instructions in SNPlocs.Hsapiens.dbSNP***.GRCh38/inst/tools/README.TXT to make your own SNPlocs package. Some users have done this before, e.g. https://bioconductor.org/packages/SNPlocs.Hsapiens.dbSNP142.GRCh37. But as I said, the procedure can be tedious and time-consuming. If you decide to do so, I would highly recommend following the README.TXT file of the SNPlocs.Hsapiens.dbSNP149.GRCh38 package: it describes the most up-to-date version of the procedure, which changed significantly in BioC 3.5. In particular it's faster now: it takes about 3h instead of 14h. And yes, the resulting package should work for injection in a BSgenome object, and the SNPs should land at the correct positions.

H.

May I ask another thing? For each dbSNP build, NCBI provides a file, RsMergeArch.bcp.gz, that contains a translation table for SNPs that have been merged into other rsIDs. Is this type of resolution taken into account in SNPlocs.Hsapiens.dbSNP149.GRCh38, or do I need to translate my old SNP IDs before passing them to SNPlocs' functions?

Hi,

I didn't know about RsMergeArch.bcp.gz. Sounds like a valuable resource. Right now this information is not included in the SNPlocs packages and the snpsById() extractor doesn't perform any translation of the supplied ids.

H.
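
Since snpsById() does no translation, retired rsIDs would have to be mapped to their current ids before the lookup. Below is a minimal sketch of that pre-processing step in Python, not something taken from the SNPlocs packages. It assumes that RsMergeArch.bcp.gz is a headerless, tab-separated table whose first column is the retired id (rsHigh) and whose seventh column is the id it currently resolves to (rsCurrent); please check those positions against the dbSNP schema for your build. The function names are made up for illustration.

```python
import gzip
from typing import Dict, Iterable, List

# ASSUMPTION: RsMergeArch.bcp.gz is tab-separated with no header row, and the
# columns of interest are rsHigh (index 0, the retired id) and rsCurrent
# (index 6, the id it resolves to today). Verify against the dbSNP schema.
RS_HIGH_COL = 0
RS_CURRENT_COL = 6


def parse_merge_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Build a {retired rsID: current rsID} lookup from merge-table lines."""
    lookup: Dict[str, str] = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) <= RS_CURRENT_COL:
            continue  # skip short or malformed rows
        old = fields[RS_HIGH_COL].strip()
        current = fields[RS_CURRENT_COL].strip()
        if old.isdigit() and current.isdigit():
            lookup["rs" + old] = "rs" + current
    return lookup


def load_merge_file(path: str) -> Dict[str, str]:
    """Read a gzipped merge table from disk (hypothetical helper)."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return parse_merge_lines(handle)


def translate_rsids(rsids: Iterable[str], lookup: Dict[str, str]) -> List[str]:
    """Replace merged ids with their current id; leave all other ids as-is."""
    return [lookup.get(rsid, rsid) for rsid in rsids]
```

The translated ids could then be written to a file and read into R before calling snpsById(). Ids that are absent from the merge table pass through unchanged, so the output keeps the same order and length as the input.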

Hi Hervé,

Thanks anyway. I am going to install R devel and your recent SNPlocs package.

RM