Question: dbSNP build for R package SNPlocs.Hsapiens.dbSNP.20080617
written 9.3 years ago by Hervé Pagès (United States) • modified 3.2 years ago by genefan
9.3 years ago by Lin Tang
9.3 years ago by James W. MacDonald (United States)
Hi Jim,

James W. MacDonald wrote:
> Hi Herve,
>
> I've been dealing with these data myself recently, and can confirm that
> the data in March were build 129. They put the build 130 data up in
> early May.
>
> As a side note, build 129 is known to be problematic, as there are
> multiple RS numbers that map to the same location:
>
> http://www.ncbi.nlm.nih.gov/mailman/pipermail/dbsnp-announce/2008q2/000082.html
>

Indeed:

  > library(SNPlocs.Hsapiens.dbSNP.20080617)
  > data(chr1_snplocs)
  > sum(duplicated(chr1_snplocs$loc))
  [1] 413
  > which(duplicated(chr1_snplocs$loc))[1:10]
   [1]  2822  3030  9547 10865 12604 12641 16854 17898 21175 21977
  > chr1_snplocs[chr1_snplocs$loc == chr1_snplocs$loc[2822], ]
       RefSNP_id alleles_as_ambig     loc
  2821   3766175                D 1476802
  2822  59009700                W 1476802

Something that puzzled me when I first started to work on the SNPlocs.*
packages (I saw this in Build 128 too).

>
> According to their help team, this problem has been resolved in build 130.

Good. I'll make a new SNPlocs.Hsapiens.dbSNP.* from this build.

Thanks!
H.

>
> Best,
>
> Jim
>
>
> Hervé Pagès wrote:
>> Hi Lin,
>>
>> I'm cc'ing the BioC list so other users might benefit from this.
>>
>> Lin Tang wrote:
>>> Dear Dr. Pages,
>>>
>>> I am using R package SNPlocs.Hsapiens.dbSNP.20080617 currently, I want
>>> to check with you that whether this package corresponds to dbSNP build
>>> 129 ? Although from the release date of this R package which is two
>>> months after the release of dbSNP build 129, it is logical to be so. I
>>> want to have it confirmed from you. I'd appreciate your kind reply on
>>> this. Thanks!
>>
>> It's hard to tell.
>>
>> According to these pages:
>>
>> http://www.ncbi.nlm.nih.gov/mailman/pipermail/dbsnp-announce/2008q2/000081.html
>>
>> http://www.ncbi.nlm.nih.gov/projects/SNP/buildhistory.cgi
>>
>> Build 129 was released in April 2008 (note that the exact dates found
>> on these 2 pages don't match).
>>
>> A similar research shows that Build 130 was released about 1 month ago.
>>
>> So at the time I downloaded the ds_flat_ch*.flat files from here
>> ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/ASN1_flat
>> in order to build SNPlocs.Hsapiens.dbSNP.20080617 (that was in March
>> 2009), I assume that these files were a dump from Build 129.
>>
>> Note that the files under
>> ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/ASN1_flat
>> can change at anytime (and today they are indeed different from what they
>> were back in March). It's a sad thing that the SNP team at NCBI doesn't
>> provide permanent URLs for their past builds. And it doesn't help that
>> the ds_flat_ch*.flat files they provide don't contain any information
>> about the build that they're coming from.
>>
>> Anyway, in the future I'll put the Build information in the DESCRIPTION
>> file of the SNPlocs packages.
>>
>> One last note. According to the SNP team at NCBI "Human SNPs in Build 129
>> are mapping to NCBI build 36.3". That is, to our BSgenome.Hsapiens.UCSC.hg18
>> package. According to UCSC, hg18 is NCBI Build 36.1 but NCBI Build 36.1 and
>> NCBI Build 36.3 are identical from a *sequence* point of view (I think what
>> makes them different are the annotations provided by NCBI).
>> This means that, if you are planning to inject SNPlocs.Hsapiens.dbSNP.20080617
>> in a genome, it only makes sense to do it with BSgenome.Hsapiens.UCSC.hg18.
>>
>> In the future we will put in place a mechanism to make this injection safer
>> i.e. check that the injected stuff and the host are compatible.
>>
>> Cheers,
>> H.
>>
>>>
>>> Regards,
>>>
>>> Lin Tang, Ph.D.
>>>
>>> Scientist , Informatics | Sequenom Inc.
>>>
>>> T: 1 858 202 9106 | F: 1 858 202 9084 | E: ltang at sequenom.com
>>>
>>> THIS EMAIL MESSAGE IS FOR THE SOLE USE OF THE INTENDED RECIPIENT(S)
>>> AND MAY CONTAIN CONFIDENTIAL INFORMATION. ANY UNAUTHORIZED REVIEW,
>>> USE, DISCLOSURE OR DISTRIBUTION IS PROHIBITED. IF YOU ARE NOT THE
>>> INTENDED RECIPIENT, PLEASE CONTACT THE SENDER BY REPLY EMAIL AND
>>> DESTROY ALL COPIES OF THE ORIGINAL MESSAGE.
>>>
>>
>

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
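The compatibility check mentioned at the end of the quoted message could look roughly like the sketch below. It is not the mechanism that BSgenome actually uses; the function name `is_compatible_host`, the lookup table `sequence_group`, and the group labels are all hypothetical. The only grouping taken from the thread is that NCBI Build 36.1 (hg18) and NCBI Build 36.3 are identical at the sequence level; the hg19 entry is added purely as a contrasting example.

```r
# Hypothetical lookup: each assembly label maps to a "sequence group".
# Labels in the same group are treated as having identical sequences.
# The thread only asserts that 36.1 (hg18) and 36.3 belong together.
sequence_group <- c(
  "NCBI Build 36.1" = "human_36",
  "hg18"            = "human_36",
  "NCBI Build 36.3" = "human_36",
  "hg19"            = "human_37"   # contrasting example, not from the thread
)

# TRUE if SNP coordinates given on `snp_assembly` can be applied to
# a genome built from `host_assembly`; stops on unknown labels.
is_compatible_host <- function(snp_assembly, host_assembly) {
  labels <- c(snp_assembly, host_assembly)
  groups <- sequence_group[labels]
  if (anyNA(groups)) {
    stop("unknown assembly label: ",
         paste(labels[is.na(groups)], collapse = ", "))
  }
  unname(groups[1] == groups[2])
}

ok  <- is_compatible_host("NCBI Build 36.3", "hg18")  # same sequence group
bad <- is_compatible_host("NCBI Build 36.3", "hg19")  # different group
print(c(ok = ok, bad = bad))
```

Failing loudly on an unknown label, rather than returning FALSE, is a deliberate choice in this sketch: a missing table entry is a different problem from a known mismatch.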
Thanks all for the discussion. Really looking forward to the updated package!

Lin

-----Original Message-----
From: Hervé Pagès [mailto:hpages@fhcrc.org]
Sent: Thursday, June 04, 2009 10:59 AM
To: James W. MacDonald
Cc: Lin Tang; bioconductor
Subject: Re: [BioC] dbSNP build for R package SNPlocs.Hsapiens.dbSNP.20080617
Hi SNPlocs users,

I've added SNPlocs.Hsapiens.dbSNP.20090506 to the BioC repo (in BioC release
only, source tarball only, but that's just for now). It contains the SNP
locations and alleles for Homo sapiens extracted from dbSNP BUILD 130 (the
latest dbSNP build).

From within R-2.9:

  > library(BSgenome)
  > available.SNPs()
  [1] "SNPlocs.Hsapiens.dbSNP.20071016" "SNPlocs.Hsapiens.dbSNP.20080617"
  [3] "SNPlocs.Hsapiens.dbSNP.20090506"

Install with:

  source("http://bioconductor.org/biocLite.R")
  biocLite("SNPlocs.Hsapiens.dbSNP.20090506")

Then:

  > library(SNPlocs.Hsapiens.dbSNP.20090506)
  > ?SNPlocs.Hsapiens.dbSNP.20090506  # now there is a man page!
  > getSNPcount()
    chr1   chr2   chr3   chr4   chr5   chr6   chr7   chr8   chr9  chr10  chr11
  920233 933616 789121 798603 706109 760249 655873 612367 496064 583240 577300
   chr12  chr13  chr14  chr15  chr16  chr17  chr18  chr19  chr20  chr21  chr22
  558759 427010 365742 331501 354239 316396 322866 268235 323041 160580 187392
    chrX   chrY
  391414   6539

Overall, that's 10% more SNPs than in the previous build (BUILD 129).

Note that, like with the previous builds, there are still different RefSNP IDs
that are mapped to the same location:

  > chr1_snps <- getSNPlocs("chr1")
  > sum(duplicated(chr1_snps$loc))
  [1] 950

More than twice as many as with BUILD 129!

  > which(duplicated(chr1_snps$loc))[1:10]
   [1]  3142  3365  7835  8161  8327 10638 12113 14060 14640 15538
  > chr1_snps[chr1_snps$loc == chr1_snps$loc[3142], ]
       RefSNP_id alleles_as_ambig     loc
  3141   3766175                D 1476802
  3142  59009700                W 1476802

Please let me know if you find any problem with this new package.

Cheers,
H.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
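The duplicate check in the session above can be tried without either SNPlocs package installed. What follows is a minimal base-R sketch on a toy data frame that mimics the three columns shown in the output (`RefSNP_id`, `alleles_as_ambig`, `loc`). The first two rows are the pair reported in the thread; the other three rows, the name `toy_snps`, and the choice of character IDs are assumptions made for illustration.

```r
# Toy stand-in for the data frame returned by getSNPlocs("chr1").
# Rows 1-2 are the pair reported in the thread; rows 3-5 are invented.
toy_snps <- data.frame(
  RefSNP_id        = c("3766175", "59009700", "111", "222", "333"),
  alleles_as_ambig = c("D", "W", "R", "Y", "R"),
  loc              = c(1476802L, 1476802L, 2000000L, 2000500L, 2000500L),
  stringsAsFactors = FALSE
)

# Number of rows whose location already appeared earlier in the table
# (the same count as sum(duplicated(...)) in the thread).
n_dup <- sum(duplicated(toy_snps$loc))

# duplicated() alone skips the first member of each group, so scan from
# both ends to flag every row that shares its location with another row.
is_shared <- duplicated(toy_snps$loc) |
  duplicated(toy_snps$loc, fromLast = TRUE)
shared <- toy_snps[is_shared, ]

# RefSNP IDs grouped by the location they share.
ids_by_loc <- split(shared$RefSNP_id, shared$loc)

print(n_dup)
print(ids_by_loc)
```

Grouping the IDs by location makes it easier to see which RefSNP entries collide than reading off row indices from `which(duplicated(...))`.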
3.2 years ago, genefan (Germany) wrote:

Hi Herve,

I was trying to install the package SNPlocs.Hsapiens.dbSNP.20080617 using Bioconductor version 3.1 (R version 3.2.0). However, it is not available for the new R version. I'd like to ask: 1) Is there any solution other than using an old R version? 2) Will Bioconductor always drop the older dbSNP packages when the R version is updated? Thanks a lot in advance.

Best wishes,

Genefan