dbSNP build for R package SNPlocs.Hsapiens.dbSNP.20080617
@herve-pages-1542
Seattle, WA, United States
Hi Lin,

I'm cc'ing the BioC list so other users might benefit from this.

Lin Tang wrote:
> Dear Dr. Pages,
>
> I am currently using the R package SNPlocs.Hsapiens.dbSNP.20080617, and I
> want to check with you whether this package corresponds to dbSNP build
> 129. The release date of this R package is two months after the release
> of dbSNP build 129, so it is logical that it would, but I want to have
> it confirmed by you. I'd appreciate your kind reply on this. Thanks!

It's hard to tell.

According to these pages:

http://www.ncbi.nlm.nih.gov/mailman/pipermail/dbsnp-announce/2008q2/000081.html
http://www.ncbi.nlm.nih.gov/projects/SNP/buildhistory.cgi

Build 129 was released in April 2008 (note that the exact dates found on these 2 pages don't match).

A similar search shows that Build 130 was released about 1 month ago.

So at the time I downloaded the ds_flat_ch*.flat files from here
ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/ASN1_flat
in order to build SNPlocs.Hsapiens.dbSNP.20080617 (that was in March 2009), I assume that these files were a dump from Build 129.

Note that the files under
ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/ASN1_flat
can change at any time (and today they are indeed different from what they were back in March). It's a sad thing that the SNP team at NCBI doesn't provide permanent URLs for their past builds. And it doesn't help that the ds_flat_ch*.flat files they provide don't contain any information about the build they come from.

Anyway, in the future I'll put the Build information in the DESCRIPTION file of the SNPlocs packages.

One last note. According to the SNP team at NCBI, "Human SNPs in Build 129 are mapping to NCBI build 36.3". That is, to our BSgenome.Hsapiens.UCSC.hg18 package. According to UCSC, hg18 is NCBI Build 36.1, but NCBI Build 36.1 and NCBI Build 36.3 are identical from a *sequence* point of view (I think what makes them different are the annotations provided by NCBI).
This means that, if you are planning to inject SNPlocs.Hsapiens.dbSNP.20080617 in a genome, it only makes sense to do it with BSgenome.Hsapiens.UCSC.hg18.

In the future we will put in place a mechanism to make this injection safer, i.e. one that checks that the injected SNPs and the host genome are compatible.

Cheers,
H.

> Regards,
>
> Lin Tang, Ph.D.
> Scientist, Informatics | Sequenom Inc.
> T: 1 858 202 9106 | F: 1 858 202 9084 | E: ltang at sequenom.com

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
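To make the idea of recording the build in the DESCRIPTION file a bit more concrete, here is a minimal sketch in base R. It is only an illustration: the field names `dbSNPBuild` and `HostGenome` and the helper `is_compatible()` are hypothetical, not something the SNPlocs packages are known to define. The sketch writes a small DESCRIPTION-style file to a temporary location, reads it back with `read.dcf()`, and refuses to pair the SNPs with any genome package other than the one recorded.

```r
# Write a DESCRIPTION-style file to a temporary location.
# NOTE: "dbSNPBuild" and "HostGenome" are hypothetical field names used
# only for this sketch.
desc_file <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: SNPlocs.Hsapiens.dbSNP.20080617",
  "dbSNPBuild: 129",
  "HostGenome: BSgenome.Hsapiens.UCSC.hg18"
), desc_file)

# read.dcf() returns a character matrix with one column per requested field.
meta <- read.dcf(desc_file, fields = c("Package", "dbSNPBuild", "HostGenome"))

# Hypothetical helper: TRUE only if the genome package matches the one
# recorded in the metadata.
is_compatible <- function(meta, genome_pkg) {
  identical(unname(meta[1, "HostGenome"]), genome_pkg)
}

is_compatible(meta, "BSgenome.Hsapiens.UCSC.hg18")
is_compatible(meta, "BSgenome.Hsapiens.UCSC.hg19")
```

With metadata like this, a user could look up the build directly instead of inferring it from download dates, and an injection step could stop early on a mismatched genome.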
Lin Tang ▴ 10
@lin-tang-3486
Dear Hervé,

Thanks for your quick reply and extra notes! Yes, it will definitely be helpful if you add the dbSNP build information in future releases of the R package.

Best regards,
Lin

-----Original Message-----
From: Hervé Pagès [mailto:hpages@fhcrc.org]
Sent: Wednesday, June 03, 2009 1:06 PM
To: Lin Tang
Cc: bioconductor
Subject: Re: dbSNP build for R package SNPlocs.Hsapiens.dbSNP.20080617
@james-w-macdonald-5106
United States
Hi Herve,

I've been dealing with these data myself recently, and can confirm that the data in March were build 129. They put the build 130 data up in early May.

As a side note, build 129 is known to be problematic, as there are multiple RS numbers that map to the same location:

http://www.ncbi.nlm.nih.gov/mailman/pipermail/dbsnp-announce/2008q2/000082.html

According to their help team, this problem has been resolved in build 130.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
Hi Jim,

James W. MacDonald wrote:
> As a side note, build 129 is known to be problematic, as there are
> multiple RS numbers that map to the same location:
>
> http://www.ncbi.nlm.nih.gov/mailman/pipermail/dbsnp-announce/2008q2/000082.html

Indeed:

    > library(SNPlocs.Hsapiens.dbSNP.20080617)
    > data(chr1_snplocs)
    > sum(duplicated(chr1_snplocs$loc))
    [1] 413
    > which(duplicated(chr1_snplocs$loc))[1:10]
     [1]  2822  3030  9547 10865 12604 12641 16854 17898 21175 21977
    > chr1_snplocs[chr1_snplocs$loc == chr1_snplocs$loc[2822], ]
         RefSNP_id alleles_as_ambig     loc
    2821   3766175                D 1476802
    2822  59009700                W 1476802

This puzzled me when I first started to work on the SNPlocs.* packages (I saw it in Build 128 too).

> According to their help team, this problem has been resolved in build 130.

Good. I'll make a new SNPlocs.Hsapiens.dbSNP.* package from this build.

Thanks!
H.

--
Hervé Pagès
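For readers without the package installed, the duplicate check above can be tried on a toy data frame. This is a sketch only: the first two rows are the pair shown in the session above, the other two rows are made up, and storing `RefSNP_id` as character is an assumption about the column type. Because `duplicated()` flags only the second and later occurrences of a value, the sketch also shows how to pull out every row that shares a location and group the RefSNP IDs by location.

```r
# Toy stand-in for chr1_snplocs. Rows 1-2 come from the thread; rows 3-4 are
# invented for illustration.
snps <- data.frame(
  RefSNP_id        = c("3766175", "59009700", "1000001", "1000002"),
  alleles_as_ambig = c("D", "W", "R", "Y"),
  loc              = c(1476802L, 1476802L, 1500000L, 1600000L),
  stringsAsFactors = FALSE
)

# Number of rows whose location was already seen in an earlier row.
n_dup <- sum(duplicated(snps$loc))

# duplicated() skips the first occurrence, so select every row whose
# location occurs more than once.
shared <- snps$loc %in% snps$loc[duplicated(snps$loc)]

# Group the RefSNP IDs that share each location.
ids_by_loc <- split(snps$RefSNP_id[shared], snps$loc[shared])
ids_by_loc
```

The same three steps should work unchanged on the real `chr1_snplocs` object, provided it has the `RefSNP_id` and `loc` columns shown above.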
Thanks all for the discussion. Really looking forward to the updated package!

Lin

-----Original Message-----
From: Hervé Pagès [mailto:hpages@fhcrc.org]
Sent: Thursday, June 04, 2009 10:59 AM
To: James W. MacDonald
Cc: Lin Tang; bioconductor
Subject: Re: [BioC] dbSNP build for R package SNPlocs.Hsapiens.dbSNP.20080617
Hi SNPlocs users,

I've added SNPlocs.Hsapiens.dbSNP.20090506 to the BioC repo (in BioC release only, source tarball only, but that's just for now). It contains the SNP locations and alleles for Homo sapiens extracted from dbSNP BUILD 130 (the latest dbSNP build).

From within R-2.9:

    > library(BSgenome)
    > available.SNPs()
    [1] "SNPlocs.Hsapiens.dbSNP.20071016" "SNPlocs.Hsapiens.dbSNP.20080617"
    [3] "SNPlocs.Hsapiens.dbSNP.20090506"

Install with:

    source("http://bioconductor.org/biocLite.R")
    biocLite("SNPlocs.Hsapiens.dbSNP.20090506")

Then:

    > library(SNPlocs.Hsapiens.dbSNP.20090506)
    > ?SNPlocs.Hsapiens.dbSNP.20090506  # now there is a man page!
    > getSNPcount()
      chr1   chr2   chr3   chr4   chr5   chr6   chr7   chr8   chr9  chr10  chr11
    920233 933616 789121 798603 706109 760249 655873 612367 496064 583240 577300
     chr12  chr13  chr14  chr15  chr16  chr17  chr18  chr19  chr20  chr21  chr22
    558759 427010 365742 331501 354239 316396 322866 268235 323041 160580 187392
      chrX   chrY
    391414   6539

Overall, that's 10% more SNPs than in the previous build (BUILD 129).

Note that, as with the previous builds, there are still different RefSNP IDs that are mapped to the same location:

    > chr1_snps <- getSNPlocs("chr1")
    > sum(duplicated(chr1_snps$loc))
    [1] 950

More than twice as many as with BUILD 129!

    > which(duplicated(chr1_snps$loc))[1:10]
     [1]  3142  3365  7835  8161  8327 10638 12113 14060 14640 15538
    > chr1_snps[chr1_snps$loc == chr1_snps$loc[3142], ]
         RefSNP_id alleles_as_ambig     loc
    3141   3766175                D 1476802
    3142  59009700                W 1476802

Please let me know if you find any problem with this new package.

Cheers,
H.

--
Hervé Pagès
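To put the duplicate counts in perspective, the figures quoted above can be combined with a little arithmetic. This sketch only re-uses the numbers printed in this thread (the per-chromosome counts from `getSNPcount()` and the two `sum(duplicated(...))` results for chr1); the variable names are mine, and nothing here queries the package itself.

```r
# Per-chromosome SNP counts copied from the getSNPcount() output above.
snp_count <- c(
  chr1 = 920233, chr2 = 933616, chr3 = 789121, chr4 = 798603,
  chr5 = 706109, chr6 = 760249, chr7 = 655873, chr8 = 612367,
  chr9 = 496064, chr10 = 583240, chr11 = 577300, chr12 = 558759,
  chr13 = 427010, chr14 = 365742, chr15 = 331501, chr16 = 354239,
  chr17 = 316396, chr18 = 322866, chr19 = 268235, chr20 = 323041,
  chr21 = 160580, chr22 = 187392, chrX = 391414, chrY = 6539
)

# Duplicated chr1 locations reported in this thread for each build.
dup_chr1_build129 <- 413
dup_chr1_build130 <- 950

# Total number of SNPs across all chromosomes in the new package.
total_snps <- sum(snp_count)

# Share of chr1 rows in BUILD 130 whose location repeats an earlier row,
# as a percentage.
pct_dup_chr1 <- round(100 * dup_chr1_build130 / snp_count[["chr1"]], 2)

# How many times more duplicated chr1 locations than in BUILD 129.
dup_ratio <- round(dup_chr1_build130 / dup_chr1_build129, 1)
```

So although the number of duplicated chr1 locations is about 2.3 times what it was in BUILD 129, it still amounts to roughly 0.1% of the chr1 entries.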
genefan • 0
@genefan-8382
Germany

Hi Herve, 

I was trying to install the package SNPlocs.Hsapiens.dbSNP.20080617 using Bioconductor version 3.1 (R version 3.2.0). However, it is not available for the new R version. I'd like to ask: 1.) Is there any solution other than using an old R version? 2.) Will Bioconductor always exclude the old dbSNP databases when the R version is updated? Thanks a lot in advance.

 

Best wishes,

Genefan 
