Problem locating SNP by rsID for SNPlocs.Hsapiens.dbSNP.20120608 package
@christina-chaivorapol-5712
Hi,

Has anyone ever had a case where a SNP was not found in SNPlocs.Hsapiens.dbSNP.20120608 but is found in dbSNP 137? I am having this problem with SNP rs7775397.

> library(SNPlocs.Hsapiens.dbSNP.20120608)
> rsidsToGRanges('rs7775397')
Error in .snpid2rowidx(x, snpid) : SNP id(s) not found: 7775397

Thanks,
Christina

> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] datasets  utils     grDevices graphics  stats     methods   base

other attached packages:
[1] SNPlocs.Hsapiens.dbSNP.20120608_0.99.8
[2] BSgenome_1.26.1
[3] Biostrings_2.26.2
[4] GenomicRanges_1.10.5
[5] IRanges_1.16.4
[6] BiocGenerics_0.4.0

loaded via a namespace (and not attached):
[1] parallel_2.15.2 stats4_2.15.2

-- 
Christina Chaivorapol, Ph.D.
Genentech, Inc.
Bioinformatics & Computational Biology
@herve-pages-1542
Seattle, WA, United States
Hi Christina,

According to the official announcement:

http://www.ncbi.nlm.nih.gov/mailman/pipermail/dbsnp-announce/2012q2/000122.html

there are 53,558,214 rs ids in dbSNP 137 for Human. But in SNPlocs.Hsapiens.dbSNP.20120608:

> library(SNPlocs.Hsapiens.dbSNP.20120608)
> sum(getSNPcount())
[1] 45416711

As explained in ?SNPlocs.Hsapiens.dbSNP.20120608, the package (like all other SNPlocs packages) was curated: SNPs from dbSNP were filtered to keep only those satisfying the following 3 criteria:

• The SNP is a single-base substitution, i.e. its type is "snp". Other types used by dbSNP are "in-del", "mixed", "microsatellite", "named-locus", "multinucleotide-polymorphism", etc. All those SNPs were dropped.

• The SNP is marked as notwithdrawn.

• A *single* location on the reference genome (GRCh37.p5) is reported for the SNP, and this location is on chromosomes 1-22, X, Y, or MT.

rs7775397 was dropped because of this last criterion. More precisely, the record in ds_flat_ch6.flat for this SNP contains the following CTG lines:

CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=32261252 | NT_007592.15 | ctg-start=32201252 | ctg-end=32201252 | loctype=2 | orient=+
CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_113891.2 | ctg-start=3732030 | ctg-end=3732030 | loctype=2 | orient=+
CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_167245.1 | ctg-start=3540499 | ctg-end=3540499 | loctype=2 | orient=+
CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_167246.1 | ctg-start=3604088 | ctg-end=3604088 | loctype=2 | orient=+
CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_167248.1 | ctg-start=3522471 | ctg-end=3522471 | loctype=2 | orient=+
CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_167249.1 | ctg-start=3609047 | ctg-end=3609047 | loctype=2 | orient=+

That is, there is more than 1 CTG line corresponding to the reference assembly (GRCh37.p5), which is why the SNP was dropped.
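To make the third criterion concrete, here is a minimal base-R sketch that parses the rs7775397 CTG lines quoted above and applies the original rule. It assumes the field layout visible in that excerpt (fields separated by " | ", mostly key=value, contig accession in the 5th field); `parse_ctg_lines()` and `keep_original_rule()` are hypothetical helpers for illustration, not functions from the SNPlocs package or from the script that built it.

```r
# Hypothetical helper (not part of SNPlocs): parse dbSNP "CTG" lines into a
# data frame, assuming the " | "-separated layout shown in ds_flat_ch6.flat.
parse_ctg_lines <- function(lines) {
  fields <- strsplit(lines, " | ", fixed = TRUE)
  # Value of `key` in one split line, or NA if the key is absent
  get_field <- function(f, key) {
    hit <- grep(paste0("^", key, "="), f, value = TRUE)
    if (length(hit) == 0L) NA_character_ else sub("^[^=]*=", "", hit[1L])
  }
  data.frame(
    assembly = vapply(fields, get_field, character(1), key = "assembly"),
    chr      = vapply(fields, get_field, character(1), key = "chr"),
    chr_pos  = vapply(fields, get_field, character(1), key = "chr-pos"),
    contig   = vapply(fields, function(f) f[5L], character(1)),
    stringsAsFactors = FALSE
  )
}

# Hypothetical rendering of the original criterion: exactly one CTG line on
# the reference assembly, and it must be on chromosomes 1-22, X, Y or MT.
keep_original_rule <- function(ctg, ref_assembly = "GRCh37.p5",
                               ok_chroms = c(1:22, "X", "Y", "MT")) {
  ref <- ctg[ctg$assembly %in% ref_assembly, , drop = FALSE]
  nrow(ref) == 1L && ref$chr %in% ok_chroms
}

# CTG lines for rs7775397, copied from the ds_flat_ch6.flat excerpt
rs7775397_ctg <- c(
  "CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=32261252 | NT_007592.15 | ctg-start=32201252 | ctg-end=32201252 | loctype=2 | orient=+",
  "CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_113891.2 | ctg-start=3732030 | ctg-end=3732030 | loctype=2 | orient=+",
  "CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_167245.1 | ctg-start=3540499 | ctg-end=3540499 | loctype=2 | orient=+",
  "CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_167246.1 | ctg-start=3604088 | ctg-end=3604088 | loctype=2 | orient=+",
  "CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_167248.1 | ctg-start=3522471 | ctg-end=3522471 | loctype=2 | orient=+",
  "CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_167249.1 | ctg-start=3609047 | ctg-end=3609047 | loctype=2 | orient=+"
)

ctg <- parse_ctg_lines(rs7775397_ctg)
sum(ctg$assembly == "GRCh37.p5")  # 6 reference-assembly CTG lines
keep_original_rule(ctg)           # FALSE, so the SNP is dropped
```

Only one of the six lines carries a numeric chr-pos; the other five come from alternate contigs reported with chr-pos=?, yet all six count against the "single location" rule as written.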
I realize now that maybe I could keep those SNPs that have more than 1 CTG line corresponding to the reference assembly, as long as exactly 1 of them actually provides a value for the chr-pos field. Would that be reasonable?

Thanks,
H.

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
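The relaxed rule Hervé proposes (keep the SNP when, among its reference-assembly CTG lines, exactly one gives a value for chr-pos) could look roughly like this in base R. This is a sketch under the assumption that CTG lines follow the layout in the ds_flat_ch6.flat excerpt; `ctg_field()` and `relaxed_location()` are hypothetical names, and the actual SNPlocs build code may implement the rule differently.

```r
# Hypothetical helper: value of `key` in each CTG line, NA when absent.
ctg_field <- function(lines, key) {
  pattern <- paste0(".*\\| ", key, "=([^ ]*).*")
  ifelse(grepl(pattern, lines), sub(pattern, "\\1", lines), NA_character_)
}

# Relaxed rule (sketch): several reference-assembly CTG lines are allowed as
# long as exactly one has a chr-pos value; "?" is not treated as a location.
# Returns list(chr, pos) for a kept SNP, or NULL for a dropped one.
relaxed_location <- function(lines, ref_assembly = "GRCh37.p5",
                             ok_chroms = c(1:22, "X", "Y", "MT")) {
  on_ref <- ctg_field(lines, "assembly") %in% ref_assembly
  chr <- ctg_field(lines, "chr")[on_ref]
  pos <- ctg_field(lines, "chr-pos")[on_ref]
  placed <- !is.na(pos) & pos != "?"
  if (sum(placed) != 1L || !(chr[placed] %in% ok_chroms)) return(NULL)
  list(chr = chr[placed], pos = as.integer(pos[placed]))
}

# CTG lines for rs7775397, copied from the ds_flat_ch6.flat excerpt
rs7775397_ctg <- c(
  "CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=32261252 | NT_007592.15 | ctg-start=32201252 | ctg-end=32201252 | loctype=2 | orient=+",
  "CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_113891.2 | ctg-start=3732030 | ctg-end=3732030 | loctype=2 | orient=+",
  "CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_167245.1 | ctg-start=3540499 | ctg-end=3540499 | loctype=2 | orient=+",
  "CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_167246.1 | ctg-start=3604088 | ctg-end=3604088 | loctype=2 | orient=+",
  "CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_167248.1 | ctg-start=3522471 | ctg-end=3522471 | loctype=2 | orient=+",
  "CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_167249.1 | ctg-start=3609047 | ctg-end=3609047 | loctype=2 | orient=+"
)

relaxed_location(rs7775397_ctg)  # chr "6", pos 32261252
```

Under this rule rs7775397 resolves to chromosome 6, position 32261252, the one CTG line with a numeric chr-pos. A SNP whose reference-assembly lines gave two or more numeric chr-pos values would still be dropped.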
Thanks for your help, Tim and Hervé.

For my purposes, it would be very useful to include the SNPs that have a value for the chr-pos field even if they have more than 1 CTG line, since I deal with a lot of immune-related genes that tend to be difficult to map. Would it be possible to include these SNPs but flag them as having more than 1 CTG line?

Thanks for your help,
Christina
Hi Christina,

I've included those SNPs in version 0.99.9 of SNPlocs.Hsapiens.dbSNP.20120608. They're not flagged, though. Note that there is still a *single* location on the reference genome reported for those SNPs, because the other "locations" are reported as ? (question mark), and it seems fair not to consider ? as a location.

With this new version of the package:

> library(SNPlocs.Hsapiens.dbSNP.20120608)
> sum(getSNPcount())
[1] 45697775

that is, 281064 more SNPs (i.e. 0.6%) than the previous version (0.99.8). rs7775397 is one of them now:

> rsidsToGRanges("rs7775397")
GRanges with 1 range and 2 metadata columns:
      seqnames               ranges strand |   RefSNP_id alleles_as_ambig
         <Rle>            <IRanges>  <Rle> | <character>      <character>
  [1]      ch6 [32261252, 32261252]      + |     7775397                K
  ---
  seqlengths:
         ch1       ch2       ch3       ch4 ...       chX      chY     chMT
   249250621 243199373 198022430 191154276 ... 155270560 59373566    16569

SNPlocs.Hsapiens.dbSNP.20120608 version 0.99.9 will be available in Bioc devel (which requires the devel version of R, i.e. R 3.0) through biocLite() in about 45 minutes. Only the source package is available for now, which you should be able to install on Windows or Mac with biocLite( , type="source").

Let me know if you have questions about this.

Cheers,
H.
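The counts quoted in this thread can be checked with a few lines of R. The first two results reproduce Hervé's "281064 more SNPs (i.e. 0.6%)"; the fractions of dbSNP 137 are simple arithmetic on the same counts, not figures stated in the thread, and the 53,558,214 total includes rs ids of every type (in-dels, withdrawn SNPs, etc.), so it is only a rough denominator.

```r
# SNP counts quoted in this thread
n_dbsnp137 <- 53558214  # rs ids in dbSNP 137 for Human (NCBI announcement)
n_v0998    <- 45416711  # sum(getSNPcount()) in version 0.99.8
n_v0999    <- 45697775  # sum(getSNPcount()) in version 0.99.9

# SNPs gained by relaxing the "single location" criterion
added <- n_v0999 - n_v0998
added                              # 281064
round(100 * added / n_v0998, 1)    # 0.6 (% increase over 0.99.8)

# Share of all dbSNP 137 rs ids kept by each package version (%)
round(100 * c(n_v0998, n_v0999) / n_dbsnp137, 1)  # 84.8 85.3
```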
