Question: Problem locating SNP by rsID for SNPlocs.Hsapiens.dbSNP.20120608 package Bioconductor x
6.9 years ago by
Christina Chaivorapol40 wrote:
Hi,

Has anyone ever had a case where a SNP was not found in SNPlocs.Hsapiens.dbSNP.20120608, but is found in dbSNP 137? I am having this problem with SNP rs7775397.

> library(SNPlocs.Hsapiens.dbSNP.20120608)
> rsidsToGRanges('rs7775397')
Error in .snpid2rowidx(x, snpid) : SNP id(s) not found: 7775397

Thanks,
Christina

> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] datasets utils grDevices graphics stats methods base

other attached packages:
[1] SNPlocs.Hsapiens.dbSNP.20120608_0.99.8
[2] BSgenome_1.26.1
[3] Biostrings_2.26.2
[4] GenomicRanges_1.10.5
[5] IRanges_1.16.4
[6] BiocGenerics_0.4.0

loaded via a namespace (and not attached):
[1] parallel_2.15.2 stats4_2.15.2

--
Christina Chaivorapol, Ph.D.
Genentech, Inc.
Bioinformatics & Computational Biology
Answer: Problem locating SNP by rsID for SNPlocs.Hsapiens.dbSNP.20120608 package Biocond
6.9 years ago by
Hervé Pagès ♦♦ 14k
United States
Hervé Pagès ♦♦ 14k wrote:
Hi Christina,

According to the official announcement:

http://www.ncbi.nlm.nih.gov/mailman/pipermail/dbsnp-announce/2012q2/000122.html

there are 53,558,214 rs ids in dbSNP 137 for Human.

But in SNPlocs.Hsapiens.dbSNP.20120608:

> library(SNPlocs.Hsapiens.dbSNP.20120608)
> sum(getSNPcount())
[1] 45416711

As explained in ?SNPlocs.Hsapiens.dbSNP.20120608, the package (like all other SNPlocs packages) was curated:

SNPs from dbSNP were filtered to keep only those satisfying the 3 following criteria:

• The SNP is a single-base substitution i.e. its type is "snp". Other types used by dbSNP are: "in-del", "mixed", "microsatellite", "named-locus", "multinucleotide-polymorphism", etc... All those SNPs were dropped.

• The SNP is marked as notwithdrawn.

• A *single* location on the reference genome (GRCh37.p5) is reported for the SNP, and this location is on chromosomes 1-22, X, Y, MT.

In the case of rs7775397, it was dropped because of this last reason. More precisely, the record in ds_flat_ch6.flat for this SNP contains the following CTG lines:

CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=32261252 | NT_007592.15 | ctg-start=32201252 | ctg-end=32201252 | loctype=2 | orient=+
CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_113891.2 | ctg-start=3732030 | ctg-end=3732030 | loctype=2 | orient=+
CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_167245.1 | ctg-start=3540499 | ctg-end=3540499 | loctype=2 | orient=+
CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_167246.1 | ctg-start=3604088 | ctg-end=3604088 | loctype=2 | orient=+
CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_167248.1 | ctg-start=3522471 | ctg-end=3522471 | loctype=2 | orient=+
CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_167249.1 | ctg-start=3609047 | ctg-end=3609047 | loctype=2 | orient=+

That is, more than 1 CTG line corresponding to the reference assembly (GRCh37.p5). This is the reason why the SNP was dropped.

I realize now that maybe I could keep those SNPs that have more than 1 CTG line corresponding to the reference assembly as long as exactly 1 of them actually provides a value for the chr-pos field. Would that be reasonable?

Thanks,
H.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
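The relaxed rule proposed above (keep a SNP when exactly one reference-assembly CTG line carries a real chr-pos value) can be sketched in base R. This is only an illustration and not the code used to build the SNPlocs package: the helper names `parse_ctg_line` and `single_chr_pos` are hypothetical, the parsing assumes the `key=value` layout of the CTG lines quoted in this answer, and the chromosome check (1-22, X, Y, MT) is left out.

```r
# Hypothetical helper: split one "CTG | key=value | ..." line into a
# named character vector (fields without "=" are ignored).
parse_ctg_line <- function(line) {
  fields <- trimws(strsplit(line, "|", fixed = TRUE)[[1]])
  kv <- fields[grepl("=", fields, fixed = TRUE)]
  keys <- sub("=.*$", "", kv)
  values <- sub("^[^=]*=", "", kv)
  setNames(values, keys)
}

# Hypothetical helper: return the chr-pos when exactly one CTG line on
# the given assembly has a value other than "?", otherwise NA.
single_chr_pos <- function(ctg_lines, assembly = "GRCh37.p5") {
  parsed <- lapply(ctg_lines, parse_ctg_line)
  on_ref <- Filter(function(p) identical(unname(p["assembly"]), assembly),
                   parsed)
  pos <- vapply(on_ref, function(p) unname(p["chr-pos"]), character(1))
  mapped <- pos[!is.na(pos) & pos != "?"]
  if (length(mapped) == 1L) as.integer(mapped) else NA_integer_
}

# Three of the CTG lines reported for rs7775397 (copied from above).
rs7775397_ctg <- c(
  "CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=32261252 | NT_007592.15 | ctg-start=32201252 | ctg-end=32201252 | loctype=2 | orient=+",
  "CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_113891.2 | ctg-start=3732030 | ctg-end=3732030 | loctype=2 | orient=+",
  "CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_167245.1 | ctg-start=3540499 | ctg-end=3540499 | loctype=2 | orient=+"
)

single_chr_pos(rs7775397_ctg)  # 32261252
```

Under this rule rs7775397 would be kept at position 32261252, while a SNP with two or more real chr-pos values on the reference assembly would still be dropped.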
Thanks for your help Tim and Herve.

It would be very useful to include the SNPs that have a value for the chr-pos field even if they have more than 1 CTG line for my purposes since I deal with a lot of immune-related genes that tend to be difficult to map. Would it be possible to include these types of SNPs, but flag them as having more than 1 CTG line?

Thanks for your help,
Christina
Reply written 6.9 years ago by Christina Chaivorapol
Hi Christina,

On 01/16/2013 10:15 AM, Christina Chaivorapol wrote:
> Would it be possible to include these types of SNPs,
> but flag them as having more than 1 CTG line?

So I've included them in version 0.99.9 of SNPlocs.Hsapiens.dbSNP.20120608. They're not flagged though. Note that there still is a *single* location on the reference genome that is reported for those SNPs, because the other "locations" are reported as ? (question mark) and it seems fair to not consider ? as a location.

With this new version of the package:

> library(SNPlocs.Hsapiens.dbSNP.20120608)
> sum(getSNPcount())
[1] 45697775

that is, 281064 more SNPs (i.e. 0.6%) compared to the previous version (i.e. 0.99.8). rs7775397 is one of them now:

> rsidsToGRanges("rs7775397")
GRanges with 1 range and 2 metadata columns:
      seqnames               ranges strand |   RefSNP_id alleles_as_ambig
         <Rle>            <IRanges>  <Rle> | <character>      <character>
  [1]      ch6 [32261252, 32261252]      + |     7775397                K
  ---
  seqlengths:
         ch1       ch2       ch3       ch4 ...       chX       chY      chMT
   249250621 243199373 198022430 191154276 ... 155270560  59373566     16569

SNPlocs.Hsapiens.dbSNP.20120608 version 0.99.9 will be available in Bioc devel (requires devel version of R i.e. R 3.0) thru biocLite() in about 45 min. Only the source package for now, which you should be able to install on Windows or Mac with biocLite( , type="source").

Let me know if you have questions about this.

Cheers,
H.
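As a quick check of the figures quoted in this thread, the arithmetic can be reproduced in base R. The three counts are taken from the posts above; the variable names are made up for this sketch.

```r
# Counts quoted in the thread (variable names are illustrative only).
n_dbsnp137 <- 53558214  # rs ids in dbSNP 137 for Human
n_0_99_8   <- 45416711  # sum(getSNPcount()) with package version 0.99.8
n_0_99_9   <- 45697775  # sum(getSNPcount()) with package version 0.99.9

added <- n_0_99_9 - n_0_99_8
added                                  # 281064 SNPs gained
round(100 * added / n_0_99_8, 1)       # 0.6 (% increase over 0.99.8)
round(100 * n_0_99_9 / n_dbsnp137, 1)  # 85.3 (% of dbSNP 137 rs ids kept)
```

So even with the relaxed single-location rule, roughly 15% of the dbSNP 137 rs ids remain excluded by the package's curation criteria.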
Reply written 6.9 years ago by Hervé Pagès