Retrieving SNP rs IDs using biomaRt getBM()
Guest User ★ 13k
@guest-user-4897
I have a list of chromosomal positions for which I would like to retrieve SNP rs IDs (if present at these locations). I used the following code to try and get the rs IDs at 2 locations.

    getBM(
      attributes=c("refsnp_id","chr_name","chrom_start"),
      filters=c("chr_name","chrom_start","chrom_end"),
      values=list(c(19,19), c(45412079,45415640), c(45412079,45415640)),
      mart)

I get back the rs IDs for these 2 locations but also get a list of SNPs that lie within these 2 positions (a total of 82 SNPs are returned with this query).

How do I query the database to return only the rs IDs at the 2 specified chromosomal positions?

Many thanks
Sonia

-- output of sessionInfo():

    R version 2.11.1 (2010-05-31)
    x86_64-redhat-linux-gnu

    locale:
     [1] LC_CTYPE=en_US.iso885915       LC_NUMERIC=C
     [3] LC_TIME=en_US.iso885915        LC_COLLATE=en_US.iso885915
     [5] LC_MONETARY=C                  LC_MESSAGES=en_US.iso885915
     [7] LC_PAPER=en_US.iso885915       LC_NAME=C
     [9] LC_ADDRESS=C                   LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.iso885915 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] biomaRt_2.4.0

    loaded via a namespace (and not attached):
    [1] RCurl_1.91-1 tools_2.11.1 XML_3.9-4

--
Sent via the guest posting facility at bioconductor.org.
@steffen-durinck-4465
Hi Sonia,

The filter combination of chromosome name, start position and end position is special: the BioMart web service interprets it as "give me everything on this chromosome between the start and end position". If you give it multiple values for the chromosome, start and end positions, it will not understand what it should do. So for this particular type of query you have to give only one chromosome and one start and end position.

In your case, getting only the two positions would require two separate queries, which is not very efficient and definitely not recommended if you have a lot of positions. A better strategy is to query one range, e.g.

    values=list(19, 45412079, 45415640)

and then look up the positions you're interested in within the result.

Steffen

On Wed, Nov 21, 2012 at 10:14 AM, Sonia Shah [guest] <guest@bioconductor.org> wrote:
> How do I query the database to return only the rs ids at the 2 specified
> chromosomal positions?
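The "query one range, then pick out the positions" strategy can be sketched in base R. This is only an illustration: `snps_in_range` is a toy stand-in for the data frame that the single-range getBM() call would return, with the same three attribute columns as in the question. Its first and last rows use the rs IDs reported for these two positions elsewhere in this thread; the middle row is invented.

```r
# Toy stand-in for the result of the single-range query
# values = list(19, 45412079, 45415640). The middle row is invented
# purely for illustration; it is not a real SNP.
snps_in_range <- data.frame(
  refsnp_id   = c("rs7412", "rs_hypothetical", "rs445925"),
  chr_name    = c("19", "19", "19"),
  chrom_start = c(45412079, 45413000, 45415640),
  stringsAsFactors = FALSE
)

# Positions of interest on chromosome 19.
mypos <- c(45412079, 45415640)

# Keep only the rows whose start coordinate is one of the requested positions.
hits <- snps_in_range[snps_in_range$chrom_start %in% mypos, ]
print(hits)
```

Using `%in%` keeps every SNP that starts at a requested position, so several SNPs sharing one coordinate would all be retained.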
@herve-pages-1542
Seattle, WA, United States
Hi Sonia,

If you have human SNPs, an alternative is to use a SNPlocs package:

    library(SNPlocs.Hsapiens.dbSNP.20120608)
    # Load all SNP locations on chromosome 19 as a GRanges object
    ch19_snps <- getSNPlocs("ch19", as.GRanges=TRUE)
    mypos <- c(45412079, 45415640)
    # Index of the SNP at each position (NA if there is none)
    idx <- match(mypos, start(ch19_snps))
    rsids <- mcols(ch19_snps)$RefSNP_id[idx]

This would scale well if you had a lot of positions (e.g. hundreds of thousands), but you need to work one chromosome at a time.

Note that the rs IDs are stored without the "rs" prefix in the GRanges object returned by getSNPlocs():

    > rsids
    [1] "7412"   "445925"

Cheers,
H.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
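To make the behaviour of the match() step concrete, here is a small base R sketch that needs no Bioconductor packages. The vectors `snp_starts` and `snp_ids` are toy stand-ins for start(ch19_snps) and mcols(ch19_snps)$RefSNP_id; the middle entry is invented, and the second query position is chosen so that it has no SNP. It also shows one way to restore the "rs" prefix.

```r
# Toy stand-ins for start(ch19_snps) and mcols(ch19_snps)$RefSNP_id.
# The middle entry is invented for illustration only.
snp_starts <- c(45412079, 45413000, 45415640)
snp_ids    <- c("7412", "hypothetical", "445925")

# Query positions; the second one deliberately matches no SNP.
mypos <- c(45412079, 45414000, 45415640)

# match() gives, for each query position, the index of the first SNP
# starting there, or NA when no SNP starts at that position.
idx <- match(mypos, snp_starts)

# Restore the "rs" prefix, leaving NA where nothing was found.
rsids <- ifelse(is.na(idx), NA_character_, paste0("rs", snp_ids[idx]))
print(rsids)
```

Because match() reports only the first hit, a position shared by more than one SNP would yield a single ID with this approach.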
Thanks for the replies. I am looking for something to query thousands of locations, so I will give SNPlocs a try.

Cheers,
Sonia

On 21/11/2012 21:04, Hervé Pagès wrote:
> If you have human SNPs, an alternative is to use a SNPlocs package:

--
Sonia Shah
UCL Genetics Institute
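Since the SNPlocs approach works one chromosome at a time, thousands of positions spread over several chromosomes could be handled by grouping the queries per chromosome. The following is a minimal base R sketch under stated assumptions: `lookup_rsids()` and `toy_loader()` are hypothetical helpers, not part of any package. In real use the loader would wrap something like getSNPlocs() and return a table of positions and IDs for one chromosome; here it returns toy data, and the "ch1" entry is invented.

```r
# Hypothetical helper: look up rs IDs for many positions, one chromosome
# at a time. `query` has columns chr and pos; `load_chr_snps` is a function
# returning a data frame with columns pos and id for a given chromosome.
lookup_rsids <- function(query, load_chr_snps) {
  query$rsid <- NA_character_
  for (chr in unique(query$chr)) {
    snps <- load_chr_snps(chr)            # loaded once per chromosome
    rows <- which(query$chr == chr)
    idx  <- match(query$pos[rows], snps$pos)
    query$rsid[rows] <- ifelse(is.na(idx), NA_character_,
                               paste0("rs", snps$id[idx]))
  }
  query
}

# Toy loader standing in for a per-chromosome SNPlocs lookup.
# The ch1 entry is invented for illustration only.
toy_loader <- function(chr) {
  tables <- list(
    ch19 = data.frame(pos = c(45412079, 45415640),
                      id = c("7412", "445925"), stringsAsFactors = FALSE),
    ch1  = data.frame(pos = 1000, id = "hypothetical1",
                      stringsAsFactors = FALSE)
  )
  tables[[chr]]
}

query <- data.frame(chr = c("ch19", "ch1", "ch19", "ch1"),
                    pos = c(45412079, 1000, 45415640, 2000),
                    stringsAsFactors = FALSE)
result <- lookup_rsids(query, toy_loader)
print(result)
```

Loading each chromosome only once keeps the expensive step proportional to the number of chromosomes rather than the number of positions.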