pulling functional information for SNPs
Kay Jaja ▴ 90
@kay-jaja-3481
Hi, I have a list of SNPs (rs numbers) and I am interested in pulling the functional data corresponding to each SNP from a database like Ensembl, i.e. the gene name if the SNP is in a gene, whether it falls in an intron or an exon, and whether it is a non-synonymous or synonymous SNP. Is it possible to do this in R using biomaRt or any other packages? I appreciate your help, thanks
@james-w-macdonald-5106
Hi Kay,

Kay Jaja wrote:
> Hi,
>
> I have a list of SNPs (rs numbers) and I am interested in pulling the functional data corresponding to each SNP from a database like Ensembl, i.e. the gene name if the SNP is in a gene, whether it falls in an intron or an exon, and whether it is a non-synonymous or synonymous SNP.
> Is it possible to do this in R using biomaRt or any other packages?

Do you mean to ask if it is possible, or if it is easy? It is certainly possible, although it depends on exactly what you want. Your question is not as complete as it could be. In the future, you should try to explain exactly what you are trying to do rather than asking open-ended questions.

You can get information about SNPs using biomaRt, but the available information looks pretty sparse to me when compared to the small list of interests you seem to have. But you can look to see what is available easily enough:

    library(biomaRt)
    mart <- useMart("snp", "hsapiens_snp")
    listAttributes(mart)

There are one or two vignettes that come with biomaRt that should help you get started if you like what you see.

I generally don't use biomaRt for this sort of thing, instead preferring to hit the UCSC database directly. Note that what I show below might be done as easily using the rtracklayer package; you might explore the vignettes for that package as well.

Anyway, I would use the RMySQL package and query directly:

    library(RMySQL)
    ## MySQL() creates the driver object that dbConnect() expects
    con <- dbConnect(MySQL(), host = "genome-mysql.cse.ucsc.edu",
                     dbname = "hg18", user = "genome")

    ## what type of info is available?
    > dbGetQuery(con, "select * from snp129 where name='rs25';")
      bin chrom chromStart chromEnd name score strand refNCBI refUCSC observed
    1 673  chr7   11550666 11550667 rs25     0      -       T       T      A/G
      molType  class                             valid    avHet  avHetSE   func
    1 genomic single by-cluster,by-frequency,by-hapmap 0.499586 0.014383 intron
      locType weight
    1   exact      1

Note two things here.

First, you don't know the order in which rows will be returned, so you should always ask the database to return the column you are querying on (here, name) along with everything else. This is true of biomaRt as well.

Second, if you are querying lots of SNPs, just do it in one big query instead of one by one. Repeatedly querying an online database will get you banned.

So say your rs IDs are in a vector rsid, and you want the chromosome, the position, the bases, and the function (intron, exon, intragenic, etc.):

    sql <- paste("select name, chrom, chromEnd, observed, func from snp129 where name in ('",
                 paste(rsid, collapse = "','"), "');", sep = "")

There are a lot of ' and " in there, because we want something that looks like this:

    select name, chrom, chromEnd, observed, func from snp129 where name in ('rs25','rs26','rs27','rs28');

so you want to make sure the SQL statement looks OK first. Then just do

    dat <- dbGetQuery(con, sql)

For example:

    > rsid <- c("rs25","rs26","rs27","rs28")
    > rsid
    [1] "rs25" "rs26" "rs27" "rs28"
    > sql <- paste("select name, chrom, chromEnd, observed, func from snp129 where name in ('",
    +              paste(rsid, collapse = "','"), "');", sep = "")
    > sql
    [1] "select name, chrom, chromEnd, observed, func from snp129 where name in ('rs25','rs26','rs27','rs28');"
    > z <- dbGetQuery(con, sql)
    > z
      name chrom chromEnd observed   func
    1 rs25  chr7 11550667      A/G intron
    2 rs26  chr7 11549996    -/A/G intron
    3 rs27  chr7 11549750      C/G intron
    4 rs28  chr7 11562590      A/G intron

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
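The two points in the answer (one batched query, and no guaranteed return order) can be sketched as a pair of small helpers. This is only an illustration under stated assumptions: build_snp_query() and reorder_by_rsid() are hypothetical names that do not come from RMySQL or from the original post, the table and column names are the ones used in the answer, and the data frame at the end is a mock standing in for the result of dbGetQuery(con, sql) so that the sketch runs without a database connection.

```r
# Hypothetical helper (not part of RMySQL): build one batched query
# for a whole vector of rs IDs instead of querying one by one.
build_snp_query <- function(rsid,
                            fields = c("name", "chrom", "chromEnd",
                                       "observed", "func"),
                            table = "snp129") {
  rsid <- unique(as.character(rsid))
  # Accept only "rs" followed by digits, so the pasted string cannot
  # contain quotes or other SQL syntax.
  bad <- rsid[!grepl("^rs[0-9]+$", rsid)]
  if (length(bad) > 0) {
    stop("not valid rs IDs: ", paste(bad, collapse = ", "))
  }
  paste0("select ", paste(fields, collapse = ", "),
         " from ", table,
         " where name in ('", paste(rsid, collapse = "','"), "');")
}

# Hypothetical helper: put returned rows back in the order of the input
# IDs. IDs that were not returned become rows of NA. If an ID occurs in
# several rows, only the first one is kept.
reorder_by_rsid <- function(dat, rsid) {
  out <- dat[match(rsid, dat$name), , drop = FALSE]
  out$name <- rsid
  rownames(out) <- NULL
  out
}

rsid <- c("rs25", "rs26", "rs27", "rs28")
sql <- build_snp_query(rsid)
print(sql)

# Mock result in shuffled order with one ID missing; in real use this
# would be dat <- dbGetQuery(con, sql).
mock <- data.frame(name = c("rs27", "rs25", "rs28"),
                   chrom = "chr7",
                   chromEnd = c(11549750, 11550667, 11562590),
                   stringsAsFactors = FALSE)
ordered <- reorder_by_rsid(mock, rsid)
print(ordered)
```

Validating the IDs before pasting them into the statement is a design choice of this sketch, not something the answer requires; it simply keeps malformed input from producing a broken query. For a very long ID list, splitting it into a few large chunks and calling build_snp_query() on each chunk would still be far fewer queries than one per SNP.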