SNP chip
@adrian-johnson-2728
Last seen 4.0 years ago
Hi:

I have a snp array 6 (affymetrix) data table that looks like the following:

SNPID           Call
SNP_A-8373748   BB
SNP_A-2210818   AB
SNP_A-4290346   BB
SNP_A-2219708   AA

I want to be able to convert this data into a .BED file like format that will look like following:

Chromosome   Position   Reference Base   Call
chr19        2094894    A                T
chr19        2095300    G                A

Is it possible through bioconductor? Thanks for your time.

-Adrian
SNP convert
@james-w-macdonald-5106
Last seen 1 hour ago
United States
Hi Adrian,

On 7/8/2010 12:59 PM, Adrian Johnson wrote:
> I want to be able to convert this data into a .BED file like format
> that will look like following:

Well, doing what you want will be difficult, as I don't know how you are going from genotype to 'Reference Base' and 'Call', nor do I really know what you mean by either of those in this situation. Anyway, that's a YP, not an MP (for you Boogie Nights fans out there ;-D).

Here is I think what you need to know to get close to what you are trying to do.

    > library(pd.genomewidesnp.6)
    > con <- db(pd.genomewidesnp.6)
    ## fake up some IDs
    > ids <- dbGetQuery(con, "select man_fsetid from featureSet limit 20;")
    > ids
          man_fsetid
    1  SNP_A-2131660
    2  SNP_A-1967418
    3  SNP_A-1969580
    4  SNP_A-4263484
    5  SNP_A-1978185
    6  SNP_A-4264431
    7  SNP_A-1980898
    8  SNP_A-1983139
    9  SNP_A-4265735
    10 SNP_A-1995832
    11 SNP_A-1995893
    12 SNP_A-1997689
    13 SNP_A-1997709
    14 SNP_A-1997896
    15 SNP_A-1997922
    16 SNP_A-2000230
    17 SNP_A-2000332
    18 SNP_A-2000337
    19 SNP_A-2000342
    20 SNP_A-4268173

    ## now a simple SQL query
    > dbGetQuery(con, paste("select chrom, physical_pos, allele_a, allele_b from featureSet where man_fsetid in ('", paste(ids[,1], collapse="','"), "');", sep = ""))
       chrom physical_pos allele_a allele_b
    1      1      2224111        A        G
    2      1      2319424        A        G
    3      1      2926730        C        T
    4      1      3084986        C        G
    5      1      3155127        A        C
    6      1      3695086        C        G
    7      1      3710825        A        G
    8      1      3753024        A        G
    9      1      3753427        A        G
    10     1      3756100        A        G
    11     1      3756146        A        C
    12     1      4240737        A        G
    13     1      4243294        C        G
    14     1      4243405        A        C
    15     1      4243441        C        T
    16     1      1145994        C        T
    17     1      2543484        C        T
    18     1      2941694        C        T
    19     1      3292731        C        T
    20     1      4276892        C        T

If you don't know any SQL, note that there are a mixture of " and ' in that paste statement, as we want to end up with a query that looks like this:

    "select chrom, physical_pos, allele_a, allele_b from featureSet where man_fsetid in
    ('SNP_A-2131660','SNP_A-1967418','SNP_A-1969580','SNP_A-4263484',
    'SNP_A-1978185','SNP_A-4264431','SNP_A-1980898','SNP_A-1983139',
    'SNP_A-4265735','SNP_A-1995832','SNP_A-1995893','SNP_A-1997689',
    'SNP_A-1997709','SNP_A-1997896','SNP_A-1997922','SNP_A-2000230',
    'SNP_A-2000332','SNP_A-2000337','SNP_A-2000342','SNP_A-4268173');"

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
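The quoting in that paste() call can be checked without the annotation package by building the string for a few IDs and printing it. The following is only a sketch in base R: the three IDs are taken from the output above, and the variable names `in_clause` and `query` are made up for illustration. It also adds `man_fsetid` to the selected columns, which is an addition to the query shown above.

```r
# A few probe-set IDs copied from the thread; any character vector works.
ids <- c("SNP_A-2131660", "SNP_A-1967418", "SNP_A-1969580")

# The inner paste() joins the IDs with ',' between them; the outer paste()
# adds the opening (' and the closing ') so that every ID ends up quoted.
in_clause <- paste("('", paste(ids, collapse = "','"), "')", sep = "")

# Assumption: man_fsetid is added to the SELECT list so that each returned
# row can be matched back to its probe-set ID.
query <- paste("select man_fsetid, chrom, physical_pos, allele_a, allele_b ",
               "from featureSet where man_fsetid in ", in_clause, ";",
               sep = "")

# Inspect the finished statement before handing it to dbGetQuery(con, query).
cat(query, "\n")
```

Selecting `man_fsetid` as well is worth considering because SQL does not promise that rows from an `in (...)` query come back in the order the IDs were listed, so the ID column is the safe way to line the annotation up with the genotype calls.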
Hi Jim,

the idea is to compare the snp chip data to SNP calls derived from sequencing data. For the same samples, I have both SNP chip and sequencing data. To estimate the sensitivity and specificity, I want to compare the SNP chip data to sequencing data.

From sequencing data, I have a pileup-format file in a BED format:

Chromosome   From   To   Reference Base   SNP call

I have no experience working with SNP chip data. All I have is the data that looks like

SNPID           Call
SNP_A-8373748   BB
SNP_A-2210818   AB
SNP_A-4290346   BB
SNP_A-2219708   AA

There are many questions:

1. I need to know where each SNP ID is: chromosome and position. From this I can deduce what base it is on the genome.
2. Work out what BB, AB and AA mean.

Thanks
Adrian

On Thu, Jul 8, 2010 at 4:41 PM, James W. MacDonald <jmacdon at med.umich.edu> wrote:
> Here is I think what you need to know to get close to what you are trying to
> do.
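One possible way to approach the two questions, sketched in base R. Everything in the `annot` table below is invented for illustration (the positions and alleles are NOT the real annotation for these probe sets); in practice it would come from a featureSet query that also returns `man_fsetid`. The sketch assumes the calls are only AA, AB or BB, that `physical_pos` is a 1-based coordinate, and that "A" and "B" in a call refer to the `allele_a` and `allele_b` columns. The alleles may be reported on either strand, and the reference base is not in this table at all, so both would still need to be checked against the genome before comparing with sequencing calls.

```r
# Genotype calls in the layout from the question.
calls <- data.frame(
  SNPID = c("SNP_A-8373748", "SNP_A-2210818", "SNP_A-4290346", "SNP_A-2219708"),
  Call  = c("BB", "AB", "BB", "AA"),
  stringsAsFactors = FALSE
)

# HYPOTHETICAL annotation rows: positions and alleles are made up.
annot <- data.frame(
  man_fsetid   = c("SNP_A-8373748", "SNP_A-2210818", "SNP_A-4290346", "SNP_A-2219708"),
  chrom        = c("19", "19", "1", "1"),
  physical_pos = c(2094894L, 2095300L, 2224111L, 2319424L),
  allele_a     = c("A", "G", "A", "A"),
  allele_b     = c("T", "A", "G", "G"),
  stringsAsFactors = FALSE
)

# Question 1: attach chromosome and position to each call by probe-set ID.
merged <- merge(calls, annot, by.x = "SNPID", by.y = "man_fsetid")

# Question 2: replace each letter of the call with the matching allele.
first  <- ifelse(substr(merged$Call, 1, 1) == "A", merged$allele_a, merged$allele_b)
second <- ifelse(substr(merged$Call, 2, 2) == "A", merged$allele_a, merged$allele_b)
merged$genotype <- paste0(first, second)

# Anything that is not AA/AB/BB (for example a no-call) becomes NA.
merged$genotype[!(merged$Call %in% c("AA", "AB", "BB"))] <- NA

# BED-like layout: 0-based start, end equal to the 1-based position.
bed_like <- data.frame(
  chrom    = paste0("chr", merged$chrom),
  start    = merged$physical_pos - 1L,
  end      = merged$physical_pos,
  SNPID    = merged$SNPID,
  genotype = merged$genotype,
  stringsAsFactors = FALSE
)
print(bed_like)
```

Note that merge() sorts the result by SNPID, so rows should be looked up by ID and not by their position in the original call table.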