Hi Adrian,
On 7/8/2010 12:59 PM, Adrian Johnson wrote:
> Hi:
>
>
> I have a snp array 6 (affymetrix) data table that looks like the
> following:
>
>
> SNPID Call
> SNP_A-8373748 BB
> SNP_A-2210818 AB
> SNP_A-4290346 BB
> SNP_A-2219708 AA
>
>
> I want to be able to convert this data into a .BED file like format
> that will look like following:
Well, doing what you want will be difficult, as I don't know how you are
going from genotype to 'Reference Base' and 'Call', nor do I really know
what you mean by either of those in this situation. Anyway, that's a YP,
not an MP (for you Boogie Nights fans out there ;-D).
Here is what I think you need to know to get close to what you are
trying to do.
> library(pd.genomewidesnp.6)
> con <- db(pd.genomewidesnp.6)
## fake up some IDs
> ids <- dbGetQuery(con, "select man_fsetid from featureSet limit 20;")
> ids
man_fsetid
1 SNP_A-2131660
2 SNP_A-1967418
3 SNP_A-1969580
4 SNP_A-4263484
5 SNP_A-1978185
6 SNP_A-4264431
7 SNP_A-1980898
8 SNP_A-1983139
9 SNP_A-4265735
10 SNP_A-1995832
11 SNP_A-1995893
12 SNP_A-1997689
13 SNP_A-1997709
14 SNP_A-1997896
15 SNP_A-1997922
16 SNP_A-2000230
17 SNP_A-2000332
18 SNP_A-2000337
19 SNP_A-2000342
20 SNP_A-4268173
## now a simple SQL query
> dbGetQuery(con, paste("select chrom, physical_pos, allele_a, allele_b ",
+     "from featureSet where man_fsetid in ('",
+     paste(ids[,1], collapse="','"), "');", sep = ""))
chrom physical_pos allele_a allele_b
1 1 2224111 A G
2 1 2319424 A G
3 1 2926730 C T
4 1 3084986 C G
5 1 3155127 A C
6 1 3695086 C G
7 1 3710825 A G
8 1 3753024 A G
9 1 3753427 A G
10 1 3756100 A G
11 1 3756146 A C
12 1 4240737 A G
13 1 4243294 C G
14 1 4243405 A C
15 1 4243441 C T
16 1 1145994 C T
17 1 2543484 C T
18 1 2941694 C T
19 1 3292731 C T
20 1 4276892 C T
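To get from rows like these to something shaped like the table you asked for, the annotation has to be joined back to your calls by SNP ID. Here is a minimal sketch of that join, using base R only. It assumes you add man_fsetid to the select so there is a column to merge on, and that 'A' in a call means allele_a and 'B' means allele_b. The two data frames are toy stand-ins with invented positions and alleles, not real annotation, and nothing here tells you which allele is the reference base.

```r
## Toy stand-in for your genotype table (IDs and calls from your post)
calls <- data.frame(SNPID = c("SNP_A-8373748", "SNP_A-2210818", "SNP_A-4290346"),
                    Call = c("BB", "AB", "BB"),
                    stringsAsFactors = FALSE)

## Toy stand-in for a query result that also selects man_fsetid.
## All positions and alleles below are invented placeholders.
annot <- data.frame(man_fsetid = c("SNP_A-2210818", "SNP_A-8373748", "SNP_A-4290346"),
                    chrom = c("19", "19", "1"),
                    physical_pos = c(2000L, 1000L, 3000L),
                    allele_a = c("A", "C", "A"),
                    allele_b = c("G", "T", "C"),
                    stringsAsFactors = FALSE)

## Join on the SNP ID, so row order in either table does not matter
merged <- merge(calls, annot, by.x = "SNPID", by.y = "man_fsetid")

## Assumption: 'A' in the call stands for allele_a, 'B' for allele_b
decode <- function(x, a, b) ifelse(x == "A", a, b)
first  <- decode(substr(merged$Call, 1, 1), merged$allele_a, merged$allele_b)
second <- decode(substr(merged$Call, 2, 2), merged$allele_a, merged$allele_b)

out <- data.frame(Chromosome = paste0("chr", merged$chrom),
                  Position = merged$physical_pos,
                  AlleleA = merged$allele_a,
                  AlleleB = merged$allele_b,
                  Genotype = paste0(first, second),
                  stringsAsFactors = FALSE)

## Sort by chromosome label (as text) and position
out <- out[order(out$Chromosome, out$Position), ]
print(out)
```

Merging on the ID rather than relying on row order matters because the rows from an "in (...)" query are not guaranteed to come back in the order the IDs were listed.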
If you don't know any SQL, note that the paste() call that builds the
query mixes " and ', because we want to end up with a query that looks
like this:
"select chrom, physical_pos, allele_a, allele_b from featureSet where
man_fsetid in
('SNP_A-2131660','SNP_A-1967418','SNP_A-1969580','SNP_A-4263484',
'SNP_A-1978185','SNP_A-4264431','SNP_A-1980898','SNP_A-1983139',
'SNP_A-4265735','SNP_A-1995832','SNP_A-1995893','SNP_A-1997689',
'SNP_A-1997709','SNP_A-1997896','SNP_A-1997922','SNP_A-2000230',
'SNP_A-2000332','SNP_A-2000337','SNP_A-2000342','SNP_A-4268173');"
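If the nested quoting is hard to read, it can be wrapped in a small function. This is only a sketch: build_query is a hypothetical helper I am making up here, not something in pd.genomewidesnp.6 or DBI, and I have added man_fsetid to the select so the result can be matched back to your IDs.

```r
## Hypothetical helper (not part of any package): build the query string
## for a character vector of SNP IDs.
build_query <- function(ids) {
  ## Double any embedded single quote, which is how SQL escapes it
  ids <- gsub("'", "''", ids, fixed = TRUE)
  ## Wrap each ID in single quotes and join with commas
  in_list <- paste0("'", ids, "'", collapse = ",")
  paste0("select man_fsetid, chrom, physical_pos, allele_a, allele_b ",
         "from featureSet where man_fsetid in (", in_list, ");")
}

query <- build_query(c("SNP_A-2131660", "SNP_A-1967418"))
cat(query, "\n")

## With the connection from above you would then run:
## dbGetQuery(con, query)
```

The double quotes delimit the R strings, while the single quotes end up inside the SQL text around each ID.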
Best,
Jim
>
> Chromosome Position Reference Base Call
> chr19 2094894 A T
> chr19 2095300 G A
>
>
> Is it possible through bioconductor? Thanks for your time.
>
> -Adrian
--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826