Hi Lavinia,
The GWASTools package was designed to work with this type of data.
You can download annotation for Illumina arrays from their website:
https://icom.illumina.com/. They now require that you register with
their site to download files. Once you have logged in, click
"Downloads" in the menu on the left and then "Genotyping/LOH/CNV" in
the menu on the right, and look for the Human Omni1 Quad link. The
file that you want is called HumanOmni1-Quad_v1-0_H_csv.zip, and looks
like this:
IlmnID,Name,IlmnStrand,SNP,AddressA_ID,AlleleA_ProbeSeq,AddressB_ID,AlleleB_ProbeSeq,GenomeBuild,Chr,MapInfo,Ploidy,Species,Source,SourceVersion,SourceStrand,SourceSeq,TopGenomicSeq,BeadSetID,Exp_Clusters,Intensity_Only,RefStrand
200006-0_T_R_1853021091,200006,TOP,[A/G],0060702346,AGACTGTGGATGAATAATGCTGGTGAGTGTCTGGCCCTCGGGGAGGCCCA,,,37.1,9,139926402,diploid,Homo sapiens,ILLUMINA,0,BOT,ACATGCCCCACTCAGCGCCACCCCCGTCCTCCCCTCCCAGGTTGCCTAGCTGTCCCCAGC[T/C]TGGGCCTCCCCGAGGGCCAGACACTCACCAGCATTATTCATCCACAGTCTCCCAGGATCA,TGATCCTGGGAGACTGTGGATGAATAATGCTGGTGAGTGTCTGGCCCTCGGGGAGGCCCA[A/G]GCTGGGGACAGCTAGGCAACCTGGGAGGGGAGGACGGGGGTGGCGCTGAGTGGGGCATGT,163,3,0,-
The "SNP" column tells you the A/B allele designation for a particular
SNP (format [A/B]) and the "IlmnStrand" column tells you whether that
SNP is on the TOP or BOT strand. (See here for a useful article on how
to convert between different strand designations:
http://www.sciencedirect.com/science/article/pii/S0168952512000704)
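
As a rough illustration of the lookup described above, here is a
minimal Python sketch (it is not part of GWASTools) that maps an
AA/AB/BB call to nucleotides using the "SNP" column. The function
names parse_snp, translate_call and load_annotation are made up for
this example, the manifest excerpt is cut down to three columns, and
the real file may contain extra lines before the column row that would
need to be skipped. The alleles come out on the strand named in
"IlmnStrand", so a further conversion is still needed if you want
forward-strand alleles.

```python
import csv
import io
import re

# Abbreviated, illustrative excerpt of the manifest (three of its columns).
MANIFEST_EXCERPT = """Name,IlmnStrand,SNP
200006,TOP,[A/G]
"""


def parse_snp(snp_field):
    """Split a SNP field such as '[A/G]' into (allele A, allele B)."""
    match = re.fullmatch(r"\[([^/\]]+)/([^/\]]+)\]", snp_field.strip())
    if match is None:
        raise ValueError(f"unexpected SNP field: {snp_field!r}")
    return match.group(1), match.group(2)


def translate_call(call, snp_field):
    """Translate 'AA', 'AB' or 'BB' into alleles; return None otherwise."""
    if call not in ("AA", "AB", "BB"):
        return None  # e.g. a no-call; the exact coding is an assumption
    allele_a, allele_b = parse_snp(snp_field)
    lookup = {"A": allele_a, "B": allele_b}
    return "/".join(lookup[letter] for letter in call)


def load_annotation(handle):
    """Index manifest rows by the 'Name' column (ID_REF in processed data)."""
    return {row["Name"]: row for row in csv.DictReader(handle)}


annotation = load_annotation(io.StringIO(MANIFEST_EXCERPT))
row = annotation["200006"]
# Prints: 200006 A/G TOP
print(row["Name"], translate_call("AB", row["SNP"]), row["IlmnStrand"])
```

For SNP 200006 the AB call therefore corresponds to A/G on the TOP
strand.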
Stephanie Gogarten
Research Scientist, Biostatistics
University of Washington
On 7/16/12 3:00 AM, bioconductor-request at r-project.org wrote:
> Message: 3
> Date: Mon, 16 Jul 2012 13:59:33 +1000
> From: "Lavinia Gordon" <lavinia.gordon at mcri.edu.au>
> To: <bioconductor at r-project.org>
> Subject: [BioC] Translating AB/BB/AA into a SNP with Illumina data
> Message-ID: <87223629775F2049917889888F597633FD720F at murmx.mcri.edu.au>
> Content-Type: text/plain; charset="us-ascii"
>
> Dear all,
>
> I am working with Illumina Human Omni1 Quad data. I only have access to
> processed data, e.g.:
> ID_REF  VALUE  Score      Theta      R         B Allele Freq  Log R Ratio
> 200006  AB     0.8273118  0.4800678  2.651576  0.5337635      0.1516016
>
> I would like to know what the SNP is at this position and wondered if
> there are any components within the Bioconductor packages that can deal
> with this data, taking into account the TOP/BTM strand approach that
> Illumina uses. I have previously had great success with crlmm, but that
> was working from the raw IDAT files.
>
> With thanks for your time,
>
> Lavinia Gordon
> Senior Research Officer
> Quantitative Sciences Core, Bioinformatics
>
> Murdoch Childrens Research Institute
> The Royal Children's Hospital
> Flemington Road Parkville Victoria 3052 Australia
> T 03 8341 6221
> www.mcri.edu.au