Question

Reading Illumina IDAT files


Mark Cowley ▴ 910


Dear list,

I'd like to be able to parse Illumina gene expression IDAT files, and I've been playing with the crlmm:::readIDAT function, which is designed to read Illumina Infinium IDAT files. This function dies on about the 9th line or so because 'nFields' is a very large negative number (see below). I'm trying to read in a MouseRef-8_V2_0_R1_11278551_A.bgx.xml type of array, but would like to be able to read all types of gene expression arrays.

Here is the output that I get:

    library(ff)
    library(crlmm)
    f <- "4687778079_A_Grn.idat"
    debug(crlmm:::readIDAT)
    crlmm:::readIDAT(f)
    #<snip>
    Browse[2]>
    debug: fileSize <- file.info(idatFile)$size
    Browse[2]>
    debug: tempCon <- file(idatFile, "rb")
    Browse[2]>
    debug: prefixCheck <- readChar(tempCon, 4)
    Browse[2]>
    debug: if (prefixCheck != "IDAT") {
    }
    Browse[2]> prefixCheck
    [1] "IDAT"
    Browse[2]>
    debug: NULL
    Browse[2]>
    debug: versionNumber <- readBin(tempCon, "integer", n = 1, size = 8,
        endian = "little", signed = FALSE)
    Browse[2]>
    debug: nFields <- readBin(tempCon, "integer", n = 1, size = 4, endian = "little",
        signed = FALSE)
    Browse[2]> versionNumber
    [1] 1
    Browse[2]>
    debug: fields <- matrix(0, nFields, 3)
    Browse[2]> nFields
    [1] -1398219826
    Browse[2]>
    Error in matrix(0, nFields, 3) : invalid 'nrow' value (< 0)

I've also come across the illumina.py file within the glu-genetics project at googlecode, which as far as I can tell is Python code to parse Illumina arrays, based upon this crlmm code. Between crlmm's code and the glu-genetics code, I gather that the readIDAT function only reads IDAT version 3 files, whereas I'm pretty sure mine are IDAT version 1 (as indicated by the versionNumber value above).

I don't know whether Infinium IDATs are indeed a different version from gene expression IDATs, but I was hoping someone could point me in the right direction. Does anyone have a parser for generic IDAT files, or does anyone know how to reverse engineer binary files?

cheers,
Mark

----------------------------------------------------------------------
Mark Cowley, PhD
Peter Wills Bioinformatics Centre
Garvan Institute of Medical Research
----------------------------------------------------------------------

    sessionInfo()
    R version 2.11.0 (2010-04-22)
    x86_64-apple-darwin9.8.0

    locale:
    [1] en_AU.UTF-8/en_AU.UTF-8/C/C/en_AU.UTF-8/en_AU.UTF-8

    attached base packages:
    [1] tools stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] crlmm_1.6.2 oligoClasses_1.10.0 Biobase_2.8.0 ff_2.1-2 bit_1.1-4

    loaded via a namespace (and not attached):
     [1] affyio_1.16.0 annotate_1.26.0 AnnotationDbi_1.10.1 Biostrings_2.16.2
     [5] DBI_0.2-5 ellipse_0.3-5 genefilter_1.30.0 IRanges_1.6.4
     [9] mvtnorm_0.9-9 preprocessCore_1.10.0 RSQLite_0.9-0 splines_2.11.0
    [13] survival_2.35-8 xtable_1.5-6

crlmm • 1.8k views

updated 13.8 years ago by Matthew Ritchie ▴ 1000 • written 13.8 years ago by Mark Cowley ▴ 910

score 0 · Answer 1 · 2010-06-24

Hi Mark,

The function you refer to can only handle IDAT files from current-version Infinium genotyping arrays. IDATs from expression arrays are not supported; it is my understanding that the information in these files is encrypted, which adds an extra layer of complexity to the whole operation. IDATs from Infinium arrays scanned using older scanner settings also produce errors, and we have added a check for this in the devel version of crlmm.

Best wishes,
Matt
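For anyone hitting the same error, here is a minimal sketch of the kind of up-front version check described in the answer. It is not the crlmm code: the helper names `read_idat_header`, `check_idat_version` and `as_uint32` are made up for illustration. It assumes, following the debug transcript in the question, that the file starts with the 4-byte string "IDAT" followed by an 8-byte little-endian version number. It also shows why 'nFields' comes out negative: according to the `readBin` documentation, `signed` is only used for integers of sizes 1 and 2, so a 4-byte read is always signed. The four trailing bytes in the synthetic file are chosen only to reproduce the value reported in the question; they are not taken from a real IDAT file.

```r
# Hypothetical helpers for illustration; none of these are part of crlmm.

# Read only the magic string and version number from the start of an IDAT file.
read_idat_header <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  magic <- readBin(con, "raw", n = 4)
  # Assumption (from the debug transcript): the version occupies 8 bytes,
  # little-endian. Read it as two 4-byte words and keep the low word.
  words <- readBin(con, "integer", n = 2, size = 4, endian = "little")
  if (length(magic) < 4 || length(words) < 2) {
    stop("file too short to contain an IDAT header: ", path)
  }
  list(is_idat = identical(magic, charToRaw("IDAT")), version = words[1])
}

# Fail early with a readable message instead of a negative 'nrow' later on.
check_idat_version <- function(path, supported = 3L) {
  hdr <- read_idat_header(path)
  if (!hdr$is_idat) stop("not an IDAT file: ", path)
  if (!hdr$version %in% supported) {
    stop(sprintf("IDAT version %d is not supported (expected %s)",
                 hdr$version, paste(supported, collapse = ", ")))
  }
  invisible(hdr$version)
}

# Reinterpret a value that readBin returned as a signed 32-bit integer
# as the corresponding unsigned value (returned as a double).
as_uint32 <- function(x) ifelse(x < 0, x + 2^32, x)

# Synthetic file: "IDAT", version 1, then four arbitrary bytes with the
# top bit set (chosen to reproduce the number reported in the question).
bytes <- as.raw(c(0xce, 0xdb, 0xa8, 0xac))
tmp <- tempfile(fileext = ".idat")
out <- file(tmp, "wb")
writeBin(charToRaw("IDAT"), out)
writeBin(c(1L, 0L), out, size = 4, endian = "little")
writeBin(bytes, out)
close(out)

# The check reports the unsupported version instead of failing in matrix().
msg <- tryCatch(check_idat_version(tmp), error = function(e) conditionMessage(e))
print(msg)

# A 4-byte read is signed, so bytes with the top bit set come back negative.
n_fields <- readBin(bytes, "integer", n = 1, size = 4, endian = "little")
print(n_fields)             # -1398219826
print(as_uint32(n_fields))  # 2896747470
```

Even read as an unsigned number, 2896747470 is not a plausible field count, which fits the suggestion that version 1 files have a different layout rather than readIDAT merely misreading a sign.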