Question: Error with read.idat() function in illuminaio package
2.9 years ago by
United States
Stephen Piccolo510 wrote:

I'm trying to read in some idat files and a bgx file using the illuminaio package. The data files can be found here: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE54839

When I try to execute the following command:

I get the following error message:

Reading manifest file /tmp/GSE54839/GPL6947_HumanHT-12_V3_0_R1_11283641_A.bgx ... Done

     /tmp/GSE54839/GSM1324893_4746900020_G_Grn.idat ... Done

Error in tmp$Quants[, "IllumicodeBinData"] : subscript out of bounds
Calls: normalizeLimma -> read.idat -> match

Below is my sessionInfo():

R version 3.2.1 (2015-06-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  methods   stats     graphics  grDevices utils
[8] datasets  base

other attached packages:
 [1] illuminaio_0.10.0   magrittr_1.5        readr_0.1.1
 [4] dplyr_0.4.2         GEOquery_2.34.0     oligo_1.32.0
 [7] Biostrings_2.36.1   XVector_0.8.0       IRanges_2.2.5
[10] S4Vectors_0.6.2     Biobase_2.28.0      oligoClasses_1.30.0
[13] BiocGenerics_0.14.0 limma_3.24.14

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.0           affxparser_1.40.0     GenomicRanges_1.20.5
 [4] splines_3.2.1         zlibbioc_1.14.0       bit_1.1-12
 [7] R6_2.1.0              foreach_1.4.2         GenomeInfoDb_1.4.1
[10] tools_3.2.1           base64_1.1            ff_2.2-13
[13] DBI_0.3.1             iterators_1.0.7       assertthat_0.1
[16] preprocessCore_1.30.0 affyio_1.36.0         bitops_1.0-6
[19] codetools_0.2-14      RCurl_1.95-4.7        RSQLite_1.0.0
[22] BiocInstaller_1.18.4  XML_3.98-1.3

Any ideas on what I can try?

13 months ago by
Australia
Matthew Ritchie730 wrote:

I've made some changes in limma to accommodate idat files in SNP format. I'm not sure when this change will become publicly available though, so in the meantime, you can download an rda file containing data from this experiment, read in using the commands below, from http://bioinf.wehi.edu.au/folders/mritchie/idat.rda

library(limma)
files = dir(pattern="idat")
bgxfile = dir(pattern="bgx")
data = read.idat(files, bgxfile)
save(bgxfile, data, files, file="idat.rda")
summary(data$E[,1:2])
GSM1324893_4746900020_G_Grn GSM1324894_4746900020_H_Grn
Min.   :   87.0             Min.   :   83.0
1st Qu.:  129.0             1st Qu.:  120.0
Median :  147.0             Median :  136.0
Mean   :  671.9             Mean   :  616.8
3rd Qu.:  273.0             3rd Qu.:  249.0
Max.   :53899.0             Max.   :53203.0
2.9 years ago by
EMBL Heidelberg / de.NBI
Mike Smith2.7k wrote:

Hi Stephen,

I just took a look at the data, and it seems the idats are in two different formats. Expression arrays are typically in an encrypted format, while genotyping and methylation arrays use a differently structured plain binary. If you look at the file sizes you'll see you've got half in one format and half in the other. I've never seen expression data in the unencrypted format before, but it's worth noting that the two sets of files were generated a few months apart with different versions of Illumina's scanning software.

illuminaio is able to read either format fine (you can do that with readIDAT()), but you get differently named parts in the output. limma only ever expects the output from the encrypted format, and so read.idat() falls over.
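As a rough way to see which layout a given file produced, you could inspect the column names of the Quants component returned by readIDAT(). The sketch below is only an illustration: idat_format() is a hypothetical helper, "IllumicodeBinData" is taken from the error message above, and the column names "Mean", "SD" and "NBeads" for the plain layout are an assumption that should be checked against your own readIDAT() output. It runs on stand-in lists rather than real files.

```r
## Sketch: classify a list shaped like readIDAT() output by looking at the
## column names of its Quants component. idat_format() is a hypothetical
## helper, and the plain-format column names are assumptions.
idat_format <- function(idat) {
  quant_cols <- colnames(idat$Quants)
  if ("IllumicodeBinData" %in% quant_cols) {
    "encrypted"   # the layout limma's read.idat() indexes into
  } else if ("Mean" %in% quant_cols) {
    "plain"       # genotyping-style layout (assumed column name)
  } else {
    "unknown"
  }
}

## Stand-ins for real readIDAT() output, so the sketch runs without data
fake_encrypted <- list(
  Quants = data.frame(IllumicodeBinData = c(10001L, 10002L),
                      MeanBinData = c(147.2, 98.5))
)
fake_plain <- list(
  Quants = matrix(c(147, 98, 12, 9, 20, 18), nrow = 2,
                  dimnames = list(c("10001", "10002"),
                                  c("Mean", "SD", "NBeads")))
)

idat_format(fake_encrypted)
idat_format(fake_plain)
```

With real data, something like vapply(lapply(idatFiles, readIDAT), idat_format, character(1)) should tell you which files read.idat() is likely to choke on.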

You can read half the files using:

idatFiles <- list.files(pattern = "_4[89](.+).idat$")
read.idat(idatFiles, bgxFilePath)

To do the rest, you could probably read each file individually with readIDAT() and create a flat text file to be read using read.ilmn(), but it may be simpler to email the limma maintainer and get the code tweaked to cope with either file format.

13 months ago by
zamanijavad670 wrote:

Dear Mike,

I have encountered this problem too, but I cannot solve it. My experiment's GEO ID is GSE63808.

> read.idat(idatfiles[120:125], "GPL6947_HumanHT-12_V3_0_R1_11283641_A.bgx")
Reading manifest file GPL6947_HumanHT-12_V3_0_R1_11283641_A.bgx ... Done
GSM1565697_4493594310_I_Grn.idat ... Done
Error in tmp$Quants[, "IllumicodeBinData"] : subscript out of bounds
> sessionInfo()
R version 3.2.4 Revised (2016-03-16 r70336)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] illuminaio_0.12.0    limma_3.26.9         BiocInstaller_1.20.3

loaded via a namespace (and not attached):
[1] tools_3.2.4   base64_2.0    openssl_0.9.6

Normally it's a good idea to ask a new question here, rather than adding an 'answer' to an existing one. People are more likely to notice a new question with no answers than an old thread that has been bumped.

Looking at this, the data are unusual, but it's basically the same problem as was identified before. The idat files you have downloaded are not in the format one would normally expect to see from an expression array, but rather they are structured like those from a genotyping array. illuminaio is able to read them fine, e.g.:

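library(illuminaio)

## download and unpack the raw data archive
## NOTE: the URL is missing from the post as it survives here, so
## raw_tar_url is a placeholder that you need to define yourself
tar_file <- tempfile()
download.file(url = raw_tar_url, destfile = tar_file, mode = "wb")
untar(tar_file, exdir = tempdir())

## get a list of all the extracted files (full paths, so readIDAT can find them)
files <- list.files(tempdir(), pattern = "\\.idat\\.gz$", full.names = TRUE)

## now read the first file
idat <- readIDAT(files[1])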

You can check the idat object and see it's a list with sensible entries (at least sensible for some Illumina platforms).  You'll notice there is an entry called 'nSNPsRead' which you wouldn't expect for an expression array.

> is(idat)
[1] "list"   "vector"
> names(idat)
[1] "fileSize"      "versionNumber" "nFields"       "fields"        "nSNPsRead"     "Quants"        "MidBlock"
[8] "RedGreen"      "Barcode"       "ChipType"      "RunInfo"       "Unknowns"   

The 'issue' here lies with limma, which looks for the expected structure and then fails when it can't find it. The mean and standard deviation for each bead type are still present in the Quants matrix, so I would suggest either convincing the limma authors to add code to handle this really weird case, or working out how you can convert the Quants matrix into a text file format that limma can handle directly.
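For the second option, a minimal sketch of the conversion might look like the following. Everything here is an assumption rather than a tested recipe: quants_to_table() is a hypothetical helper, the Quants matrices are synthetic stand-ins with bead-type IDs as row names and a "Mean" column, and the output column names ("Probe", "<sample>.AVG_Signal") are chosen to resemble what I believe read.ilmn() looks for via its probeid and expr arguments (check ?read.ilmn). The IDs in a real Quants matrix are array addresses, so you would probably still need the bgx manifest to map them to probe annotation.

```r
## Sketch: collect the per-bead-type means from several Quants matrices
## into one table and write it as a tab-delimited text file.
## quants_to_table() is hypothetical; column names are assumptions.
quants_to_table <- function(quants_list) {
  ids <- rownames(quants_list[[1]])
  out <- data.frame(Probe = ids, stringsAsFactors = FALSE)
  for (sample in names(quants_list)) {
    q <- quants_list[[sample]]
    ## align rows on bead-type ID in case the files are ordered differently
    means <- q[match(ids, rownames(q)), "Mean"]
    out[[paste0(sample, ".AVG_Signal")]] <- unname(means)
  }
  out
}

## Synthetic stand-ins for idat$Quants from two arrays
q1 <- matrix(c(147, 98, 12, 9), nrow = 2,
             dimnames = list(c("10001", "10002"), c("Mean", "SD")))
q2 <- q1[2:1, ]               # same bead types, different row order
q2[, "Mean"] <- c(101, 150)   # 10002 -> 101, 10001 -> 150

tab <- quants_to_table(list(arrayA = q1, arrayB = q2))

out_file <- tempfile(fileext = ".txt")
write.table(tab, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

Matching on row names rather than relying on row order is deliberate, since there is no guarantee that every idat file lists its bead types in the same order.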

Dear smith