Question: Error with read.idat() function in illuminaio package
Stephen Piccolo (United States) wrote, 2.1 years ago:

I'm trying to read in some idat files and a bgx file using the illuminaio package. The data files can be found here: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE54839

When I try to execute the following command:

read.idat(idatFilePaths, bgxFilePath)

I get the following error message:

Reading manifest file /tmp/GSE54839/GPL6947_HumanHT-12_V3_0_R1_11283641_A.bgx ... Done

     /tmp/GSE54839/GSM1324893_4746900020_G_Grn.idat ... Done

Error in tmp$Quants[, "IllumicodeBinData"] : subscript out of bounds

Calls: normalizeLimma -> read.idat -> match

Below is my sessionInfo():

R version 3.2.1 (2015-06-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  methods   stats     graphics  grDevices utils    
[8] datasets  base     

other attached packages:
 [1] illuminaio_0.10.0   magrittr_1.5        readr_0.1.1        
 [4] dplyr_0.4.2         GEOquery_2.34.0     oligo_1.32.0       
 [7] Biostrings_2.36.1   XVector_0.8.0       IRanges_2.2.5      
[10] S4Vectors_0.6.2     Biobase_2.28.0      oligoClasses_1.30.0
[13] BiocGenerics_0.14.0 limma_3.24.14      

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.0           affxparser_1.40.0     GenomicRanges_1.20.5 
 [4] splines_3.2.1         zlibbioc_1.14.0       bit_1.1-12           
 [7] R6_2.1.0              foreach_1.4.2         GenomeInfoDb_1.4.1   
[10] tools_3.2.1           base64_1.1            ff_2.2-13            
[13] DBI_0.3.1             iterators_1.0.7       assertthat_0.1       
[16] preprocessCore_1.30.0 affyio_1.36.0         bitops_1.0-6         
[19] codetools_0.2-14      RCurl_1.95-4.7        RSQLite_1.0.0        
[22] BiocInstaller_1.18.4  XML_3.98-1.3

Any ideas on what I can try?

Matthew Ritchie (Australia) wrote, 4 months ago:

I've made some changes in limma to accommodate idat files in SNP format. I'm not sure when this change will become publicly available, so in the meantime you can download an rda file from http://bioinf.wehi.edu.au/folders/mritchie/idat.rda. It contains the data from this experiment, read in using the commands below:

library(limma)
files = dir(pattern="idat")
bgxfile = dir(pattern="bgx")
data = read.idat(files, bgxfile)

save(bgxfile, data, files, file="idat.rda")

summary(data$E[,1:2])
GSM1324893_4746900020_G_Grn GSM1324894_4746900020_H_Grn
Min.   :   87.0             Min.   :   83.0           
1st Qu.:  129.0             1st Qu.:  120.0           
Median :  147.0             Median :  136.0           
Mean   :  671.9             Mean   :  616.8           
3rd Qu.:  273.0             3rd Qu.:  249.0           
Max.   :53899.0             Max.   :53203.0

Mike Smith (EMBL Heidelberg / de.NBI) wrote, 2.1 years ago:

Hi Stephen,

I just took a look at the data, and it seems the idats are in two different formats.  Expression arrays are typically in an encrypted format, while genotyping and methylation arrays use a differently structured plain binary format.  If you look at the file sizes you'll see you've got half in one, and half in the other.  I've never seen expression data in the unencrypted format before, but it's worth noting that the two halves were generated a few months apart with different versions of Illumina's scanning software.

illuminaio is able to read either format fine (you can do that with readIDAT()), but you get differently named parts in the output. limma only ever expects the output from the encrypted format, and so read.idat() falls over.
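
One way to tell the two layouts apart after reading is to look at the column names of the Quants entry. The sketch below is only an illustration: idat_layout() is a hypothetical helper, "IllumicodeBinData" is the column that read.idat() asks for in the error message above, and the "Mean" column name for the plain binary layout is an assumption that should be checked against real readIDAT() output. Mock objects stand in for the files so that it runs on its own.

```r
# Hypothetical helper: classify a readIDAT() result by its Quants column names.
# "IllumicodeBinData" is taken from the read.idat() error message;
# "Mean" is an assumed column name for the plain binary layout.
idat_layout <- function(idat) {
  cols <- colnames(idat$Quants)
  if ("IllumicodeBinData" %in% cols) {
    "encrypted"
  } else if ("Mean" %in% cols) {
    "plain"
  } else {
    "unknown"
  }
}

# Mock objects standing in for readIDAT() output (not real data)
enc <- list(Quants = data.frame(IllumicodeBinData = c(10L, 20L),
                                MeanBinData = c(150, 300)))
plain <- list(Quants = matrix(c(150, 300, 12, 25), nrow = 2,
                              dimnames = list(c("10", "20"), c("Mean", "SD"))))

# One label per object
layouts <- sapply(list(enc = enc, plain = plain), idat_layout)
print(layouts)
```

With real files, the same function could be applied to each result of readIDAT(), and only the files labelled "encrypted" passed on to read.idat().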

You can read half the files using:

idatFiles <- list.files(pattern = "_4[89].+\\.idat$")
read.idat(idatFiles, bgxFilePath)

To do the rest, you could probably read each file individually with readIDAT() and create a flat text file to be read using read.ilmn(), but it may be simpler to email the limma maintainer and tweak the code to cope with either file format.
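
A rough sketch of the flat-file route follows. It is not a tested recipe: quants_to_table() is a hypothetical helper, and it assumes that each Quants matrix from the plain binary layout has the bead-type IDs as row names and a "Mean" column. The "Probe" and "AVG_Signal" names are chosen to match the defaults of read.ilmn(). Mock matrices are used so the sketch runs without any files.

```r
# Hypothetical helper: combine per-sample Quants matrices into one table.
# Assumes row names hold the bead-type IDs and a "Mean" column exists.
quants_to_table <- function(quants, expr_suffix = "AVG_Signal") {
  # Keep only bead types present in every sample
  ids <- Reduce(intersect, lapply(quants, rownames))
  # One column of mean intensities per sample, rows in the same order
  means <- do.call(cbind, lapply(quants, function(q) q[ids, "Mean"]))
  colnames(means) <- paste(names(quants), expr_suffix, sep = ".")
  data.frame(Probe = ids, means, row.names = NULL,
             check.names = FALSE, stringsAsFactors = FALSE)
}

# With real files the list might be built like this (assumption, not run here):
# quants <- lapply(files, function(f) illuminaio::readIDAT(f)$Quants)
# names(quants) <- sub("\\.idat(\\.gz)?$", "", basename(files))

# Mock matrices standing in for two samples (not real data)
s1 <- matrix(c(150, 300, 12, 25), nrow = 2,
             dimnames = list(c("10", "20"), c("Mean", "SD")))
s2 <- matrix(c(140, 310, 90, 11, 24, 8), nrow = 3,
             dimnames = list(c("10", "20", "30"), c("Mean", "SD")))

tab <- quants_to_table(list(sampleA = s1, sampleB = s2))

# Write a tab-delimited file with one row per bead type
out_file <- tempfile(fileext = ".txt")
write.table(tab, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

A file written this way could probably be read with something like read.ilmn(out_file, probeid = "Probe", expr = "AVG_Signal"), though that has not been checked against this dataset. The IDs are likely to be array address codes rather than ILMN probe IDs, so a further mapping step through the bgx manifest may be needed.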

zamanijavad67 wrote, 4 months ago:

Dear Mike, I have encountered this problem too, but I cannot solve it. My experiment's GEO ID is GSE63808.

> read.idat(idatfiles[120:125], "GPL6947_HumanHT-12_V3_0_R1_11283641_A.bgx")
Reading manifest file GPL6947_HumanHT-12_V3_0_R1_11283641_A.bgx ... Done
	 GSM1565697_4493594310_I_Grn.idat ... Done
Error in tmp$Quants[, "IllumicodeBinData"] : subscript out of bounds
> sessionInfo()
R version 3.2.4 Revised (2016-03-16 r70336)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] illuminaio_0.12.0    limma_3.26.9         BiocInstaller_1.20.3

loaded via a namespace (and not attached):
[1] tools_3.2.4   base64_2.0    openssl_0.9.6

Normally it's a good idea to ask a new question here, rather than adding an 'answer' to an existing one.  People are more likely to notice a new question with no answers than a bump to something old.


Looking at this, the data are unusual, but it's basically the same problem as was identified before.  The idat files you have downloaded are not in the format one would normally expect to see from an expression array; rather, they are structured like those from a genotyping array.  illuminaio is able to read them fine, e.g.:

library(illuminaio)

## download and untar the file from GEO
tar_file <- tempfile()
download.file("https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE63808&format=file", 
              destfile = tar_file, mode = "wb")
untar(tar_file, exdir = tempdir())

## get a list of all the extracted files, with full paths so readIDAT() can find them
files <- list.files(tempdir(), pattern = "\\.idat\\.gz$", full.names = TRUE)

## now read the first file
idat <- readIDAT(files[1])

You can check the idat object and see it's a list with sensible entries (at least sensible for some Illumina platforms).  You'll notice there is an entry called 'nSNPsRead' which you wouldn't expect for an expression array.

> is(idat)
[1] "list"   "vector"
> names(idat)
 [1] "fileSize"      "versionNumber" "nFields"       "fields"        "nSNPsRead"     "Quants"        "MidBlock"     
 [8] "RedGreen"      "Barcode"       "ChipType"      "RunInfo"       "Unknowns"   

The 'issue' here lies with limma, which looks for the expected structure and then fails when it can't find it.  The mean and standard deviation for each bead type are still present in the Quants matrix, so I would suggest either convincing the limma authors to add code to handle this really weird case, or working out how you can convert the Quants matrix into a text file format that limma can handle directly.

Reply written 4 months ago by Mike Smith

Dear Smith,

Thank you for your reply. After reading a file with readIDAT(), what should I do next? illuminaio reads only a single sample at a time. What comes after that?

Reply written 4 months ago by zamanijavad67