Error with read.idat() function in illuminaio package
@stephen-piccolo-6761
United States

I'm trying to read in some idat files and a bgx file using the illuminaio package. The data files can be found here: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE54839

When I try to execute the following command:

read.idat(idatFilePaths, bgxFilePath)

I get the following error message:

Reading manifest file /tmp/GSE54839/GPL6947_HumanHT-12_V3_0_R1_11283641_A.bgx ... Done

     /tmp/GSE54839/GSM1324893_4746900020_G_Grn.idat ... Done

Error in tmp$Quants[, "IllumicodeBinData"] : subscript out of bounds

Calls: normalizeLimma -> read.idat -> match

Below is my sessionInfo():

R version 3.2.1 (2015-06-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  methods   stats     graphics  grDevices utils    
[8] datasets  base     

other attached packages:
 [1] illuminaio_0.10.0   magrittr_1.5        readr_0.1.1        
 [4] dplyr_0.4.2         GEOquery_2.34.0     oligo_1.32.0       
 [7] Biostrings_2.36.1   XVector_0.8.0       IRanges_2.2.5      
[10] S4Vectors_0.6.2     Biobase_2.28.0      oligoClasses_1.30.0
[13] BiocGenerics_0.14.0 limma_3.24.14      

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.0           affxparser_1.40.0     GenomicRanges_1.20.5 
 [4] splines_3.2.1         zlibbioc_1.14.0       bit_1.1-12           
 [7] R6_2.1.0              foreach_1.4.2         GenomeInfoDb_1.4.1   
[10] tools_3.2.1           base64_1.1            ff_2.2-13            
[13] DBI_0.3.1             iterators_1.0.7       assertthat_0.1       
[16] preprocessCore_1.30.0 affyio_1.36.0         bitops_1.0-6         
[19] codetools_0.2-14      RCurl_1.95-4.7        RSQLite_1.0.0        
[22] BiocInstaller_1.18.4  XML_3.98-1.3

Any ideas on what I can try?

illuminaio normalization
Matthew Ritchie ▴ 1000
@matthew-ritchie-650
Australia

I've made some changes in limma to accommodate idat files in SNP format. I'm not sure when this change will become publicly available though, so in the meantime, you can download an rda file containing data from this experiment read in using the commands below from http://bioinf.wehi.edu.au/folders/mritchie/idat.rda

library(limma)
files = dir(pattern="idat")
bgxfile = dir(pattern="bgx")
data = read.idat(files, bgxfile)

save(bgxfile, data, files, file="idat.rda")

summary(data$E[,1:2])
GSM1324893_4746900020_G_Grn GSM1324894_4746900020_H_Grn
Min.   :   87.0             Min.   :   83.0           
1st Qu.:  129.0             1st Qu.:  120.0           
Median :  147.0             Median :  136.0           
Mean   :  671.9             Mean   :  616.8           
3rd Qu.:  273.0             3rd Qu.:  249.0           
Max.   :53899.0             Max.   :53203.0
Mike Smith ★ 6.5k
@mike-smith
EMBL Heidelberg

Hi Stephen,

I just took a look at the data, and it seems the idats are in two different formats. Expression arrays typically use an encrypted format, whereas genotyping and methylation arrays use a differently structured plain binary format. If you look at the file sizes you'll see you've got half in one, and half in the other. I've never seen expression data in the unencrypted format before, but it's worth noting that the two halves were generated a few months apart with different versions of Illumina's scanning software.
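
One quick way to see the two groups is to tabulate the file sizes. This is only a minimal sketch: `idat_size_table` is a made-up helper name, and it assumes the idat files (optionally gzipped) sit in the directory you pass in.

```r
# Hypothetical helper: list idat files with their sizes in bytes,
# smallest first, so that groups of similar size stand out.
idat_size_table <- function(dir = ".") {
  paths <- list.files(dir, pattern = "\\.idat(\\.gz)?$", full.names = TRUE)
  out <- data.frame(file = basename(paths),
                    bytes = file.size(paths),
                    stringsAsFactors = FALSE)
  out[order(out$bytes), , drop = FALSE]
}
```

Calling `idat_size_table("/tmp/GSE54839")` (the directory from the original post) should then show whether the sizes fall into two clusters.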

illuminaio is able to read either format fine (you can do that with readIDAT()), but you get differently named parts in the output. limma only ever expects the output from the encrypted format, and so read.idat() falls over.
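
If you want to check which kind of output you have for a given file, something like the following might do. It is a sketch under one assumption: that only the encrypted-format output has an "IllumicodeBinData" column in Quants (that is the column named in the error message above). `idat_format` is a hypothetical name, so check `colnames(idat$Quants)` on your own files first.

```r
# Hypothetical helper: classify one readIDAT() result by the columns of its
# Quants table. Assumption: only the encrypted format carries a column
# called "IllumicodeBinData".
idat_format <- function(idat) {
  if ("IllumicodeBinData" %in% colnames(idat$Quants)) "encrypted" else "plain"
}
```

For a real file this would be used as `idat_format(illuminaio::readIDAT(path))`.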

You can read half the files using:

idatFiles <- list.files(pattern = "_4[89](.+)\\.idat$")
read.idat(idatFiles, bgxFilePath)

To do the rest, you could probably read each file individually with readIDAT() and create a flat text file to be read using read.ilmn(), but it may be simpler to email the limma maintainer and tweak the code to cope with either file format.
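
Since readIDAT() handles one file per call, reading "each file individually" amounts to a loop. A minimal sketch, where `read_idat_list` is a hypothetical helper and the `reader` argument exists only so that another reader function can be swapped in:

```r
# Hypothetical helper: apply a single-file reader to every path and keep the
# results in a list named by file. By default the reader is
# illuminaio::readIDAT, which needs the illuminaio package installed.
read_idat_list <- function(paths, reader = illuminaio::readIDAT) {
  out <- lapply(paths, reader)
  names(out) <- basename(paths)
  out
}
```

The result is one list element per sample, which can then be inspected or combined.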

@zamanijavad67-12946

Dear Mike, I have run into this problem too, but I have not been able to solve it. My experiment is GEO series GSE63808.

> read.idat(idatfiles[120:125], "GPL6947_HumanHT-12_V3_0_R1_11283641_A.bgx")
Reading manifest file GPL6947_HumanHT-12_V3_0_R1_11283641_A.bgx ... Done
	 GSM1565697_4493594310_I_Grn.idat ... Done
Error in tmp$Quants[, "IllumicodeBinData"] : subscript out of bounds
> sessionInfo()
R version 3.2.4 Revised (2016-03-16 r70336)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] illuminaio_0.12.0    limma_3.26.9         BiocInstaller_1.20.3

loaded via a namespace (and not attached):
[1] tools_3.2.4   base64_2.0    openssl_0.9.6

Normally it's a good idea to ask a new question here, rather than adding an 'answer' to an existing one.  People are more likely to notice a new question with no answers, than bumping something old.


Looking at this, the data are unusual, but it's basically the same problem as was identified before.  The idat files you have downloaded are not in the format one would normally expect to see from an expression array, but rather they are structured like those from a genotyping array.  illuminaio is able to read them fine e.g.

library(illuminaio)

## download and untar the file from GEO
## (mode = "wb" keeps the binary archive intact on Windows)
tar_file <- tempfile()
download.file("https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE63808&format=file", 
              destfile = tar_file, mode = "wb")
untar(tar_file, exdir = tempdir())

## get the full paths of all the extracted idat files
files <- list.files(tempdir(), pattern = "\\.idat\\.gz$", full.names = TRUE)

## now read the first file
idat <- readIDAT(files[1])

You can check the idat object and see it's a list with sensible entries (at least sensible for some Illumina platforms).  You'll notice there is an entry called 'nSNPsRead' which you wouldn't expect for an expression array.

> is(idat)
[1] "list"   "vector"
> names(idat)
 [1] "fileSize"      "versionNumber" "nFields"       "fields"        "nSNPsRead"     "Quants"        "MidBlock"     
 [8] "RedGreen"      "Barcode"       "ChipType"      "RunInfo"       "Unknowns"   

The 'issue' here lies with limma, which looks for the expected structure and then fails when it can't find it. The mean and standard deviation for each bead type are still present in the Quants matrix, so I would suggest either convincing the limma authors to add code to handle this really weird case, or working out how you can convert the Quants matrix into a text file format that limma can handle directly.
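
As a rough idea of what that conversion could look like, here is a sketch that stacks the per-bead-type means from several samples into one matrix and writes it as tab-delimited text. It rests on assumptions you should verify on your own data: that each Quants is a matrix whose row names identify the bead type, and that it has a column named "Mean". `quants_to_matrix` and `write_quants` are hypothetical helper names, not part of illuminaio or limma.

```r
# Hypothetical helper: combine one column of Quants from several
# plain-format readIDAT() results (rows = bead types, columns = samples).
# Bead types missing from a sample are left as NA.
quants_to_matrix <- function(idats, column = "Mean") {
  ids <- Reduce(union, lapply(idats, function(x) rownames(x$Quants)))
  mat <- matrix(NA_real_, nrow = length(ids), ncol = length(idats),
                dimnames = list(ids, names(idats)))
  for (j in seq_along(idats)) {
    q <- idats[[j]]$Quants
    mat[, j] <- as.numeric(q[match(ids, rownames(q)), column])
  }
  mat
}

# Hypothetical helper: write the matrix as a tab-delimited file with the
# bead-type identifiers in a first column called "Probe".
write_quants <- function(mat, file) {
  out <- data.frame(Probe = rownames(mat), mat,
                    check.names = FALSE, stringsAsFactors = FALSE)
  write.table(out, file, sep = "\t", quote = FALSE, row.names = FALSE)
}
```

The identifiers in the first column are whatever illuminaio attaches to the Quants rows, so they would still need to be matched against the bgx manifest, and the column names may need adjusting to whatever read.ilmn() is told to look for through its `probeid` and `expr` arguments.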


Dear Smith,

Thank you for your reply. After reading a file with readIDAT(), what should I do next? illuminaio only reads a single sample at a time. What comes after that?
