Question: Error with read.idat() function in illuminaio package
Stephen Piccolo (United States) wrote, 24 months ago:

I'm trying to read in some idat files and a bgx file using the illuminaio package. The data files can be found here: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE54839

When I try to execute the following command:

read.idat(idatFilePaths, bgxFilePath)

I get the following error message:

Reading manifest file /tmp/GSE54839/GPL6947_HumanHT-12_V3_0_R1_11283641_A.bgx ... Done

     /tmp/GSE54839/GSM1324893_4746900020_G_Grn.idat ... Done

Error in tmp$Quants[, "IllumicodeBinData"] : subscript out of bounds

Calls: normalizeLimma -> read.idat -> match

Below is my sessionInfo():

R version 3.2.1 (2015-06-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  methods   stats     graphics  grDevices utils    
[8] datasets  base     

other attached packages:
 [1] illuminaio_0.10.0   magrittr_1.5        readr_0.1.1        
 [4] dplyr_0.4.2         GEOquery_2.34.0     oligo_1.32.0       
 [7] Biostrings_2.36.1   XVector_0.8.0       IRanges_2.2.5      
[10] S4Vectors_0.6.2     Biobase_2.28.0      oligoClasses_1.30.0
[13] BiocGenerics_0.14.0 limma_3.24.14      

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.0           affxparser_1.40.0     GenomicRanges_1.20.5 
 [4] splines_3.2.1         zlibbioc_1.14.0       bit_1.1-12           
 [7] R6_2.1.0              foreach_1.4.2         GenomeInfoDb_1.4.1   
[10] tools_3.2.1           base64_1.1            ff_2.2-13            
[13] DBI_0.3.1             iterators_1.0.7       assertthat_0.1       
[16] preprocessCore_1.30.0 affyio_1.36.0         bitops_1.0-6         
[19] codetools_0.2-14      RCurl_1.95-4.7        RSQLite_1.0.0        
[22] BiocInstaller_1.18.4  XML_3.98-1.3

Any ideas on what I can try?

Matthew Ritchie (Australia) wrote, 11 weeks ago:

I've made some changes in limma to accommodate idat files in SNP format. I'm not sure when this change will become publicly available, though. In the meantime, you can download an rda file containing the data from this experiment, read in using the commands below, from http://bioinf.wehi.edu.au/folders/mritchie/idat.rda

library(limma)
files = dir(pattern="idat")
bgxfile = dir(pattern="bgx")
data = read.idat(files, bgxfile)

save(bgxfile, data, files, file="idat.rda")

summary(data$E[,1:2])
GSM1324893_4746900020_G_Grn GSM1324894_4746900020_H_Grn
Min.   :   87.0             Min.   :   83.0           
1st Qu.:  129.0             1st Qu.:  120.0           
Median :  147.0             Median :  136.0           
Mean   :  671.9             Mean   :  616.8           
3rd Qu.:  273.0             3rd Qu.:  249.0           
Max.   :53899.0             Max.   :53203.0

Mike Smith (EMBL Heidelberg / de.NBI) wrote, 24 months ago:

Hi Stephen,

I just took a look at the data, and it seems the idats are in two different formats.  Expression arrays are typically in an encrypted format, with a differently structured plain binary for genotyping and methylation arrays.  If you look at the file sizes you'll see you've got half in one, and half in the other.  I've never seen expression data in the unencrypted format before, but it's worth noting that they were generated a few months apart with different versions of Illumina's scanning software.

illuminaio is able to read either fine (you can do that with readIDAT()), but you get differently named parts in the output. limma only ever expects the output from the encrypted format, and so read.idat() falls over.
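A minimal sketch of how the two layouts could be told apart, going only by the error message above: read.idat() indexes a column named "IllumicodeBinData" in Quants, so a file whose Quants lacks that column is one limma will reject. has_expected_quants() is a hypothetical helper, not part of illuminaio or limma, and the mock objects stand in for real readIDAT() output. The "Mean" and "SD" column names in the second mock are an assumption about the unencrypted layout.

```r
## Hypothetical helper: TRUE when a readIDAT() result has the
## "IllumicodeBinData" column that read.idat() asks for in the error above.
has_expected_quants <- function(idat) {
  "IllumicodeBinData" %in% colnames(idat$Quants)
}

## Mock objects for illustration only (not real data).
enc_like <- list(Quants = data.frame(IllumicodeBinData = c(10008, 10010)))
snp_like <- list(nSNPsRead = 2L,
                 Quants = matrix(c(150, 900, 12, 40), ncol = 2,
                                 dimnames = list(c("10008", "10010"),
                                                 c("Mean", "SD"))))

has_expected_quants(enc_like)
has_expected_quants(snp_like)

## With real files (assumes illuminaio and limma are installed):
## ok <- vapply(idatFilePaths,
##              function(f) has_expected_quants(illuminaio::readIDAT(f)),
##              logical(1))
## limma::read.idat(idatFilePaths[ok], bgxFilePath)
```

Checking the column directly avoids relying on file names or file sizes, at the cost of reading every file once.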

You can read half the files using:

library(limma)
## select the half of the files that read.idat() can handle
idatFiles <- list.files(pattern = "_4[89].+\\.idat$")
read.idat(idatFiles, bgxFilePath)

To do the rest, you could probably read each file individually with readIDAT() and create a flat text file to be read using read.ilmn(), but it may be simpler to email the limma maintainer and tweak the code to cope with either file format.
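A rough sketch of the "read each file individually" route, under stated assumptions: each readIDAT() result has a Quants matrix whose row names are bead-type IDs and which has a "Mean" column. combine_quants() is a hypothetical helper, and the mock list stands in for real readIDAT() output. It lines the samples up by ID, so files that list their bead types in a different order still end up in matching rows.

```r
## Hypothetical helper: align per-sample Quants matrices on their row names
## and return one matrix with a row per bead type and a column per sample.
combine_quants <- function(idats, column = "Mean") {
  ids <- sort(unique(unlist(lapply(idats, function(x) rownames(x$Quants)))))
  cols <- lapply(idats, function(x) {
    ## IDs missing from a sample become NA in that sample's column
    as.numeric(x$Quants[match(ids, rownames(x$Quants)), column])
  })
  out <- do.call(cbind, cols)
  rownames(out) <- ids
  out
}

## Mock readIDAT()-style objects for illustration only (made-up numbers).
mock <- list(
  sampleA = list(Quants = matrix(c(150, 900, 12, 40), ncol = 2,
                                 dimnames = list(c("10008", "10010"),
                                                 c("Mean", "SD")))),
  sampleB = list(Quants = matrix(c(875, 140, 38, 11), ncol = 2,
                                 dimnames = list(c("10010", "10008"),
                                                 c("Mean", "SD"))))
)
E <- combine_quants(mock)
E

## With real files (assumes illuminaio is installed):
## idats <- lapply(idatFiles, illuminaio::readIDAT)
## names(idats) <- basename(idatFiles)
## E <- combine_quants(idats)
```

The row names are whatever identifiers readIDAT() returns. Mapping them to the probe IDs in the bgx manifest is a separate step that this sketch leaves out.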

zamanijavad67 wrote, 12 weeks ago:

Dear Mike, I have run into this problem too, but I cannot solve it. My experiment's GEO ID is GSE63808.

> read.idat(idatfiles[120:125], "GPL6947_HumanHT-12_V3_0_R1_11283641_A.bgx")
Reading manifest file GPL6947_HumanHT-12_V3_0_R1_11283641_A.bgx ... Done
	 GSM1565697_4493594310_I_Grn.idat ... Done
Error in tmp$Quants[, "IllumicodeBinData"] : subscript out of bounds
> sessionInfo()
R version 3.2.4 Revised (2016-03-16 r70336)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] illuminaio_0.12.0    limma_3.26.9         BiocInstaller_1.20.3

loaded via a namespace (and not attached):
[1] tools_3.2.4   base64_2.0    openssl_0.9.6

Normally it's a good idea to ask a new question here, rather than adding an 'answer' to an existing one.  People are more likely to notice a new question with no answers, than bumping something old.


Looking at this, the data are unusual, but it's basically the same problem as was identified before.  The idat files you have downloaded are not in the format one would normally expect to see from an expression array, but rather they are structured like those from a genotyping array.  illuminaio is able to read them fine e.g.

library(illuminaio)

## download and untar the file from GEO
## (mode = "wb" keeps the binary archive intact on Windows)
tar_file <- tempfile()
download.file("https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE63808&format=file", 
              destfile = tar_file, mode = "wb")
untar(tar_file, exdir = tempdir())

## get a list of all the extracted files, with full paths so they can be
## read from any working directory
files <- list.files(tempdir(), pattern = "\\.idat\\.gz$", full.names = TRUE)

## now read the first file
idat <- readIDAT(files[1])

You can check the idat object and see it's a list with sensible entries (at least sensible for some Illumina platforms).  You'll notice there is an entry called 'nSNPsRead' which you wouldn't expect for an expression array.

> is(idat)
[1] "list"   "vector"
> names(idat)
 [1] "fileSize"      "versionNumber" "nFields"       "fields"        "nSNPsRead"     "Quants"        "MidBlock"     
 [8] "RedGreen"      "Barcode"       "ChipType"      "RunInfo"       "Unknowns"   

The 'issue' here lies with limma, which looks for the expected structure and then fails when it can't find it.  The mean and standard deviation for each bead type are still present in the Quants matrix, so I would suggest either convincing the limma authors to add code to handle this really weird case, or working out how to convert the Quants matrix into a text file format that limma can handle directly.
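As a hedged sketch of the text-file route: the helper below writes an intensity matrix as a tab-delimited file with a "Probe" column and one "<sample>.AVG_Signal" column per sample. These names were picked to line up with the default probeid and expr arguments of read.ilmn(). write_ilmn_like() is a hypothetical helper, the intensities are made up, and the result has not been checked against real data from these experiments.

```r
## Hypothetical helper: write a matrix of intensities (bead types in rows,
## samples in columns) as a tab-delimited file that read.ilmn() might accept.
write_ilmn_like <- function(E, path) {
  tab <- data.frame(Probe = rownames(E), E, check.names = FALSE,
                    stringsAsFactors = FALSE)
  names(tab)[-1] <- paste0(colnames(E), ".AVG_Signal")
  write.table(tab, file = path, sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(path)
}

## Made-up intensities for illustration only.
E <- matrix(c(150, 900, 140, 875), ncol = 2,
            dimnames = list(c("10008", "10010"), c("sampleA", "sampleB")))
out_file <- write_ilmn_like(E, tempfile(fileext = ".txt"))
readLines(out_file, n = 1)

## Then, assuming limma is installed:
## x <- limma::read.ilmn(out_file, probeid = "Probe", expr = "AVG_Signal")
```

The IDs written to the Probe column are the row names of the input matrix. If those are the identifiers readIDAT() reports rather than manifest probe IDs, they still need to be matched against the bgx file before any annotation-based analysis.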

Reply written 12 weeks ago by Mike Smith

Dear Smith,

Thank you for your reply. After reading a file with readIDAT(), what should I do? illuminaio reads only a single sample at a time. What comes next?

Reply written 12 weeks ago by zamanijavad67