It seems I have found a small bug in limma or illuminaio. If not, and the behavior below is as intended, please allow me to put forward a (small) feature request:
the possibility to also read compressed IDAT files (just like it is possible for compressed BGX files).
At the Gene Expression Omnibus (GEO), the (raw) data files that are made available are always compressed with gzip. Being able to read these compressed files directly with limma (through illuminaio?) would make life slightly more comfortable. I noticed that compressed BGX files can already be handled, but this does not seem to be the case for IDAT files. Hence my question.
Thanks for considering.
Guido
# Example (from: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE80081)
# I manually downloaded the raw data file (GSE80081_RAW.tar), and extracted it.
# The result is a set of compressed files.
> library(limma)
> dir()
[1] "GPL6887_MouseWG-6_V2_0_R0_11278593_A.bgx.gz"
[2] "GPL6887_MouseWG-6_V2_0_R3_11278593_A.txt.gz"
[3] "GSM2112545_9482914014_A_Grn.idat.gz"
[4] "GSM2112546_9482914014_B_Grn.idat.gz"
[5] "GSM2112547_9482914014_C_Grn.idat.gz"
[6] "GSM2112548_9482914014_D_Grn.idat.gz"
[7] "GSM2112549_9482914014_E_Grn.idat.gz"
[8] "GSM2112550_9482914014_F_Grn.idat.gz"
> bgxfile = dir(pattern="bgx")
> idatfiles = dir(pattern="idat")
>
> x <- read.idat(idatfiles, bgxfile)
Reading manifest file GPL6887_MouseWG-6_V2_0_R0_11278593_A.bgx.gz ... Done
GSM2112545_9482914014_A_Grn.idat.gz ... Error in dataChunks[[i]] : subscript out of bounds
> traceback()
4: strsplit(dataChunks[[i]], "\\\"")
3: readIDAT_enc(file)
2: illuminaio::readIDAT(idatfiles[j])
1: read.idat(idatfiles, bgxfile)
>
# After manually extracting the compressed IDAT files (only) it works fine.
> dir()
[1] "GPL6887_MouseWG-6_V2_0_R0_11278593_A.bgx.gz"
[2] "GSM2112545_9482914014_A_Grn.idat"
[3] "GSM2112546_9482914014_B_Grn.idat"
[4] "GSM2112547_9482914014_C_Grn.idat"
[5] "GSM2112548_9482914014_D_Grn.idat"
[6] "GSM2112549_9482914014_E_Grn.idat"
[7] "GSM2112550_9482914014_F_Grn.idat"
> bgxfile = dir(pattern="bgx")
> idatfiles = dir(pattern="idat")
>
> x <- read.idat(idatfiles, bgxfile)
Reading manifest file GPL6887_MouseWG-6_V2_0_R0_11278593_A.bgx.gz ... Done
GSM2112545_9482914014_A_Grn.idat ... Done
GSM2112546_9482914014_B_Grn.idat ... Done
GSM2112547_9482914014_C_Grn.idat ... Done
GSM2112548_9482914014_D_Grn.idat ... Done
GSM2112549_9482914014_E_Grn.idat ... Done
GSM2112550_9482914014_F_Grn.idat ... Done
Finished reading data.
>
> x.norm <- neqc(x)
>
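For anyone who hits the same error, the manual extraction step can also be scripted from within R. What follows is only a minimal sketch using base R connections; the helper name gunzip_idat and its chunk_size argument are hypothetical (they are not part of limma or illuminaio), and the sketch assumes the .idat.gz files sit in the working directory and that writing the decompressed copies next to them is acceptable.

```r
# Hypothetical helper (not part of limma or illuminaio):
# decompress one gzip file next to the original, using base R only.
gunzip_idat <- function(gzpath, chunk_size = 1e6) {
  stopifnot(grepl("\\.gz$", gzpath))        # refuse to overwrite the input
  out <- sub("\\.gz$", "", gzpath)          # e.g. "x.idat.gz" -> "x.idat"
  con_in <- gzfile(gzpath, open = "rb")     # reads decompressed bytes
  on.exit(close(con_in), add = TRUE)
  con_out <- file(out, open = "wb")
  on.exit(close(con_out), add = TRUE)
  repeat {
    chunk <- readBin(con_in, what = "raw", n = chunk_size)
    if (length(chunk) == 0L) break          # end of file reached
    writeBin(chunk, con_out)
  }
  out                                       # path of the decompressed file
}

# Decompress every compressed IDAT file in the working directory.
gzfiles   <- dir(pattern = "\\.idat\\.gz$")
idatfiles <- vapply(gzfiles, gunzip_idat, character(1), USE.NAMES = FALSE)

# The BGX manifest can stay compressed, as shown in the session above.
bgxfile <- dir(pattern = "bgx")

# Then read the data as before:
# x <- limma::read.idat(idatfiles, bgxfile)
```

The compressed originals are left in place, so dir(pattern="idat") would afterwards match both the .idat and the .idat.gz files; passing the idatfiles vector returned above avoids that.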
> sessionInfo()
R version 3.4.0 Patched (2017-05-10 r72670)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] limma_3.32.2

loaded via a namespace (and not attached):
[1] compiler_3.4.0 base64_2.0 illuminaio_0.18.0 openssl_0.9.6
>