Problem with memory limit in Rstudio/Windows when importing Affymetrix HTA 2.0 CEL files with oligo R package
svlachavas ▴ 840
@svlachavas-7225
Last seen 14 months ago
Germany/Heidelberg/German Cancer Resear…

Dear Community,

I would like to ask about a specific memory problem in R on Windows when importing a large set of Affymetrix CEL files with the oligo R package, and whether there is a way to overcome it. In detail, the total size of the unzipped CEL files is ~115 Gb (about 1820 CEL files, Affymetrix HTA 2.0 platform).

Below is the relevant output:

dat <- read.celfiles(list.celfiles())

Platform design info loaded.
Error: cannot allocate vector of size 93.5 Gb

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7600)

locale:
[1] LC_COLLATE=Greek_Greece.1253  LC_CTYPE=Greek_Greece.1253    LC_MONETARY=Greek_Greece.1253
[4] LC_NUMERIC=C                  LC_TIME=Greek_Greece.1253    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BiocInstaller_1.24.0 pd.hta.2.0_3.12.2    DBI_0.6              RSQLite_1.1-2       
 [5] oligo_1.38.0         Biostrings_2.42.1    XVector_0.14.1       IRanges_2.8.2       
 [9] S4Vectors_0.12.2     oligoClasses_1.36.0  GEOquery_2.40.0      Biobase_2.34.0      
[13] BiocGenerics_0.20.0 

memory.limit()

[1] 8163

which can't be increased, as my RAM is 8 Gb.

So is the only solution to move to a PC with much more memory, or is there something I could try here?
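For reference, the size in the error message can be reproduced with simple arithmetic, if the raw intensities are held as one double-precision matrix with one row per array feature and one column per CEL file. The sketch below assumes an HTA 2.0 array has 2572 x 2680 features; that layout is not stated in this thread, although it does reproduce the reported 93.5 Gb. The function name is hypothetical.

```r
# Estimate the RAM needed for a dense numeric intensity matrix
# (one row per feature, one column per CEL file, 8 bytes per double).
intensity_matrix_gib <- function(n_features, n_arrays, bytes_per_value = 8) {
  n_features * n_arrays * bytes_per_value / 1024^3
}

# ASSUMPTION: HTA 2.0 layout of 2572 x 2680 features (not given in the thread).
hta_features <- 2572 * 2680

needed_gib <- intensity_matrix_gib(hta_features, n_arrays = 1820)
available_gib <- 8163 / 1024   # memory.limit() reports megabytes

cat(sprintf("needed: %.1f GiB, available: %.1f GiB\n", needed_gib, available_gib))
```

Under these assumptions the single matrix needs about 93.5 GiB against roughly 8 GiB available, so no setting on this machine can make the one-shot import fit.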

Best,

Efstathios

 

 

oligo hta2.0 affymetrix microarrays memory problem • 1.8k views
@james-w-macdonald-5106
Last seen 2 days ago
United States

1820 HTA arrays? On a Windows box with 8Gb? Never gonna happen. You might be able to get away with processing in batches and using something like UPC to control for the batches. I've never done that, so ymmv.
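A minimal sketch of what processing in batches could look like, assuming the oligo, Biobase and pd.hta.2.0 packages are installed. The helper names, the batch size of 50 and the output directory are hypothetical choices, not something tested on this dataset. Note that each batch is normalized and summarized on its own here, so the resulting values are not directly comparable across batches without some further batch adjustment.

```r
# Split a vector of CEL file paths into consecutive batches of at most batch_size.
split_into_batches <- function(files, batch_size) {
  unname(split(files, ceiling(seq_along(files) / batch_size)))
}

# Read and RMA-summarize one batch, returning a features x samples matrix.
# Requires oligo and Biobase (plus the pd.hta.2.0 annotation for these arrays).
process_batch <- function(batch_files) {
  raw <- oligo::read.celfiles(batch_files)
  eset <- oligo::rma(raw)   # normalization happens within this batch only
  Biobase::exprs(eset)
}

# Process every batch and save each result to disk, so that only one batch
# is held in memory at a time. batch_size and out_dir are hypothetical.
run_in_batches <- function(cel_files, batch_size = 50, out_dir = "rma_batches") {
  dir.create(out_dir, showWarnings = FALSE)
  batches <- split_into_batches(cel_files, batch_size)
  for (i in seq_along(batches)) {
    expr <- process_batch(batches[[i]])
    saveRDS(expr, file.path(out_dir, sprintf("batch_%03d.rds", i)))
    rm(expr)
    gc()                    # release memory before the next batch
  }
  invisible(length(batches))
}

# The batching step alone, on placeholder file names:
batches <- split_into_batches(sprintf("sample_%04d.CEL", 1:1820), 50)
length(batches)
```

On the real data the call would be something like `run_in_batches(list.celfiles(full.names = TRUE))`, with the batch size chosen so that one batch fits comfortably in the available RAM.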

An alternative would be to spin up a relatively large EC2 instance with lots of RAM and use the Amazon AMI to process the data, and then do further analyses of the processed data on your Windows box.


Dear James,

Thanks for the comments! I was (unfortunately) fairly sure about the Windows box, but I thought I would ask anyway to get feedback on further options. Two last questions on this matter:

1) Would it be worth a try on a Unix machine with 64 Gb RAM? I mention it because it is the last alternative I could try before something like the Amazon AMI you mentioned.

2) In this specific case of many CEL files, would you use the preprocessed data for a quick analysis?

For example, regarding the specific dataset I have described:

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE88884

it offers the file GSE88884_ILLUMINATE1and2_SLEbaselineVsHealthy_preprocessed.txt.gz

However, I have never used or downloaded preprocessed data (only raw), and I'm not certain what preprocessing has been done or how.


If you submit to GEO, you are supposed to say how you processed the data, but this information is stored at the sample level, so you can find it by clicking on any of the sample links. They apparently used rlm from MASS instead of rma. Both rlm and median polish are intended to be robust model fitting algorithms, so you could probably argue that the provided gene level data are fine as is.
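If the provided gene level data are used as is, loading them could look like the sketch below. It assumes the preprocessed file is a tab-delimited table with feature identifiers in the first column and one column per sample; that layout is an assumption and should be checked against the actual file. The helper name is hypothetical, and the toy file is made-up data that only serves to make the example run without a download.

```r
# Read a (possibly gzipped) tab-delimited expression table into a numeric matrix.
# ASSUMPTION: the first column holds feature identifiers and the remaining
# columns are samples; verify this against the real file before relying on it.
read_preprocessed <- function(path) {
  tab <- read.delim(path, row.names = 1, check.names = FALSE)
  as.matrix(tab)
}

# Toy stand-in file; the identifiers and values are made up.
toy_path <- tempfile(fileext = ".txt.gz")
con <- gzfile(toy_path, "w")
writeLines(c("ID\tsample_A\tsample_B",
             "gene_1\t5.1\t6.2",
             "gene_2\t7.3\t8.4"), con)
close(con)

expr <- read_preprocessed(toy_path)
dim(expr)
```

For the real data, the same function would be pointed at the downloaded GSE88884_ILLUMINATE1and2_SLEbaselineVsHealthy_preprocessed.txt.gz, and it is worth inspecting `dim(expr)` and `range(expr)` first to confirm the layout and whether the values are on a log scale.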
