Question

Normalization of a large data set

phungduongmy1416222 • 0

Hello, I am trying to normalize the data set "GSE68465" (462 samples) with gcRMA. My work requires gcRMA, so I cannot switch to another method. The number of samples is large, and when I run the analysis R fails with an error saying it cannot allocate a vector of that size. Here is my code:

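setwd("D:/justforR1/meta_lung/GSE68465_RAW")  # directory containing the CEL files
source("http://bioconductor.org/biocLite.R")
biocLite()
library(affy)
library(gcrma)
# Read every CEL file in the working directory and run GCRMA on all of them
eset.gcrma = justGCRMA()
# Extract the expression matrix (note: GCRMA returns log2-scale values)
exprSet.nologs = exprs(eset.gcrma)
write.table(exprSet.nologs, file="Normalizationtest.gcrma.txt", quote=FALSE, sep="\t")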

What should I do in this situation? Thank you

normalization gcRMA • 616 views

updated 5.2 years ago by James W. MacDonald 65k • written 5.2 years ago by phungduongmy1416222 • 0

score 0 · Answer 1 · 2019-02-19

That's a lot of arrays. You could try calling justGCRMA() with optimize.by = "memory", which is a bit more memory efficient. But it looks like you are working on a Windows box, which probably doesn't have much memory (maybe 16 GB, which is a 'normal' amount of RAM for a desktop), in which case it probably won't help.
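As a minimal sketch of that first suggestion, the call might look like the following. The wrapper name `run_gcrma_low_memory` is hypothetical (it is not part of gcrma), and the sketch assumes the gcrma package and the annotation packages for the chip are already installed. `celfile.path` and `optimize.by` are existing arguments of `justGCRMA()`.

```r
# Sketch: run GCRMA through its memory-optimized code path.
# `run_gcrma_low_memory` is a hypothetical helper name, not part of gcrma.
run_gcrma_low_memory <- function(cel_dir) {
  if (!requireNamespace("gcrma", quietly = TRUE)) {
    stop("The Bioconductor package 'gcrma' must be installed first.")
  }
  # celfile.path points justGCRMA() at the CEL directory, so setwd() is not needed.
  # optimize.by = "memory" trades some speed for a smaller memory footprint.
  gcrma::justGCRMA(celfile.path = cel_dir, optimize.by = "memory")
}

# Usage (not run here; it needs the CEL files on disk):
# eset.gcrma <- run_gcrma_low_memory("D:/justforR1/meta_lung/GSE68465_RAW")
```

Whether this is enough for 462 arrays depends on the RAM available; it lowers the footprint but does not remove the need to hold probe-level data for all samples.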

An alternative would be to use AWS to spin up a large enough instance to run GCRMA on that number of files, and then get your data back and analyze the summarized data on your desktop. That wouldn't cost much, but it does take a bit to figure out how to do it (I find most of Amazon's documentation, um, let's say obscure). There are some instructions here that you could use.
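To get a rough feel for what "large enough" means, one could estimate the size of a single probe-level matrix, as sketched below. The probe count is a placeholder assumption, not a figure from this thread; substitute the number of probes on your chip. R will likely need several such matrices (plus temporary copies) at once, so treat the result as a lower bound rather than a sizing rule.

```r
# Approximate size in GiB of a dense numeric (double) matrix: 8 bytes per cell.
matrix_gib <- function(n_rows, n_cols) {
  n_rows * n_cols * 8 / 1024^3
}

n_samples <- 462      # number of arrays, from the question
n_probes  <- 250000   # ASSUMPTION: placeholder, replace with your chip's probe count

per_matrix <- matrix_gib(n_probes, n_samples)
cat(sprintf("One probe-level matrix: about %.2f GiB\n", per_matrix))
```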