@victor-m-trevino-alvarado-505
We are trying to load around 260 CEL files into an AffyBatch object.
There are 6117 genes on the chip.
We tried on both Windows and Linux machines. The problem is that we
cannot allocate enough memory to load this data.
On Windows, with 512 MB of memory and 4 GB of virtual memory, we can
adjust the amount of memory available to R, but once the R process
reaches around 1.6 GB it simply never finishes, or hangs (this is a
known problem, see the R documentation).
On Linux, with 2 GB of memory and 4 GB of virtual memory, we got a
"cannot allocate vector" error for a vector of around 470 MB.
We successfully loaded the data using some workarounds, but I think the
read.affybatch routine could be enhanced, and/or the AffyBatch object
modified in some way, to resolve this kind of problem.
We examined the read.affybatch routine and saw that:
1) An AffyBatch object is created before all the CEL files are read.
2) All CEL files are read and stored in a temporary matrix.
3) The temporary matrix is copied (or assigned?) to the AffyBatch
object.
We think these three steps together consume more memory than necessary.
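To put rough numbers on the three steps above, here is a minimal sketch of the arithmetic. The 534 x 534 cell grid is a purely hypothetical chip layout (the chip dimensions are not stated in this post), and matrix_bytes is an illustrative helper, not part of affy. The only firm assumption is that R stores the intensities as 8-byte doubles.

```r
# Hypothetical helper: bytes needed for a cells-by-files matrix of doubles.
matrix_bytes <- function(n_cells, n_files, bytes_per_value = 8) {
    n_cells * n_files * bytes_per_value
}

n_cells <- 534 * 534   # ASSUMPTION: illustrative chip layout, not from the post
n_files <- 260         # number of CEL files mentioned in the post

one_copy   <- matrix_bytes(n_cells, n_files)
two_copies <- 2 * one_copy   # temporary matrix plus the copy held by the object

cat(sprintf("one copy:   %.0f MB\n", one_copy / 2^20))
cat(sprintf("two copies: %.0f MB\n", two_copies / 2^20))
```

Under these assumptions a single copy already needs one contiguous block of several hundred MB, which is the kind of allocation that fails first on a fragmented 32-bit address space.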
We think that if the AffyBatch object had a method to load or replace
just one column of the matrix (one column per CEL file), the amount of
memory needed to load all the data would be significantly lower.
This is because the current read.affybatch process needs twice as much
memory as the final AffyBatch object actually consumes.
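The column-at-a-time idea can be sketched on a plain matrix, without any affy classes. This is only an illustration: read_one_file is a hypothetical stand-in that returns simulated intensities instead of parsing a real CEL file, and the file names are made up.

```r
# Hypothetical stand-in for reading one CEL file: returns a numeric
# vector of intensities (simulated here, no real file is read).
read_one_file <- function(path, n_cells) {
    runif(n_cells, min = 0, max = 65535)
}

fill_by_column <- function(paths, n_cells) {
    # Allocate the full matrix once...
    ival <- matrix(0, nrow = n_cells, ncol = length(paths),
                   dimnames = list(NULL, basename(paths)))
    # ...then overwrite one column per file, so that typically only a
    # single file's worth of extra data is alive at any time.
    for (i in seq_along(paths)) {
        ival[, i] <- read_one_file(paths[i], n_cells)
    }
    ival
}

paths <- c("a.CEL", "b.CEL", "c.CEL")   # hypothetical file names
m <- fill_by_column(paths, n_cells = 1000)
dim(m)
```

The point of the design is that the peak memory stays close to the size of the final matrix plus one column, rather than two full matrices.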
We could not load all 260 CEL files, but 210 were loaded successfully
by following these steps:
1) Use the routine called "vivo.read.affybatch" to load the data into
a matrix (see below).
2) Save the data from step 1.
3) In a new R session, load the data saved in step 2 and use it to
create an AffyBatch object.
4) Save the AffyBatch object from step 3.
5) In a new R session, load the AffyBatch saved in step 4.
6) Proceed with the analysis.
The memory used after step 5 was half the memory used after step 4.
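The save, restart and reload pattern in these steps can be tried on a small scale with base R alone. In this sketch the matrix is simulated, a temporary file replaces values.RData, and rm() plus gc() only approximate a real new session (an actual restart also discards any other leftover temporaries, which is presumably where the memory saving comes from).

```r
# Small stand-in for the intensity matrix (ASSUMPTION: simulated data).
values <- matrix(runif(1000 * 5), nrow = 1000, ncol = 5)
checksum <- sum(values)

# Save to disk (a temporary file is used here).
path <- tempfile(fileext = ".RData")
save(values, file = path, compress = TRUE)

# Stand-in for "start a new session": drop the object and let R
# release the memory it was using.
rm(values)
invisible(gc())

# load() restores the object under its original name and returns
# the names of the objects it created.
loaded_names <- load(path)
print(loaded_names)
print(object.size(values), units = "Kb")
```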
Any comments are welcome,
Regards,
Victor Trevino-Alvarado
vmt359@bham.ac.uk
# For reference, this is close to a "copy-paste version" of the
# read.affybatch routine. It returns the intensity matrix instead of
# an AffyBatch object.
library(affy)
vivo.read.affybatch <- function(..., filenames = character(0),
                                phenoData = new("phenoData"),
                                description = NULL, notes = "",
                                compress = getOption("BioC")$affy$compress.cel,
                                rm.mask = FALSE, rm.outliers = FALSE,
                                rm.extra = FALSE, verbose = FALSE)
{
    # Collect file names given through ... and/or through `filenames`
    auxnames <- as.list(substitute(list(...)))[-1]
    filenames <- .Primitive("c")(filenames, auxnames)
    n <- length(filenames)
    if (n == 0)
        stop("No file name given !")
    pdata <- pData(phenoData)
    if (dim(pdata)[1] != n) {
        warning("Incompatible phenoData object. Created a new one.\n")
        samplenames <- sub("^/?([^/]*/)*", "", unlist(filenames),
                           extended = TRUE)
        pdata <- data.frame(sample = 1:n, row.names = samplenames)
        phenoData <- new("phenoData", pData = pdata,
                         varLabels = list(sample = "arbitrary numbering"))
    }
    else samplenames <- rownames(pdata)
    if (is.null(description)) {
        description <- new("MIAME")
        description@preprocessing$filenames <- filenames
        description@preprocessing$affyversion <-
            library(help = affy)$info[[2]][[2]][2]
    }
    # Read the first CEL file to get the chip dimensions and CDF name
    if (verbose)
        cat(1, "reading", filenames[[1]], "...")
    cel <- read.celfile(filenames[[1]], compress = compress,
                        rm.mask = rm.mask, rm.outliers = rm.outliers,
                        rm.extra = rm.extra)
    if (verbose)
        cat("done.\n")
    dim.intensity <- dim(intensity(cel))
    ref.cdfName <- cel@cdfName
    # Preallocate the matrix: one row per cell, one column per CEL file
    if (verbose)
        cat("Instantiating the array...")
    ival <- array(0, dim = c(prod(dim.intensity), n),
                  dimnames = list(NULL, samplenames))
    if (verbose)
        cat("done!\n")
    ival[, 1] <- c(intensity(cel))
    # Read the remaining CEL files, filling one column each
    for (i in (1:n)[-1]) {
        if (verbose)
            cat(i, "reading", filenames[[i]], "...")
        cel <- read.celfile(filenames[[i]], compress = compress,
                            rm.mask = rm.mask, rm.outliers = rm.outliers,
                            rm.extra = rm.extra)
        if (any(dim(intensity(cel)) != dim.intensity))
            stop(paste("CEL file dimension mismatch !\n(file",
                       filenames[[i]], ")"))
        if (verbose)
            cat("done.\n")
        if (cel@cdfName != ref.cdfName)
            warning(paste("\n***\nDetected a mismatch of the cdfName: found ",
                          cel@cdfName, ", expected ", ref.cdfName,
                          "\nin file number ", i, " (", filenames[[i]], ")\n",
                          "Please make sure all cel files belong to the same chip type!\n***\n"))
        ival[, i] <- c(intensity(cel))
    }
    if (verbose)
        cat(paste("returning the intensity matrix (a ",
                  prod(dim.intensity), "x", length(filenames), " matrix)...",
                  sep = ""))
    if (verbose)
        cat("done.\n")
    # Return the plain matrix; the AffyBatch is built in a later session
    return(ival)
}
#-------------- Steps 1 & 2 (new session)
values <- vivo.read.affybatch(<your typical parameters>)
save(values, file = "values.RData", compress = TRUE)
#------------- Steps 3 & 4 (new session)
load("values.RData")
# Note: in AffyBatch, nrow and ncol describe the physical layout of the
# chip (rows and columns of cells), so check these values for your chip.
ab <- new("AffyBatch", exprs = values, cdfName = "your cdf",
          phenoData = <a new phenoData>,
          nrow = dim(values)[1], ncol = dim(values)[2])
save(ab, file = "ab.RData", compress = TRUE)
#------------- Step 5 (new session)
load("ab.RData")
# Proceed with your analysis