Timur Shtatland
Dear all,
I am looking for differentially expressed genes using the MTP
procedure from the multtest package. I am computing raw and adjusted
t-test P values (group 1 = 7 samples, group 2 = 3 samples) using the
bootstrap. R runs out of memory ('Error: cannot allocate vector') when
the number of genes (5413) combined with the number of bootstrap
iterations (B=10000) produces a large matrix. I am running R on a
PowerBook G4, Mac OS 10.4.11. See the code and the output below.
Are there any other possible solutions besides options 1-3 listed
below? For example, can I arrange for only part of the null matrix,
rather than the entire matrix, to be held in RAM at any given time?
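To make the question concrete: the traceback further down shows the failure inside `apply(null, 2, max)`. The following is only a minimal sketch of what "working on part of the matrix at a time" could look like for that one step. The helper `col_max_chunked` and the toy matrix are hypothetical, made up for illustration; this is not multtest code, and it assumes the matrix itself already fits in memory and only the extra working copy is the problem.

```r
# Hypothetical helper, not part of multtest: column-wise maxima computed
# one block of columns at a time, so only a small slice is copied at once.
col_max_chunked <- function(m, chunk_size = 500) {
  out <- numeric(ncol(m))
  starts <- seq(1, ncol(m), by = chunk_size)
  for (s in starts) {
    idx <- s:min(s + chunk_size - 1, ncol(m))
    # drop = FALSE keeps a one-column slice as a matrix
    out[idx] <- apply(m[, idx, drop = FALSE], 2, max)
  }
  out
}

# Small made-up matrix standing in for a genes x B matrix of null statistics.
set.seed(1)
toy_null <- matrix(rnorm(200 * 1000), nrow = 200, ncol = 1000)
chunked <- col_max_chunked(toy_null, chunk_size = 128)
```

Whether something like this could be plugged into MTP without editing the package source is exactly what I do not know.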
The other possible solutions to this problem are:
1. Buy more memory - my choice #1 unless another solution is easily
available. I assume that MTP is trying to read the entire null matrix
into memory: 10000 iterations * 5413 genes * 10 samples * 8
bytes/(sample*gene*iteration) = 4.3 GB, and it obviously does not fit
into the current RAM (1.25 GB).
2. Use fewer bootstrap iterations - my choice #2, because with fewer
iterations many genes have raw P values equal to exactly 0:
https://stat.ethz.ch/pipermail/bioconductor/2008-March/021396.html
https://stat.ethz.ch/pipermail/bioconductor/2008-March/021436.html
3. Use fewer genes - my choice #3, because it is not clear exactly
what effect a more restrictive filter will have on the false negative
rate (the rate of truly differentially expressed genes that will be
filtered out). Currently I already use genefilter to reduce the
number of genes for MTP input from 22283 to 5413:
ffun <- filterfun(pOverA(p = 0.5, A = 100), cv(a = 0.3))
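As a rough check of the arithmetic in item 1, here is a small sketch that computes two estimates from the numbers in this post. The variable names are mine. The second estimate assumes one stored value per gene and per iteration (a genes x B matrix of doubles), which is a guess about how the null matrix is laid out, not something I have confirmed in the multtest source.

```r
# Back-of-the-envelope memory estimates; all sizes are taken from the post.
n_genes   <- 5413    # genes remaining after filtering
n_boot    <- 10000   # bootstrap iterations (B)
n_samples <- 10      # 7 + 3 samples
bytes_per_double <- 8

# Estimate from item 1: one value per gene, sample and iteration.
full_bytes <- n_genes * n_boot * n_samples * bytes_per_double

# Assumed alternative: one value per gene and iteration (genes x B).
null_bytes <- n_genes * n_boot * bytes_per_double

full_gb  <- full_bytes / 1e9    # decimal gigabytes
null_mib <- null_bytes / 2^20   # the units R uses when it reports "Mb"
cat(sprintf("gene x sample x iteration: %.1f GB\n", full_gb))
cat(sprintf("gene x iteration:          %.1f Mb\n", null_mib))
```

The second figure comes out at about 413 Mb, the same size as in the error message in the output below, so the failing allocation may be a single additional genes x B object rather than one 4.3 GB block.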
To maximize available memory, I was not running any process other than
R at the time of the error. The error always occurs at the end of
the bootstrap iterations (which take 18-24 hours).
I searched the MTP help page and its vignettes, as well as the
Bioconductor mailing list archives. 'Buy more memory' appears to be
the solution most commonly suggested on this mailing list for other
assorted 'out of memory' problems, but I was wondering if there is an
easy way around it.
Thank you for your help.
Best regards,
Timur
--
Timur Shtatland, PhD
Center for Molecular Imaging Research
Massachusetts General Hospital
149 13th Street, Room 5408
Charlestown, MA 02129
tshtatland at mgh dot harvard dot edu
############################################################
## read *only* the gcrma-processed dataset (nothing else) into a new R session
## to reduce the number of objects in memory:
> load("esetGcrma.rda")
...
> B=10000
...
> ffun <- filterfun(pOverA(p = 0.5, A = 100), cv(a = 0.3))
> filtered <- genefilter(2^exprs(esetGcrma), ffun)
...
> TTBoot <- MTP(X=esetGcrmaExprsFiltered, Y=TT, test = "t.twosamp.unequalvar",
+     alternative = "two.sided", typeone="fdr", method="ss.maxT",
+     fdr.method="conservative", keep.nulldist = FALSE, B=B, seed=seed)
running bootstrap...
iteration = 100 200 300 400 500 600 700 800 900 1000
1100 1200 1300 1400 1500 1600 1700 1800 1900 2000
2100 2200 2300 2400 2500 2600 2700 2800 2900 3000
3100 3200 3300 3400 3500 3600 3700 3800 3900 4000
4100 4200 4300 4400 4500 4600 4700 4800 4900 5000
5100 5200 5300 5400 5500 5600 5700 5800 5900 6000
6100 6200 6300 6400 6500 6600 6700 6800 6900 7000
7100 7200 7300 7400 7500 7600 7700 7800 7900 8000
8100 8200 8300 8400 8500 8600 8700 8800 8900 9000
9100 9200 9300 9400 9500 9600 9700 9800 9900 10000
Error: cannot allocate vector of size 413.0 Mb
R(202,0xa000ed88) malloc: *** vm_allocate(size=433041408) failed (error code=3)
R(202,0xa000ed88) malloc: *** error: can't allocate region
R(202,0xa000ed88) malloc: *** set a breakpoint in szone_error to debug
2008-03-08 18:56:10.979 R[202] tossing reply message sequence 2 on thread 0x4d2a110
>
> traceback()
7: apply(null, 2, max)
6: ss.maxT(nulldistn, obs, alternative, get.cutoff, get.cr, pind, alpha)
5: MTP(X = esetGcrmaExprsFiltered, Y = TT, test = "t.twosamp.unequalvar",
       alternative = "two.sided", typeone = "fdr", method = "ss.maxT",
       fdr.method = "conservative", keep.nulldist = FALSE, B = B,
       seed = seed)
4: multTest(esetGcrma = esetGcrma, B = 10000)
3: eval.with.vis(expr, envir, enclos)
2: eval.with.vis(ei, envir)
1: source("~/bin/computeRestrictedMa2.R")
> sessionInfo()
R version 2.6.0 (2007-10-03)
powerpc-apple-darwin8.10.1
locale:
en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] splines tools stats graphics grDevices utils datasets methods base
other attached packages:
[1] multtest_1.18.0 genefilter_1.16.0 survival_2.32 Biobase_1.16.1
loaded via a namespace (and not attached):
[1] AnnotationDbi_1.0.6 DBI_0.2-4 RSQLite_0.6-4 annotate_1.16.1
> version
               _
platform       powerpc-apple-darwin8.10.1
arch           powerpc
os             darwin8.10.1
system         powerpc, darwin8.10.1
status
major          2
minor          6.0
year           2007
month          10
day            03
svn rev        43063
language       R
version.string R version 2.6.0 (2007-10-03)
> .Machine$sizeof.pointer == 8
[1] FALSE
> R.version$arch
[1] "powerpc"
> .Platform$r_arch
[1] "ppc"