Hi Gavin,
I'm sending this to the BioC list as well so there's a record for anyone
else looking for advice.

The code you've suggested isn't quite what I had in mind. In this case, I
was suggesting the combine() function should be used on the beadData objects
after the weights have been set. Hopefully the pseudo-code below will be
understandable (anything in quotes needs to be specified for your data).
So on our cluster I would run multiple R sessions. Then I would read a
subset of the arrays in each session, so:

For session 1:
beadData1 <- readIllumina("sections 1 : N")

For session 2:
beadData2 <- readIllumina("sections N+1 : 2N")

etc...
I find sticking to a BeadChip per R session is convenient, mostly due to
the folder structure produced by BeadScan.
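In practice that just means pointing readIllumina() at one BeadChip folder
per session. A rough sketch (the folder names are placeholders for whatever
BeadScan produced for you, and it's worth checking the arguments against
?readIllumina for your beadarray version):

## R session 1: read the bead-level data for the first BeadChip
library(beadarray)
beadData1 <- readIllumina(dir = "Chip1_barcode")

## R session 2: the second BeadChip, and so on
library(beadarray)
beadData2 <- readIllumina(dir = "Chip2_barcode")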
Then in each R session you can run something like the following:

..."Any pre-BASH processing you like"...
BASHoutput <- BASH(beadData, array = 1:N)
beadData <- setWeights(beadData, BASHoutput$wts, array = 1:N)
save(beadData, file = "beadData session N")
quit...
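For a single BeadChip of 12 sections, that session might look roughly like
this (a sketch only; the file name is a placeholder, and any filtering or QC
you want to do before BASH goes where the comment indicates):

## ...any pre-BASH processing you like goes here...
BASHoutput <- BASH(beadData, array = 1:12)                           # spatial artefact detection
beadData <- setWeights(beadData, wts = BASHoutput$wts, array = 1:12) # attach the BASH weights
save(beadData, file = "beadData_chip1.rda")                          # one file per chip
q(save = "no")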
I'd then open a new R session and load the various beadData objects. You
can then combine them with:

beadData <- combine(beadData1, beadData2)

If you have more than two you'll probably need a loop; I don't think our
combine function takes more than two objects at a time, but it's worth
checking the man page for that.
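A rough sketch of that loop, assuming each saved file contains an object
called beadData and using the placeholder file names from above:

files <- c("beadData_chip1.rda", "beadData_chip2.rda", "beadData_chip3.rda")

load(files[1])            # loads an object called 'beadData'
allData <- beadData
for (f in files[-1]) {
    load(f)               # replaces 'beadData' with the next chip's data
    allData <- combine(allData, beadData)
}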
You should then have one large beadData object with all the arrays and BASH
weights. As an alternative, you could skip the combining step, keep the
separate R sessions open, and do any further processing right up to the
summarization step. I think I'm right in saying none of the QC, QA etc.
requires information between chips, so each chip can be processed
independently.
That's probably all a bit messy, but feel free to ask any more questions.
Mike
On Tue, Apr 5, 2011 at 11:25 PM, Gavin Koh <gavin.koh@gmail.com> wrote:
> Dear Mike,
> Thanks for replying so quickly.
> R exits and throws me back to the system prompt.
>
> I'll try running array 53 alone first to see if that is the problem.
> If that is not the problem, then I would like to try breaking it up
> into batches as you suggest.
> I've not used the Biobase combine() function before, but looking at
> the help file, I would think that I could do
>
> bashOutput1 <- BASH(beadData, array=1:12)
> bashOutput2 <- BASH(beadData, array=13:24)
> .
> .
> .
> bashOutput <- combine(bashOutput1, bashOutput2,...bashOutputn)
> beadData <- setWeights(beadData, bashOutput$wts, array=1:n)
>
> Am I right?
>
> Thanks,
>
> Gavin.
>
> On 5 April 2011 13:30, Mike Smith <grimbough@gmail.com> wrote:
> > Hi Gavin,
> >
> > I'm afraid that particular error means nothing to me. Does R exit too,
> > or does BASH stop and return you to an interactive session?
> >
> > I found this very old post on R exit codes
> > (http://tolstoy.newcastle.edu.au/R/help/02b/3168.html), which may be
> > relevant but I'm speculating at the moment.
> >
> > Is there anything particularly unusual with the 53rd array? If you try
> > to BASH that array in isolation, e.g. BASHoutput <- BASH(beadData, array=53),
> > does it proceed ok?
> >
> > If it is a memory problem then it may be worth waiting for the next
> > Bioconductor release in about a week. I recently discovered a memory leak
> > and a small bug that could cause a segfault in the BASH C code, which I've
> > patched in the developmental version. I conducted a test this morning with
> > 4 HumanV3 sections and the memory leak was about 100MB, which isn't ideal,
> > but with a 16GB limit I'd have thought you'd have enough head room not to
> > be affected by it.
> >
> > Personally I've never tried to BASH so many sections in one go, but
> > there's no reason it shouldn't work (memory and time permitting). What we
> > tend to do is read a smaller number of sections (say a single BeadChip)
> > into an R session and process each separately. We then save each separate
> > object once it's been processed, load them all into a new R session and
> > use the combine() function to create a single beadLevelData object. That
> > way it can be done in a sort of coarse-grained parallel fashion.
> >
> > As far as making it parallel in a more friendly way, that's something
> > we're working on, but it's not an imminent release.
> >
> > I hope that's some help,
> >
> >
> > On Mon, Apr 4, 2011 at 9:51 PM, Gavin Koh <gavin.koh@gmail.com> wrote:
> >>
> >> I have 60 samples which were run on an Illumina HumanWG-6 v3.0
> >> Expression BeadChip (so 120 sections) and I am doing the
> >> pre-processing using beadarray.
> >>
> >> I am trying to generate spatial masks using BASH(). I have
> >> successfully run a smaller analysis (one slide of 12 sections) on my
> >> MacBook OSX Snow Leopard with 4Gb RAM using beadarray 2.7.
> >>
> >> The command I used to call BASH was:
> >> BASHoutput <- BASH(beadData, array=1:n)
> >>
> >> I am running the full analysis (120 sections) on a computing cluster
> >> (lustre). I have only requested a single core with 16Gb RAM, because I
> >> don't know how to get BASH() to multithread (although in theory it
> >> ought to be possible? it is a repetitive process after all). I cannot
> >> get the script past 53 sections without BASH() terminating with exit
> >> code "user code 2". It doesn't matter if I am running it interactively or
> >> calling R CMD BATCH. I don't know what the exit code means, so I don't
> >> know how to fix it. I don't think it is out of memory, because lustre
> >> has other codes for reporting out-of-memory and R usually reports
> >> out-of-memory errors as "cannot allocate vector of size..."? Also, the
> >> previous time it ran out of memory (when I tried 6 Gb RAM), it was
> >> lustre that terminated the process.
> >>
> >> I don't know if the problem is that BASH() cannot handle so many
> >> sections. If that is in fact the problem, then there are two solutions
> >> I can think of: 1. get BASH() to run multithreaded, or 2. run BASH()
> >> on selected sections only.
> >>
> >> On inspection of the pseudoimages, I can see there are only two
> >> sections of the 120 with obvious spatial defects (they look like
> >> scratches). Is it possible to find outliers on the other sections
> >> using the usual (faster) method (>3 MAD) and then just use BASH() for
> >> only the two sections that are defective? Or... is there a tool to just
> >> draw the masks myself??
> >>
> >> Thanks in advance,
> >>
> >> Gavin
> >>
> >> sessionInfo() reports:
> >> R version 2.12.0 (2010-10-15)
> >> Platform: x86_64-unknown-linux-gnu (64-bit)
> >>
> >> locale:
> >> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
> >> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
> >> [5] LC_MONETARY=C LC_MESSAGES=C
> >> [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
> >> [9] LC_ADDRESS=C LC_TELEPHONE=C
> >> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
> >>
> >> attached base packages:
> >> [1] stats graphics grDevices utils datasets methods base
> >>
> >> other attached packages:
> >> [1] beadarray_2.0.6 Biobase_2.10.0
> >>
> >> loaded via a namespace (and not attached):
> >> [1] limma_3.6.6
> >>
> >> --
> >> Hofstadter's Law: It always takes longer than you expect, even when
> >> you take into account Hofstadter's Law.
> >> Douglas Hofstadter (in Gödel, Escher, Bach, 1979)
> >>
> >> _______________________________________________
> >> Bioconductor mailing list
> >> Bioconductor@r-project.org
> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >> Search the archives:
> >> http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
> >
> > --
> > Mike Smith
> > PhD Student
> > Computational Biology Group
> > Cambridge University
> >
>
>
>
> --
> Hofstadter's Law: It always takes longer than you expect, even when
> you take into account Hofstadter's Law.
> Douglas Hofstadter (in Gödel, Escher, Bach, 1979)
>
--
Mike Smith
PhD Student
Computational Biology Group
Cambridge University