Question: R hthgu133afrmavecs_1.1.0 error: row.effects should sum to zero

0

Matthew McCall •

**830**wrote:Ben,
I'm occasionally running into problems with the preprocessCore
package's
rcModelWPLM function when using it as part of frma (for the most
recent
example, see Peter's email copied below). This is the line that is
throwing
an error:
if (abs(sum(row.effects)) > 10 * .Machine$double.eps) {
stop("row.effects should sum to zero")
}
This issue seems to happen when I precompute and freeze the probe
effects
(here "row.effects"). Each of these has an error of max size
.Machine$double.eps, so it is possible for a probe set with more than
10
probes to produce a value of "abs(sum(row.effects))" greater than "10
*
.Machine$double.eps". I believe this would be solved by replacing "10"
with
"length(row.effects)".
Thank you,
Matt
On Thu, Jul 3, 2014 at 6:50 AM, Audano III, Peter A
<paudano@gatech.edu>
wrote:
> In frmaRobReg.R, I modified the function in lapply (around line 98)
to
> output each probe and index as it processes them:
>
> fit <- lapply(1:length(S), function(i) {
> message(paste("Probe Group: ", names(S)[i]), " (", i, ")",
sep="")
> s <- S[[i]]
> frma:::rwaFit2(pms[s,, drop=FALSE], w[s],
input.vecs$probeVec[s],
> input.vecs$probesetSD[i])
> })
>
> Output:
> Probe Group: AFFX-M27830_M_at (22246)
> Probe Group: AFFX-PheX-3_at (22247)
> Probe Group: AFFX-PheX-5_at (22248)
> Probe Group: AFFX-PheX-M_at (22249)
> Error in rcModelWPLM(y = x1, w = w.tmp, row.effects = pe.tmp,
input.scale
> = x4) :
> row.effects should sum to zero
>
> # I set i to 22249 and setup the parameters for rwaFit2()
> # In rwaFit2(), I see this:
>
> > pe.tmp
> [1] -25.4713486 0.3569676 0.5434795 0.5476299 0.7854822
> 0.5656246
> [7] 0.6570181 0.1222252 -1.0729423 2.7573149 1.9201914
> 2.0823409
> [13] 3.6356608 3.6915274 2.5214547 2.8022633 1.9330730
> 1.5327069
> [19] 4.3232655 -4.2339352
>
> > abs(sum(pe.tmp))
> [1] 2.609024e-15
>
> > 10 * .Machine$double.eps
> [1] 2.220446e-15
>
>
> The best explanation I can come up with is that a 64-bit
architecture
> might be storing intermediate calculations with more precision (It's
a
> guess, I am pretty new to R). That could be just enough to push the
result
> too far from the stringent threshold rcModelWPLM enforces.
>
> You could make this problem go away by implementing your own version
of
> rcModelWPLM and allowing for a little more error if it wouldn't
> significantly affect results. It's a pretty short function. It might
also
> be a good idea to open an issue with the BioConductor maintainers
and ask
> if they can make that "10" a tunable parameter.
>
> If the assumptions are correct, then you might break 32-bit
environments
> if you recompute the vectors in a 64-bit environment. I suspect it's
too
> fragile as it is. Floating point calculations are not exact and
should
> probably be given a little more "wiggle-room".
>
>
>
> ----- Original Message -----
> From: "Matthew McCall" <mccallm@gmail.com>
> To: "Audano III, Peter A" <paudano@gatech.edu>
> Sent: Wednesday, July 2, 2014 5:26:13 PM
> Subject: Re: R hthgu133afrmavecs_1.1.0 error: row.effects should sum
to
> zero
>
> Hmm. Guess that's not it then:
>
> > .Machine$double.eps
> [1] 2.220446e-16
>
> Not sure why you're getting an error and I'm not. The only
difference I see
> is I'm running 32-bit linux:
> R version 3.1.0 (2014-04-10) -- "Spring Dance"
> Copyright (C) 2014 The R Foundation for Statistical Computing
> Platform: i686-pc-linux-gnu (32-bit)
>
> Best,
> Matt
>
>
>
> On Wed, Jul 2, 2014 at 5:23 PM, Audano III, Peter A
<paudano@gatech.edu>
> wrote:
>
> > I will work on it a little and see if I can dig anything up.
> >
> > > .Machine$double.eps
> > [1] 2.220446e-16
> >
> > - Pete
> >
> > ------------------------------
> > *From: *"Matthew McCall" <mccallm@gmail.com>
> > *To: *"Audano III, Peter A" <paudano@gatech.edu>
> > *Sent: *Wednesday, July 2, 2014 2:40:13 PM
> > *Subject: *Re: R hthgu133afrmavecs_1.1.0 error: row.effects should
sum to
> > zero
> >
> >
> > Peter,
> >
> > This error comes up fairly often despite my numerous attempts to
address
> > it. It often represents a version mismatch, but that doesn't
appear to be
> > the case here (at least as far as I can tell). However, I just ran
frma
> on
> > the first 5 CEL files from the data set you're interested in and
didn't
> get
> > an error.
> >
> > This is the line throwing the error (it's in the preprocessCore
package,
> > so I can't change anything there):
> > if (abs(sum(row.effects)) > 10 * .Machine$double.eps)
> > stop("row.effects should sum to zero")
> >
> > I could be that .Machine$double.eps differs between the machine on
which
> I
> > built the hthgu133afrmavecs and the one you're using. If you're
> comfortable
> > hacking the R code a bit, you could run frma line by line and find
which
> > probeset throws the error -- then see what the corresponding probe
> effects
> > actually sum to -- my guess would be something tiny but greater
than
> > 10*.Machine$double.eps.
> >
> > Sorry I couldn't be more help.
> >
> > Best,
> > Matt
> >
> >
> >
> >
> > On Wed, Jul 2, 2014 at 1:14 PM, Audano III, Peter A
<paudano@gatech.edu>
> > wrote:
> >
> >> Matthew McCall,
> >>
> >> I am attempting to run frma on a a set of CEL files annotated as
> >> hthgu133a. Running frma on the data, I get the following error:
> >>
> >> Error in rcModelWPLM(y = x1, w = w.tmp, row.effects = pe.tmp,
> input.scale
> >> = x4) :
> >> row.effects should sum to zero
> >>
> >> I have been able to replicate on several machines.
> >>
> >> The data set is large (789 CEL files), but I can also replicate
with a
> >> subset of them.
> >>
> >> For on test, I tried using hgu133afrmavecs instead of
hthgu133afrmavecs,
> >> and it worked. I believe there is a bug in hgu133afrmavecs.
> >>
> >> Can you look into this for me? I am trying to replicate someone's
> results
> >> who used frma with hthgu133afrmavecs.
> >>
> >> The CEL files are from CGP:
> >> * Download from:
> >> http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-783/
> >> * All 9 E-MTAB-783.raw.*.zip
> >>
> >> Note: Any subset of this data can be used. If you download one
zip file,
> >> you should get the same results. To speed up the the process, try
> replacing
> >> the ReadAffy line (see below) with "abatch <-
> >> ReadAffy(filenames=fileList[1:5])", which will only use 5 of the
CEL
> files.
> >>
> >> After downloading the zip files, the remainder of this message is
the
> >> list of commands (extracting files and the R session) I used to
> reproduce
> >> this issue. sessionInfo() is included in the output.
> >>
> >>
> >> $ ls
> >> ./ E-MTAB-783.raw.3.zip E-MTAB-783.raw.7.zip
> >> ../ E-MTAB-783.raw.4.zip E-MTAB-783.raw.8.zip
> >> E-MTAB-783.raw.1.zip E-MTAB-783.raw.5.zip E-MTAB-783.raw.9.zip
> >> E-MTAB-783.raw.2.zip E-MTAB-783.raw.6.zip
> >>
> >> $ ls *.zip | xargs -I FILE unzip FILE
> >>