Question

R hthgu133afrmavecs_1.1.0 error: row.effects should sum to zero

0

Entering edit mode

Matthew McCall ▴ 830

@matthew-mccall-4459

Last seen 4.9 years ago

United States

Ben, I'm occasionally running into problems with the preprocessCore package's rcModelWPLM function when using it as part of frma (for the most recent example, see Peter's email copied below). This is the line that is throwing an error: if (abs(sum(row.effects)) > 10 * .Machine$double.eps) { stop("row.effects should sum to zero") } This issue seems to happen when I precompute and freeze the probe effects (here "row.effects"). Each of these has an error of max size .Machine$double.eps, so it is possible for a probe set with more than 10 probes to produce a value of "abs(sum(row.effects))" greater than "10 * .Machine$double.eps". I believe this would be solved by replacing "10" with "length(row.effects)". Thank you, Matt On Thu, Jul 3, 2014 at 6:50 AM, Audano III, Peter A <paudano@gatech.edu> wrote: > In frmaRobReg.R, I modified the function in lapply (around line 98) to > output each probe and index as it processes them: > > fit <- lapply(1:length(S), function(i) { > message(paste("Probe Group: ", names(S)[i]), " (", i, ")", sep="") > s <- S[[i]] > frma:::rwaFit2(pms[s,, drop=FALSE], w[s], input.vecs$probeVec[s], > input.vecs$probesetSD[i]) > }) > > Output: > Probe Group: AFFX-M27830_M_at (22246) > Probe Group: AFFX-PheX-3_at (22247) > Probe Group: AFFX-PheX-5_at (22248) > Probe Group: AFFX-PheX-M_at (22249) > Error in rcModelWPLM(y = x1, w = w.tmp, row.effects = pe.tmp, input.scale > = x4) : > row.effects should sum to zero > > # I set i to 22249 and setup the parameters for rwaFit2() > # In rwaFit2(), I see this: > > > pe.tmp > [1] -25.4713486 0.3569676 0.5434795 0.5476299 0.7854822 > 0.5656246 > [7] 0.6570181 0.1222252 -1.0729423 2.7573149 1.9201914 > 2.0823409 > [13] 3.6356608 3.6915274 2.5214547 2.8022633 1.9330730 > 1.5327069 > [19] 4.3232655 -4.2339352 > > > abs(sum(pe.tmp)) > [1] 2.609024e-15 > > > 10 * .Machine$double.eps > [1] 2.220446e-15 > > > The best explanation I can come up with is that a 64-bit architecture > might be storing intermediate calculations with more precision (It's a > guess, I am pretty new to R). That could be just enough to push the result > too far from the stringent threshold rcModelWPLM enforces. > > You could make this problem go away by implementing your own version of > rcModelWPLM and allowing for a little more error if it wouldn't > significantly affect results. It's a pretty short function. It might also > be a good idea to open an issue with the BioConductor maintainers and ask > if they can make that "10" a tunable parameter. > > If the assumptions are correct, then you might break 32-bit environments > if you recompute the vectors in a 64-bit environment. I suspect it's too > fragile as it is. Floating point calculations are not exact and should > probably be given a little more "wiggle-room". > > > > ----- Original Message ----- > From: "Matthew McCall" <mccallm@gmail.com> > To: "Audano III, Peter A" <paudano@gatech.edu> > Sent: Wednesday, July 2, 2014 5:26:13 PM > Subject: Re: R hthgu133afrmavecs_1.1.0 error: row.effects should sum to > zero > > Hmm. Guess that's not it then: > > > .Machine$double.eps > [1] 2.220446e-16 > > Not sure why you're getting an error and I'm not. The only difference I see > is I'm running 32-bit linux: > R version 3.1.0 (2014-04-10) -- "Spring Dance" > Copyright (C) 2014 The R Foundation for Statistical Computing > Platform: i686-pc-linux-gnu (32-bit) > > Best, > Matt > > > > On Wed, Jul 2, 2014 at 5:23 PM, Audano III, Peter A <paudano@gatech.edu> > wrote: > > > I will work on it a little and see if I can dig anything up. > > > > > .Machine$double.eps > > [1] 2.220446e-16 > > > > - Pete > > > > ------------------------------ > > *From: *"Matthew McCall" <mccallm@gmail.com> > > *To: *"Audano III, Peter A" <paudano@gatech.edu> > > *Sent: *Wednesday, July 2, 2014 2:40:13 PM > > *Subject: *Re: R hthgu133afrmavecs_1.1.0 error: row.effects should sum to > > zero > > > > > > Peter, > > > > This error comes up fairly often despite my numerous attempts to address > > it. It often represents a version mismatch, but that doesn't appear to be > > the case here (at least as far as I can tell). However, I just ran frma > on > > the first 5 CEL files from the data set you're interested in and didn't > get > > an error. > > > > This is the line throwing the error (it's in the preprocessCore package, > > so I can't change anything there): > > if (abs(sum(row.effects)) > 10 * .Machine$double.eps) > > stop("row.effects should sum to zero") > > > > I could be that .Machine$double.eps differs between the machine on which > I > > built the hthgu133afrmavecs and the one you're using. If you're > comfortable > > hacking the R code a bit, you could run frma line by line and find which > > probeset throws the error -- then see what the corresponding probe > effects > > actually sum to -- my guess would be something tiny but greater than > > 10*.Machine$double.eps. > > > > Sorry I couldn't be more help. > > > > Best, > > Matt > > > > > > > > > > On Wed, Jul 2, 2014 at 1:14 PM, Audano III, Peter A <paudano@gatech.edu> > > wrote: > > > >> Matthew McCall, > >> > >> I am attempting to run frma on a a set of CEL files annotated as > >> hthgu133a. Running frma on the data, I get the following error: > >> > >> Error in rcModelWPLM(y = x1, w = w.tmp, row.effects = pe.tmp, > input.scale > >> = x4) : > >> row.effects should sum to zero > >> > >> I have been able to replicate on several machines. > >> > >> The data set is large (789 CEL files), but I can also replicate with a > >> subset of them. > >> > >> For on test, I tried using hgu133afrmavecs instead of hthgu133afrmavecs, > >> and it worked. I believe there is a bug in hgu133afrmavecs. > >> > >> Can you look into this for me? I am trying to replicate someone's > results > >> who used frma with hthgu133afrmavecs. > >> > >> The CEL files are from CGP: > >> * Download from: > >> http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-783/ > >> * All 9 E-MTAB-783.raw.*.zip > >> > >> Note: Any subset of this data can be used. If you download one zip file, > >> you should get the same results. To speed up the the process, try > replacing > >> the ReadAffy line (see below) with "abatch <- > >> ReadAffy(filenames=fileList[1:5])", which will only use 5 of the CEL > files. > >> > >> After downloading the zip files, the remainder of this message is the > >> list of commands (extracting files and the R session) I used to > reproduce > >> this issue. sessionInfo() is included in the output. > >> > >> > >> $ ls > >> ./ E-MTAB-783.raw.3.zip E-MTAB-783.raw.7.zip > >> ../ E-MTAB-783.raw.4.zip E-MTAB-783.raw.8.zip > >> E-MTAB-783.raw.1.zip E-MTAB-783.raw.5.zip E-MTAB-783.raw.9.zip > >> E-MTAB-783.raw.2.zip E-MTAB-783.raw.6.zip > >> > >> $ ls *.zip | xargs -I FILE unzip FILE > >> > >> > >> $ ls *.CEL | wc -l > >> 789 > >> > >> $ R --no-save > >> > >> R version 3.1.0 (2014-04-10) -- "Spring Dance" > >> Copyright (C) 2014 The R Foundation for Statistical Computing > >> Platform: x86_64-redhat-linux-gnu (64-bit) > >> > >> R is free software and comes with ABSOLUTELY NO WARRANTY. > >> You are welcome to redistribute it under certain conditions. > >> Type 'license()' or 'licence()' for distribution details. > >> > >> Natural language support but running in an English locale > >> > >> R is a collaborative project with many contributors. > >> Type 'contributors()' for more information and > >> 'citation()' on how to cite R or R packages in publications. > >> > >> Type 'demo()' for some demos, 'help()' for on-line help, or > >> 'help.start()' for an HTML browser interface to help. > >> Type 'q()' to quit R. > >> > >> > > >> > library(affy) > >> Loading required package: BiocGenerics > >> Loading required package: parallel > >> > >> Attaching package: 'BiocGenerics' > >> > >> The following objects are masked from 'package:parallel': > >> > >> clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, > >> clusterExport, clusterMap, parApply, parCapply, parLapply, > >> parLapplyLB, parRapply, parSapply, parSapplyLB > >> > >> The following object is masked from 'package:stats': > >> > >> xtabs > >> > >> The following objects are masked from 'package:base': > >> > >> Filter, Find, Map, Position, Reduce, anyDuplicated, append, > >> as.data.frame, as.vector, cbind, colnames, do.call, duplicated, > >> eval, evalq, get, intersect, is.unsorted, lapply, mapply, match, > >> mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank, rbind, > >> rep.int, rownames, sapply, setdiff, sort, table, tapply, union, > >> unique, unlist > >> > >> Loading required package: Biobase > >> Welcome to Bioconductor > >> > >> Vignettes contain introductory material; view with > >> 'browseVignettes()'. To cite Bioconductor, see > >> 'citation("Biobase")', and for packages 'citation("pkgname")'. > >> > >> > > >> > library(frma) > >> > > >> > fileList <- list.files(pattern="*.CEL") > >> > length(fileList) > >> [1] 789 > >> > > >> > abatch <- ReadAffy(filenames=fileList) > >> > abatch > >> AffyBatch object > >> size of arrays=744x744 features (211 kb) > >> cdf=HT_HG-U133A (22277 affyids) > >> number of samples=789 > >> number of genes=22277 > >> annotation=hthgu133a > >> notes= > >> > > >> > eset <- frma(abatch) > >> > >> Error in rcModelWPLM(y = x1, w = w.tmp, row.effects = pe.tmp, > input.scale > >> = x4) : > >> row.effects should sum to zero > >> In addition: Warning messages: > >> 1: replacing previous import by 'utils::head' when loading > 'hthgu133acdf' > >> 2: replacing previous import by 'utils::tail' when loading > 'hthgu133acdf' > >> > > >> > traceback() > >> 7: stop("row.effects should sum to zero") > >> 6: rcModelWPLM(y = x1, w = w.tmp, row.effects = pe.tmp, input.scale = > x4) > >> 5: rwaFit2(pms[s, , drop = FALSE], w[s], input.vecs$probeVec[s], > >> input.vecs$probesetSD[i]) > >> 4: FUN(1:22277[[22249L]], ...) > >> 3: lapply(1:length(S), function(i) { > >> s <- S[[i]] > >> rwaFit2(pms[s, , drop = FALSE], w[s], input.vecs$probeVec[s], > >> input.vecs$probesetSD[i]) > >> }) > >> 2: frmaRobReg(object, background, normalize, summarize, target, > >> input.vecs, output.param, verbose) > >> 1: frma(abatch) > >> > > >> > sessionInfo() > >> R version 3.1.0 (2014-04-10) > >> Platform: x86_64-redhat-linux-gnu (64-bit) > >> > >> locale: > >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C > >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > >> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > >> [9] LC_ADDRESS=C LC_TELEPHONE=C > >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > >> > >> attached base packages: > >> [1] parallel stats graphics grDevices utils datasets methods > >> [8] base > >> > >> other attached packages: > >> [1] hthgu133afrmavecs_1.1.0 hthgu133acdf_2.14.0 frma_1.16.0 > >> [4] affy_1.42.3 Biobase_2.24.0 BiocGenerics_0.10.0 > >> > >> loaded via a namespace (and not attached): > >> [1] AnnotationDbi_1.26.0 BiocInstaller_1.14.2 Biostrings_2.32.0 > >> [4] DBI_0.2-7 GenomeInfoDb_1.0.2 GenomicRanges_1.16.3 > >> [7] IRanges_1.22.9 MASS_7.3-33 RSQLite_0.11.4 > >> [10] XVector_0.4.0 affxparser_1.36.0 affyio_1.32.0 > >> [13] bit_1.1-12 codetools_0.2-8 ff_2.2-13 > >> [16] foreach_1.4.2 iterators_1.0.7 oligo_1.28.2 > >> [19] oligoClasses_1.26.0 preprocessCore_1.26.1 splines_3.1.0 > >> [22] stats4_3.1.0 tools_3.1.0 zlibbioc_1.10.0 > >> > >> > >> > # TRY WITH hgu133afrmavecs # > >> > library(hgu133afrmavecs) > >> > data(hgu133afrmavecs) > >> > eset <- frma(abatch, input.vecs=hgu133afrmavecs) > >> > > >> > eset > >> ExpressionSet (storageMode: lockedEnvironment) > >> assayData: 22277 features, 789 samples > >> element names: exprs, se.exprs > >> protocolData: none > >> phenoData > >> sampleNames: 5500024030401071707289.A01.CEL > >> 5500024030401071707289.A02.CEL ... 5500024052861011409506.H11.CEL > >> (789 total) > >> varLabels: sample > >> varMetadata: labelDescription > >> featureData: none > >> experimentData: use 'experimentData(object)' > >> Annotation: hthgu133a > >> > >> > >> -- > >> > >> Peter Audano > >> paudano@gatech.edu > >> > >> > > > > > > -- > > Matthew N McCall, PhD > > 112 Arvine Heights > > Rochester, NY 14611 > > Cell: 202-222-5880 > > > > > > > > > > -- > > > > Peter Audano > > paudano@gatech.edu > > > > > > > -- > Matthew N McCall, PhD > 112 Arvine Heights > Rochester, NY 14611 > Cell: 202-222-5880 > > -- > > Peter Audano > paudano@gatech.edu > > -- Matthew N McCall, PhD 112 Arvine Heights Rochester, NY 14611 Cell: 202-222-5880 [[alternative HTML version deleted]]

GO probe PROcess frma GO probe PROcess frma • 1.2k views

ADD COMMENT • link 9.8 years ago Matthew McCall ▴ 830