Question about 'updateCel' from package 'affxparser'
0
0
Entering edit mode
@henrik-bengtsson-4333
Last seen 14 months ago
United States
Hi, sorry for the late response on this. Yes, if there is a bug, it could to be related updateCel() of affxparser. Though, I cross my fingers that there is another reason for all this - something that is not a bug but a simple mistake. I am aware of the note in help("seek") saying "Use of seek on Windows is discouraged. We have found so many errors in the Windows implementation of file positioning that users are advised to use it only at their own risk, and asked not to waste the R developers' time with bug reports on Windows' deficiencies.". Unfortunately, there is no other option than using seek(), but of course, it should not give errors. FYI, updateCel() has been used a lots on Windows for many years without any issues, although I acknowledge that issues as those you are reporting could indeed be missed. One way to know that the problem is with updateCel() you could add the following, e.g. for (j in 1:9) { filename <- filenames[j]; y <- exp(output[,j]); # Update CEL file updateCel(filename, indices=indices, intensities=y); # Assert correctness y2 <- readCel(filename, indices=indices, readIntensities=TRUE, readHeader=FALSE, readStdvs=FALSE, readPixels=FALSE, readXY=FALSE, readOutliers=FALSE, readMasked=FALSE)$intensities; if (!all.equal(y2, y)) { # Report the data we tried to write str(list(filename=filename, indices=indices, intensities=y, rereadIntensities=y2)); stop("Error in updateCel()!"). } } # for (j ...) One could do an even fancier version where one compare the complete CEL file before and after, which then would detect incorrectly written values outside the file areas that 'indices' points to. Also before troubleshooting more, what is your sessionInfo()? /Henrik (co-author of affxparser) On Fri, Jul 29, 2011 at 1:27 AM, Steve Pederson <stephen.pederson at="" adelaide.edu.au=""> wrote: > Hi, > > > > I'm building an array analysis package using an MCMC process for each gene & > am writing the key values (posterior quantiles, mean, sd etc) for each gene > to disk using CEL files and the aroma.affymetrix architecture. However I've > noticed that on rare occasion, the incorrect values are written to the CEL > files. > > > > As an example of what goes wrong, if I have a vector of 8 values (e.g. > intensities=1:8) and am updating 8 cells (e.g. indices=221632:221639) with > these 8 values, occasionally the first value (i.e. intensities[1]) would be > incorrectly given to all 8 cells on the CEL file. This has happened in an > obvious manner only to 2 different genes in 2 of 3 datasets & I can't spot a > connection in the data structures that may cause the misfire. If I can see > these 2 obvious problems, I'm unsure as to what less obvious errors there > may be lurking in the dataset & hence my confidence in the results is > somewhat diminished. > > > > The code from the relevant section of my function (where I write batches of > genes at a time) is: > > monoUgcMap <- getUnitGroupCellMap(monoCdf, unit=batchUnits) # Get the > UnitGroupCellMap from a monocell cdf for the current batch units > > indices <- monoUgcMap$cell > > for (j in 1:9) { # The output after the MCMC process will always have 9 > columns, which are written to separate CEL files > > ? ? ?updateCel(filenames[j], indices=indices, intensities=exp(output[,j])) > #update the CEL file for the current batch > > } > > > > I am wondering if the issue arises from the usage of 'seek' within the > function 'updateCel', as I am on a Windows 7 x64 system? The line I'm > referring to ?from the 'updateCel' source code is: > > > > seek(con, origin="start", where=offset, rw="read") > > > > Reading the 'seek' help page didn't really shed much light (to my flawed > brain) on what this step does, but I did spot the warning about this > function under Windows. ?I'm still a bit unclear as to what this step does, > so am looking for either a clarification as to the point of this line of > code, or any suggestions for a simple workaround. > > > > As it appears to be a random & rare misfire, I'll have to make changes & run > a few datasets (over multiple days) just to check it so may be a bit slow > with any responses... > > > > Thanks in advance, > > > > Steve > > > ? ? ? ?[[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >
cdf PROcess affxparser cdf PROcess affxparser • 728 views