Hi,
you're not the first one. A few months ago I transferred a large data
set via an external HDD and, like you, it took me a long time to notice
that some CEL files were corrupt. Somehow the CEL files were
still valid and read just fine; it was just that some probe intensities
had ridiculously large values. I used MD5 on the files to identify
which files were corrupted.
As Seth suggested, the digest() function in the 'digest' package can
be used for this.
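
To make this concrete, here is a rough sketch of how one might record MD5 checksums before a transfer and compare them afterwards. The helper names (file_md5, md5_manifest, md5_mismatches) are made up for illustration and are not part of any package. The sketch calls digest() with file = TRUE when the 'digest' package is installed and otherwise falls back to md5sum() from the 'tools' package that ships with R.

```r
# Hypothetical helper: MD5 checksum of a single file.
# Uses digest::digest() if 'digest' is installed, else tools::md5sum().
file_md5 <- function(path) {
  if (requireNamespace("digest", quietly = TRUE)) {
    digest::digest(path, algo = "md5", file = TRUE)
  } else {
    unname(tools::md5sum(path))
  }
}

# Hypothetical helper: record file names and checksums before the transfer.
md5_manifest <- function(paths) {
  data.frame(
    file = basename(paths),
    md5  = vapply(paths, file_md5, character(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: after the transfer, return the names of files in
# 'dir' that are missing or whose checksum differs from the manifest.
md5_mismatches <- function(manifest, dir) {
  paths <- file.path(dir, manifest$file)
  now <- vapply(
    paths,
    function(p) if (file.exists(p)) file_md5(p) else NA_character_,
    character(1),
    USE.NAMES = FALSE
  )
  manifest$file[is.na(now) | now != manifest$md5]
}
```

In practice the manifest would be built on the sending side, saved (for example with write.csv()) and shipped along with the CEL files. On the receiving side, md5_mismatches() then lists any file to re-transfer before the data go into the pipeline. Note that a checksum only detects a change between the two copies; it cannot tell whether the original file was already bad.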
FYI: In August I will release aroma.affymetrix for analyzing small to
very large Affymetrix data sets etc etc. Since I was bitten by the
above bug, I added methods for generating and validating sets of CEL
files via MD5.
Cheers
Henrik
On 5/31/07, Hooiveld, Guido <guido.hooiveld at wur.nl> wrote:
> Hi List,
>
> Does anyone know of a package/tool/script that allows checking the
> integrity of (Affymetrix CEL) files [e.g. using comparisons of MD5
> checksums]?
>
> I am asking because when transferring a data set via FTP
> a CEL file unexpectedly became corrupt. Upon uploading, the files are
> automatically analyzed in our pipeline. It took us quite some time to
> find out that the problem was caused by one faulty file out of 16 (and
> not something else).
>
>
> > data <- ReadAffy()
> Error in read.affybatch(filenames = l$filenames, phenoData = l$phenoData, :
>   Is D:/Guido/A42_7_Int_ko_wy.CEL really a CEL file? tried reading as text, gzipped text and binary
> >
>
>
> This is the first time it happened to us, but I now realize that it
> would be very useful if the integrity of the CEL files could be
> checked after transfer, allowing the immediate identification of
> corrupt files.
>
> Thanks,
> Guido
>
> ------------------------------------------------
> Guido Hooiveld, PhD
> Nutrition, Metabolism & Genomics Group
> Division of Human Nutrition
> Wageningen University
> Biotechnion, Bomenweg 2
> NL-6703 HD Wageningen
> the Netherlands
>
> internet: http://nutrigene.4t.com
> email: guido.hooiveld at wur.nl
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>