Unable to load in Affymetrix data CEL files with readCelHeader
Matt K ▴ 20
@matt-k-2796
Last seen 10.3 years ago
I am having problems reading some publicly available chromosome X titration Nsp chip CEL files. The data are available from the Affymetrix website:

http://www.affymetrix.com/support/technical/sample_data/copy_number_data.affx

I have not modified the data in any way. Here is what happens when I try to read the data:

> library(affxparser)
> path <- "./rawData/3X/Mapping250K_Nsp/"
> pathnames <- list.files(path=path, pattern="[.](cel|CEL)$", full.names=TRUE)
> pathnames
[1] "./rawData/3X/Mapping250K_Nsp//NA04626_NSP_R1.CEL"
[2] "./rawData/3X/Mapping250K_Nsp//NA04626_NSP_R2.CEL"
[3] "./rawData/3X/Mapping250K_Nsp//NA04626_NSP_R3.CEL"
[4] "./rawData/3X/Mapping250K_Nsp//NA04626_NSP_R4.CEL"
> hdr <- readCelHeader(pathnames[1])
terminate called after throwing an instance of 'affymetrix_calvin_exceptions::UnableToOpenFileException'

Process R aborted at Tue May 13 15:01:55 2008

As you can see, R aborts. The same failure happens when I try to load any of the other CEL files. My R session info is:

> sessionInfo()
R version 2.7.0 Under development (unstable) (2008-01-21 r44087)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

Thanks for any help.

Matt
rgentleman ★ 5.5k
@rgentleman-7725
Last seen 9.6 years ago
United States
Hi Matt,

Matt K wrote:
> I am having problems reading in some publicly available chromosome X
> titration Nsp chip CEL files. The data are available from the Affymetrix
> website:
>
> http://www.affymetrix.com/support/technical/sample_data/copy_number_data.affx

I did not see any file there that obviously contained the CEL files. Can you say what you downloaded (provided the questions below don't solve your problem)?

>> hdr <- readCelHeader(pathnames[1])
> terminate called after throwing an instance of
> 'affymetrix_calvin_exceptions::UnableToOpenFileException'

That suggests that you may not have read permission on them. Did you check whether you can open those files with any other tool or editor?

You could just try to open and read them using standard R tools to see if that works (if they are binary CEL files you will just get junk from readLines, but that isn't the issue; you just want to know whether they can be opened from R).

Your R is out of date, and this sessionInfo is not correct: affxparser should be listed as attached, and it is not. Please don't do that; mixing and matching error messages and sessionInfo output makes life hard for anyone who wants to help. Step one is to update R and BioC...

> Thanks for any help.
>
> Matt
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org
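Robert's suggestion to check whether the files can be opened from R at all, independently of affxparser, could be sketched roughly as follows. This is a minimal sketch using base R only; the helper name `canOpenFile` is made up for illustration and is not part of affxparser or any other package.

```r
# Hypothetical helper (not part of affxparser): reports whether a file
# exists, is readable by the current user, and yields bytes when opened.
canOpenFile <- function(pathname, nbytes = 16L) {
  exists <- file.exists(pathname)
  # file.access() returns 0 on success and -1 on failure;
  # mode = 4 tests for read permission.
  readable <- exists && (unname(file.access(pathname, mode = 4)) == 0)
  bytesRead <- 0L
  if (readable) {
    con <- file(pathname, open = "rb")  # binary mode, as CEL files may be binary
    on.exit(close(con))
    bytesRead <- length(readBin(con, what = "raw", n = nbytes))
  }
  list(exists = exists, readable = readable, bytesRead = bytesRead)
}
```

Calling `str(canOpenFile(pathnames[1]))` on one of the CEL files from the original post would then show whether the problem is at the file-system level. Note that a file can pass this check and still be corrupt in a way that affxparser cannot handle.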
Hi,

On Tue, May 13, 2008 at 12:31 PM, Robert Gentleman <rgentlem at fhcrc.org> wrote:
> I did not see any file there that obviously contained the CEL files, can
> you say what you downloaded (provided the questions below don't solve your
> problem).

FYI/for the record, they are in the *.DTT files (basically a *.zip archive split into many files). It is quite tedious to extract the files if you don't have the right tools, but it is still possible with a basic Unix setup. See

http://groups.google.com/group/aroma-affymetrix/web/mapping250k-nsp-mapping250k-sty

and the reference to the page 'Affymetrix multi-part DTT/ZIP archives' for the details.

> > > hdr <- readCelHeader(pathnames[1])
> > terminate called after throwing an instance of
> > 'affymetrix_calvin_exceptions::UnableToOpenFileException'

That is sometimes how 'affxparser' responds to corrupt files. There is currently no exception handling at the native-code level in affxparser, which causes it to core dump on bad files. It's on the to-do list, but with low priority.

FYI, I've got those files as well and I can read them perfectly well using affxparser v1.11.3:

  library(affxparser);
  path <- "rawData/Affymetrix_2006-Chromosome_X/Mapping250K_Nsp/";
  pathnames <- list.files(pattern="[.]CEL$", path=path, full.names=TRUE);
  hdrs <- lapply(pathnames, readCelHeader);

Since it is quite tricky to extract the CEL files from the DTT files, you might have got something wrong there. You might also have downloaded the DTT, D02, D03, ... files in text mode and not binary mode (adding/removing extra bytes). My notes on

http://groups.google.com/group/aroma-affymetrix/web/affymetrix-multi-part-dtt-zip-archives

might shed some light on your problem.

For your troubleshooting, here are some details (make sure to have the latest version of 'digest' installed):

  library(digest);
  # Collect file name, size in bytes, and MD5 checksum for each CEL file
  x <- lapply(pathnames, FUN=function(pathname) {
    c(basename(pathname), file.info(pathname)$size, digest(file=pathname))
  });
  x <- as.data.frame(matrix(unlist(x), ncol=3, byrow=TRUE));
  colnames(x) <- c("filename", "bytes", "md5");
  print(x);

             filename    bytes                              md5
1  NA01416_NSP_R1.CEL 65743954 edce95d22481a133bcadb4faa79eb8d5
2  NA01416_NSP_R2.CEL 65701910 89291d7fb32b43ce6a9c83716d3db747
3  NA01416_NSP_R3.CEL 65727988 fba34620e18b6b3de8ff3a394ed0e313
4  NA01416_NSP_R4.CEL 65730078 0b26c2ac467fda36182855eca1e005e5
5  NA04626_NSP_R1.CEL 65693986 e720285a271506ea79bcc067feb90066
6  NA04626_NSP_R2.CEL 65725339 e7934222ecd4b0f9f46bcea27aa53549
7  NA04626_NSP_R3.CEL 65703014 7300b841c8b7eafdd73f6272de7e551d
8  NA04626_NSP_R4.CEL 65712493 64a9c11f8bc268ea17632f23264f0be1
9  NA06061_NSP_R1.CEL 65741243 51cae8fcdf3761ef4ce5d68ef980dfce
10 NA06061_NSP_R2.CEL 65703467 01a19363f226765b5c3b10cd5607dc98
11 NA06061_NSP_R3.CEL 65721991 9fdc58f8036457caa3551bc9eb8cd046
12 NA06061_NSP_R4.CEL 65712766 e7fa2d9adf599fd29b80c121b7e4dfe7

Hope this helps

Henrik
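To use Henrik's table for troubleshooting, one could compare local files against it programmatically. The following is a minimal sketch under a few assumptions: it uses `tools::md5sum()` from base R instead of the 'digest' package (both compute the MD5 of the file contents, so the values should agree), the helper name `checkFiles` is hypothetical, and only two rows of the reference table are copied in to keep the example short.

```r
# Reference values copied from Henrik's table above (two rows only).
reference <- data.frame(
  filename = c("NA04626_NSP_R1.CEL", "NA04626_NSP_R2.CEL"),
  bytes    = c(65693986, 65725339),
  md5      = c("e720285a271506ea79bcc067feb90066",
               "e7934222ecd4b0f9f46bcea27aa53549"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: compares local files against a reference table
# with columns 'filename', 'bytes' and 'md5'.
checkFiles <- function(pathnames, reference) {
  local <- data.frame(
    filename = basename(pathnames),
    bytes    = file.info(pathnames)$size,
    md5      = unname(tools::md5sum(pathnames)),
    stringsAsFactors = FALSE
  )
  # Match local files to reference rows by file name
  merged <- merge(local, reference, by = "filename",
                  suffixes = c(".local", ".ref"))
  merged$sizeOk <- merged$bytes.local == merged$bytes.ref
  merged$md5Ok  <- merged$md5.local == merged$md5.ref
  merged
}
```

With `pathnames` defined as in the thread, `checkFiles(pathnames, reference)` returns one row per matched file; any row where `sizeOk` or `md5Ok` is not `TRUE` points to a file that differs from Henrik's copy.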
Thanks for your replies. I've managed to solve the problem, although I'm still not sure why the other files were corrupt.

The data Henrik referred to are indeed the data that I extracted from the Affymetrix website: the chromosome X data in the form of DTT files. Initially I used the Affymetrix Data Transfer Tool to extract the CEL files on a Windows-based machine. I then transferred these CEL files over to my main Linux machine; however, somewhere during the transfer process they became corrupt. I can say this because I am able to read the CEL files successfully in R on my Windows machine, and I get exactly the same file information as Henrik has shown.

So instead of transferring the files, I followed Henrik's tutorial on extracting the CEL files from the DTT files in UNIX, and it has worked to perfection!

Thanks so much for your help. That was very useful.

Best regards,

Matt

On Tue, May 13, 2008 at 5:53 PM, Henrik Bengtsson <hb@stat.berkeley.edu> wrote:
> FYI/for the record, they are in the *.DTT files (basically a *.zip
> archive split in many files). It is quite tedious to extract the
> files if you don't have the right tools, but it is still possible with
> a basic Unix setup.
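The thread does not establish what went wrong in the Windows-to-Linux transfer. One common way a binary file changes in transit, which Henrik raised as a possibility for the DTT downloads, is a text-mode transfer that rewrites line-ending bytes. The following is only an illustration of that effect on a few made-up bytes, not a diagnosis of Matt's files; the function name `simulateTextModeTransfer` is hypothetical.

```r
# Hypothetical illustration: mimic a Windows-to-Unix text-mode transfer,
# which drops every CR byte (0x0d) that is immediately followed by LF (0x0a).
simulateTextModeTransfer <- function(bytes) {
  n <- length(bytes)
  if (n < 2L) return(bytes)
  isCrBeforeLf <- c(bytes[-n] == as.raw(0x0d) & bytes[-1L] == as.raw(0x0a),
                    FALSE)
  bytes[!isCrBeforeLf]
}

# Made-up binary payload that happens to contain the byte pair 0x0d 0x0a
original <- as.raw(c(0x40, 0x0d, 0x0a, 0x00, 0x0d, 0xff))
mangled  <- simulateTextModeTransfer(original)

length(original)               # 6
length(mangled)                # 5: one 0x0d was silently removed
identical(original, mangled)   # FALSE
```

A file damaged this way would show a different size and MD5 checksum on the two machines, which is what comparing against Henrik's table would reveal.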