xps: hugene11 chip gives problems
@groot-philip-de-1307
Hi Christian,

I am trying to run an analysis using xps and the hugene11 chip. However, I run into a problem for which I need your help. I created a small test script to demonstrate it:

    library(xps)
    scheme <- root.scheme("/local2/R-2.15.2/library/xps/schemes/hugene11stv1.root")
    x.xps <- import.data(scheme, "tmp_x", celdir = ".", celfiles = "G092_A05_01_1.1.CEL", verbose = TRUE)
    cat("The loaded .CEL-files are:\n");
    for (i in 1:length(rawCELName(x.xps, fullpath = FALSE)))
      cat(sprintf("%s\n", rawCELName(x.xps, fullpath = FALSE)[i]));

Upon execution, I get:

    > library(xps)
    Welcome to xps version 1.18.1
     an R wrapper for XPS - eXpression Profiling System
     (c) Copyright 2001-2012 by Christian Stratowa
    > scheme <- root.scheme("/local2/R-2.15.2/library/xps/schemes/hugene11stv1.root")
    > x.xps <- import.data(scheme, "tmp_x", celdir = ".", celfiles = "G092_A05_01_1.1.CEL", verbose = TRUE)
    Opening file </local2> in <read> mode...
    Creating new temporary file </mnt>...
    Importing <./G092_A05_01_1.1.CEL> as <g092_a05_01_1.1.cel>...
       hybridization statistics:
          1 cells with minimal intensity 19
          1 cells with maximal intensity 21364.4
    New dataset <dataset> is added to Content...
    >
    > cat("The loaded .CEL-files are:\n");
    The loaded .CEL-files are:
    > for (i in 1:length(rawCELName(x.xps, fullpath = FALSE)))
    +   cat(sprintf("%s\n", rawCELName(x.xps, fullpath = FALSE)[i]));
    Error: Tree set <> could not be found in file content
    Error: Tree set <> could not be found in file content
    NA

The weird thing is that I only have this problem with the hugene11 chip. As far as I can see, all other chips work properly (still na32 based). This affects all subsequent steps, because there is no "content" to normalise etc.

I created the root scheme as follows:

    scmdir <- paste(.path.package("xps"), "schemes/", sep = "/")
    scheme <- import.exon.scheme("hugene11stv1", filedir = scmdir,
                layoutfile = paste(libdir, "HuGene-1_1-st-v1.r4.clf", sep = "/"),
                schemefile = paste(libdir, "HuGene-1_1-st-v1.r4.pgf", sep = "/"),
                probeset   = paste(anndir, "HuGene-1_1-st-v1.na33.1.hg19.probeset.csv", sep = "/"),
                transcript = paste(anndir, "HuGene-1_1-st-v1.na33.1.hg19.transcript.csv", sep = "/"),
                add.mask = TRUE)

(libdir and anndir are also defined, of course.)

I even updated the na32 annotation to the latest Affymetrix version (na33) to exclude a problem there. It does not fix the issue.

Please note that I am running root version 5.32/04, as version 5.32/01 is no longer available for download. Root works properly as far as I can see.

Do you have any clue where this problem originates from? Thank you!

sessionInfo():

    > sessionInfo()
    R version 2.15.2 (2012-10-26)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] xps_1.18.1

    loaded via a namespace (and not attached):
    [1] tools_2.15.2

Regards,

Dr. Philip de Groot
Bioinformatician / Microarray analysis expert
Wageningen University / TIFN
Netherlands Nutrigenomics Center (NNC)
Nutrition, Metabolism & Genomics Group
Division of Human Nutrition
PO Box 8129, 6700 EV Wageningen
Visiting Address: "De Valk" ("Erfelijkheidsleer"), Building 304, Verbindingsweg 4, 6703 HC Wageningen
Room: 0052a
T: 0317 485786
F: 0317 483342
E-mail: Philip.deGroot@wur.nl
I: http://humannutrition.wur.nl
https://madmax.bioinformatics.nl
http://www.nutrigenomicsconsortium.nl
Microarray Annotation xps
cstrato ★ 3.9k
@cstrato-908
Austria
Dear Philip,

I have just tried a subset of CEL-files from the Affymetrix "gene_1_1_st_ap_tissue_sample_data" for the HuGene_1.1 array, but I cannot reproduce the error you get. Here is my output for one CEL-file only:

    > library(xps)
    Welcome to xps version 1.19.1
     an R wrapper for XPS - eXpression Profiling System
     (c) Copyright 2001-2013 by Christian Stratowa
    > scheme <- root.scheme("./na33/hugene11stv1.root")
    > x.xps <- import.data(scheme, "tmp_x", celdir = "./cel", celfiles = "HumanBrain_1.CEL", verbose = TRUE)
    Opening file <./na33/hugene11stv1.root> in <read> mode...
    Creating new temporary file </volumes>...
    Importing <./cel/HumanBrain_1.CEL> as <humanbrain_1.cel>...
       hybridization statistics:
          1 cells with minimal intensity 17.5
          1 cells with maximal intensity 22402.1
    New dataset <dataset> is added to Content...
    > cat("The loaded .CEL-files are:\n");
    The loaded .CEL-files are:
    > for (i in 1:length(rawCELName(x.xps, fullpath = FALSE)))
    +   cat(sprintf("%s\n", rawCELName(x.xps, fullpath = FALSE)[i]));
    HumanBrain_1.CEL
    >
    > sessionInfo()
    R version 2.15.0 (2012-03-30)
    Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

    locale:
    [1] C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] xps_1.19.1

    loaded via a namespace (and not attached):
    [1] tools_2.15.0
    >

As you see, everything is ok. I also ran the triplicates of the Brain and Prostate samples and could do RMA without problems.

Could you please try the following two options:

1. Could you try to use the CEL-files from the Affymetrix dataset, to make sure that there is no problem with your CEL-files.

2. I see that you created the ROOT scheme files in directory:

    scmdir <- paste(.path.package("xps"), "schemes/", sep = "/")

I must admit that I have never tried to store the scheme files in the package directory, since I have the feeling that this may cause trouble, especially when you update R and/or the xps package to a new version. Could you please try to save your file "hugene11stv1.root" in a different directory such as '/home/degroot/schemes', or better, create the file in that directory, and then check whether you still get the problem.

Best regards,
Christian
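A minimal sketch of option 2, assuming the library and annotation files live under a user directory. The three directory names below are hypothetical placeholders; the file names and the `import.exon.scheme()` arguments are the ones already used in this thread. The build is only attempted when xps is installed and all four input files are found, so the snippet is safe to run as-is.

```r
# Hypothetical locations (assumptions, not taken from the thread); adjust to your setup.
scmdir <- file.path(path.expand("~"), "schemes")              # outside the R library tree
libdir <- file.path(path.expand("~"), "affy", "libraryfiles")
anndir <- file.path(path.expand("~"), "affy", "annotation")

# Input files, named as in the original import.exon.scheme() call.
inputs <- c(
  layoutfile = file.path(libdir, "HuGene-1_1-st-v1.r4.clf"),
  schemefile = file.path(libdir, "HuGene-1_1-st-v1.r4.pgf"),
  probeset   = file.path(anndir, "HuGene-1_1-st-v1.na33.1.hg19.probeset.csv"),
  transcript = file.path(anndir, "HuGene-1_1-st-v1.na33.1.hg19.transcript.csv")
)

# Only attempt the build when xps is installed and every input file is present.
if (requireNamespace("xps", quietly = TRUE) && all(file.exists(inputs))) {
  dir.create(scmdir, recursive = TRUE, showWarnings = FALSE)
  scheme <- xps::import.exon.scheme("hugene11stv1", filedir = scmdir,
              layoutfile = inputs[["layoutfile"]],
              schemefile = inputs[["schemefile"]],
              probeset   = inputs[["probeset"]],
              transcript = inputs[["transcript"]],
              add.mask   = TRUE)
} else {
  message("xps or the input files were not found; no scheme was built.")
}
```

Keeping the scheme outside the package directory means that reinstalling R or xps does not remove it, although it may still be worth rebuilding the scheme after an xps update.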
Dear Philip,

Meanwhile I did another test and renamed my CEL-files to mimic your names. This is what I get:

    > celfiles <- c("Brain_01_1.1.CEL","Prostate_01_1.1.CEL")
    > data.genome11 <- import.data(scheme.hugene11, "tmp_HuBrPr", filedir=datdir, celdir=celdir, celfiles=celfiles)
    Opening file </volumes> in <read> mode...
    Creating new temporary file </volumes>...
    Importing </volumes> as <brain_01_1.1.cel>...
       hybridization statistics:
          1 cells with minimal intensity 17.5
          1 cells with maximal intensity 22402.1
    New dataset <dataset> is added to Content...
    Importing </volumes> as <prostate_01_1.1.cel>...
       hybridization statistics:
          2 cells with minimal intensity 14.5
          1 cells with maximal intensity 23266.3
    > for (i in 1:length(rawCELName(data.genome11, fullpath = FALSE)))
    +   cat(sprintf("%s\n", rawCELName(data.genome11, fullpath = FALSE)[i]))
    Error: Tree set <> could not be found in file content
    Error: Tree set <> could not be found in file content

As you can see, I can now replicate your error.

The solution is simple: use parameter 'celnames'. Now the result is:

    > celfiles <- c("Brain_01_1.1.CEL","Prostate_01_1.1.CEL")
    > celnames <- c("Brain01","Prostate01")
    > data.genome11 <- import.data(scheme.hugene11, "tmp_HuBrPr", filedir=datdir, celdir=celdir, celfiles=celfiles, celnames=celnames)
    Opening file </volumes> in <read> mode...
    Creating new temporary file </volumes>...
    Importing </volumes> as <brain01.cel>...
       hybridization statistics:
          1 cells with minimal intensity 17.5
          1 cells with maximal intensity 22402.1
    New dataset <dataset> is added to Content...
    Importing </volumes> as <prostate01.cel>...
       hybridization statistics:
          2 cells with minimal intensity 14.5
          1 cells with maximal intensity 23266.3
    > for (i in 1:length(rawCELName(data.genome11, fullpath = FALSE)))
    +   cat(sprintf("%s\n", rawCELName(data.genome11, fullpath = FALSE)[i]))
    Brain_01_1.1.CEL
    Prostate_01_1.1.CEL

As you can see, now everything works fine. Parameter 'celnames' was introduced from the beginning to allow alternative names without the need to rename the original CEL-files, since CEL-files often had names such as 'Breast_tissue;24/08/1999;batch-1,lot-2.1.CEL'.

I hope that using parameter 'celnames' solves your problem.

Best regards,
Christian
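For a production script with many CEL-files, the alternative names could be derived from the file names instead of typed by hand. The following is only a sketch in base R: `make_celnames()` is a hypothetical helper, not part of xps, and it rests on the assumption (suggested by the two runs above, but not confirmed in this thread) that the extra dot in names like "Brain_01_1.1.CEL" is what triggers the error.

```r
# Hypothetical helper (not part of xps): derive import names without dots
# or other special characters from CEL file names.
make_celnames <- function(celfiles) {
  # Drop the directory part and the .CEL extension (case-insensitive).
  stem <- sub("\\.cel$", "", basename(celfiles), ignore.case = TRUE)
  # Replace every run of characters other than letters, digits and "_" by "_".
  stem <- gsub("[^A-Za-z0-9_]+", "_", stem)
  # Make sure the resulting names stay distinct.
  make.unique(stem, sep = "_")
}

celfiles <- c("Brain_01_1.1.CEL", "Prostate_01_1.1.CEL")
celnames <- make_celnames(celfiles)
print(celnames)

# The names can then be passed on exactly as in the call shown above:
# data.genome11 <- import.data(scheme.hugene11, "tmp_HuBrPr", filedir = datdir,
#                              celdir = celdir, celfiles = celfiles,
#                              celnames = celnames)
```

Because the original files are not renamed, `rawCELName()` would still report the original file names, as in the output above.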
Dear Christian,

Thank you very much! I was thinking that it must have been something in the CEL-file itself, but it turns out to be the filename! I'll adapt the script on our production server to fix the issue. I have to mention that we have been using xps for quite some years now, and we never encountered this issue before!

I worked through your recommendations from yesterday. I could indeed properly load the Affymetrix sample data, and changing the location of the root scheme did not fix the issue either. Fortunately, we do understand this now!

And you are right: if xps is updated, I need to recreate the schemes too. This only needs to be done once every 6 months (usually) and is not a big problem. It also forces me to check the Affymetrix site for updated annotations etc. I just feel more comfortable if the schemes are created by the currently running version of xps.

Have a nice weekend.

Regards,

Dr. Philip de Groot Ph.D.
Bioinformatics Researcher
Wageningen University / TIFN
Nutrigenomics Consortium
Nutrition, Metabolism & Genomics Group
Division of Human Nutrition
PO Box 8129, 6700 EV Wageningen
Visiting Address: Erfelijkheidsleer: De Valk, Building 304, Dreijenweg 2, 6703 HA Wageningen
Room: 0052a
T: +31-317-485786
F: +31-317-483342
E-mail: Philip.deGroot at wur.nl
Internet: http://www.nutrigenomicsconsortium.nl
http://humannutrition.wur.nl/
https://madmax.bioinformatics.nl/
Dear Philip, I am glad to hear that using 'celnames' could solve your problem. It is interesting to hear that you have never had problems with names of CEL-files. Personally I prefer to change the names, especially the names of the CEL-files from GEO which are simply numbers with a prefix. Have a nice weekend, too. Christian On 1/11/13 10:34 PM, Groot, Philip de wrote: > Dear Christian, > > Thank you very much! I was thinking that it must have been something in the CEL-file itself, but it turns out to be the filename! I'll adapt the script on our production server to fix the issue. I have to mention that we use xps for quite some years now. We never encountered this issue before! > > I worked through your recommendations from yesterday. I could indeed properly load the affymetrix sample data. And changing the location of the root-scheme did not fix the issue either! Fortunately, we do understand this now! > > And you are right: if xps is updated, I need to recreate the schemes too. This needs only to be done once every 6 months (usually) and is not a big problem. And it also forces me to check the Affymetrix site for updated annotations etc. I just feel more comfortable if the schemes are created by the current running version of xps. > > Have a nice weekend. > > Regards, > > > Dr. Philip de Groot Ph.D. 
> Bioinformatics Researcher > > Wageningen University / TIFN > Nutrigenomics Consortium > Nutrition, Metabolism & Genomics Group > Division of Human Nutrition > PO Box 8129, 6700 EV Wageningen > Visiting Address: Erfelijkheidsleer: De Valk, Building 304 > Dreijenweg 2, 6703 HA Wageningen > Room: 0052a > T: +31-317-485786 > F: +31-317-483342 > E-mail: Philip.deGroot at wur.nl > Internet: http://www.nutrigenomicsconsortium.nl > http://humannutrition.wur.nl/ > https://madmax.bioinformatics.nl/ > ________________________________________ > From: cstrato [cstrato at aon.at] > Sent: 11 January 2013 21:05 > To: Groot, Philip de > Cc: bioconductor at r-project.org > Subject: Re: [BioC] xps: hugene11 chip gives problems > > Dear Philip, > > Meanwhile I did another test and renamed my CEL-files to mimic your > names. This is what I get: > > celfiles <- c("Brain_01_1.1.CEL","Prostate_01_1.1.CEL") > > data.genome11 <- import.data(scheme.hugene11, "tmp_HuBrPr", > filedir=datdir, celdir=celdir, celfiles=celfiles) > Opening file > </volumes> in > <read> mode... > Creating new temporary file > </volumes>... > Importing > </volumes> > as <brain_01_1.1.cel>... > hybridization statistics: > 1 cells with minimal intensity 17.5 > 1 cells with maximal intensity 22402.1 > New dataset <dataset> is added to Content... > Importing > </volumes> as > <prostate_01_1.1.cel>... > hybridization statistics: > 2 cells with minimal intensity 14.5 > 1 cells with maximal intensity 23266.3 > > for (i in 1:length(rawCELName(data.genome11, fullpath = FALSE))) > + cat(sprintf("%s\n", rawCELName(data.genome11, fullpath = FALSE)[i])) > Error: Tree set <> could not be found in file content > Error: Tree set <> could not be found in file content > > > As you can see I can now replicate your error. > > The solution is simple, i.e. use parameter 'celnames'. 
> Now the result is:
>
> > celfiles <- c("Brain_01_1.1.CEL","Prostate_01_1.1.CEL")
> > celnames <- c("Brain01","Prostate01")
> > data.genome11 <- import.data(scheme.hugene11, "tmp_HuBrPr",
> filedir=datdir, celdir=celdir, celfiles=celfiles, celnames=celnames)
> Opening file </volumes> in <read> mode...
> Creating new temporary file </volumes>...
> Importing </volumes> as <brain01.cel>...
> hybridization statistics:
> 1 cells with minimal intensity 17.5
> 1 cells with maximal intensity 22402.1
> New dataset <dataset> is added to Content...
> Importing </volumes> as <prostate01.cel>...
> hybridization statistics:
> 2 cells with minimal intensity 14.5
> 1 cells with maximal intensity 23266.3
> > for (i in 1:length(rawCELName(data.genome11, fullpath = FALSE)))
> + cat(sprintf("%s\n", rawCELName(data.genome11, fullpath = FALSE)[i]))
> Brain_01_1.1.CEL
> Prostate_01_1.1.CEL
>
> As you can see, now everything works fine. The reason for introducing parameter 'celnames' was from the beginning to allow alternative names w/o the need to change the names of the original CEL-files, since often CEL-files had names such as 'Breast_tissue;24/08/1999;batch-1,lot-2.1.CEL'.
>
> I hope that using parameter 'celnames' does solve your problem.
>
> Best regards,
> Christian
>
>
> On 1/10/13 9:10 PM, cstrato wrote:
>> Dear Philip,
>>
>> I have just tried a subset of CEL-files from the Affymetrix "gene_1_1_st_ap_tissue_sample_data" for HuGene_1.1 array, but I cannot repeat the error you get. Here is my output for one CEL-file only:
>>
>> > library(xps)
>>
>> Welcome to xps version 1.19.1
>> an R wrapper for XPS - eXpression Profiling System
>> (c) Copyright 2001-2013 by Christian Stratowa
>>
>> > scheme <- root.scheme("./na33/hugene11stv1.root")
>> > x.xps <- import.data(scheme, "tmp_x", celdir = "./cel", celfiles =
>> "HumanBrain_1.CEL", verbose = TRUE)
>> Opening file <./na33/hugene11stv1.root> in <read> mode...
>> Creating new temporary file </volumes>...
>> Importing <./cel/HumanBrain_1.CEL> as <humanbrain_1.cel>...
>> hybridization statistics:
>> 1 cells with minimal intensity 17.5
>> 1 cells with maximal intensity 22402.1
>> New dataset <dataset> is added to Content...
>> > cat("The loaded .CEL-files are:\n");
>> The loaded .CEL-files are:
>> > for (i in 1:length(rawCELName(x.xps, fullpath = FALSE)))
>> + cat(sprintf("%s\n", rawCELName(x.xps, fullpath = FALSE)[i]));
>> HumanBrain_1.CEL
>> >
>> > sessionInfo()
>> R version 2.15.0 (2012-03-30)
>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>
>> locale:
>> [1] C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] xps_1.19.1
>>
>> loaded via a namespace (and not attached):
>> [1] tools_2.15.0
>> >
>>
>> As you see everything is ok. I did also run the triplicates of the Brain and Prostate samples and could do RMA w/o problems.
>>
>> Could you please try the following two options:
>>
>> 1. Could you try to use the CEL-files from the Affymetrix dataset to make sure that there is no problem with the CEL-files.
>>
>> 2. I see that you did create the ROOT scheme files in directory:
>> scmdir <- paste(.path.package("xps"), "schemes/", sep = "/")
>>
>> I must admit that I have never tried to store the scheme files in the package directory, since I have the feeling that this may cause troubles, especially when you update R and/or the xps package to a new version.
>> Could you please try to save your file "hugene11stv1.root" in a different directory such as '/home/degroot/schemes' or better to create this file in this directory, and then check whether you still get the problem.
>>
>> Best regards,
>> Christian
>>
>>
>> On 1/10/13 1:03 PM, Groot, Philip de wrote:
>>> Hi Christian,
>>>
>>> I am trying to do an analysis using xps and the hugene11 chip. However, I run into problems for which I need your help.
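The 'celnames' workaround discussed in this thread can be automated when many CEL-files are involved. The following is only a minimal sketch in base R: make_celnames() is a hypothetical helper and not part of xps. It assumes that the extra dot in names such as "Brain_01_1.1.CEL" is what triggers the "Tree set <> could not be found" error, which the test above suggests but does not prove, so it replaces every character other than letters, digits and underscores.

```r
# Hypothetical helper (NOT part of xps): derive simple sample names
# from CEL-file names so they can be passed as 'celnames'.
make_celnames <- function(celfiles) {
  # drop any directory part and the trailing .CEL extension (any case)
  base <- sub("\\.cel$", "", basename(celfiles), ignore.case = TRUE)
  # replace runs of characters other than letters, digits, underscore
  clean <- gsub("[^A-Za-z0-9_]+", "_", base)
  # remove underscores left over at the start or end
  clean <- gsub("^_+|_+$", "", clean)
  # different files must not collapse to the same name
  if (anyDuplicated(clean) > 0) stop("derived celnames are not unique")
  clean
}

celfiles <- c("Brain_01_1.1.CEL", "Prostate_01_1.1.CEL")
celnames <- make_celnames(celfiles)
print(celnames)

# Usage with xps, mirroring the call shown in the thread (requires xps,
# a scheme object and the 'datdir'/'celdir' variables to exist):
# data.genome11 <- import.data(scheme.hugene11, "tmp_HuBrPr",
#                              filedir = datdir, celdir = celdir,
#                              celfiles = celfiles, celnames = celnames)
```

The original CEL-files stay untouched; only the names used inside the ROOT file change, and rawCELName() still reports the original file names, as in the output above.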