affxparser updateCelUnit question

0

Entering edit mode

Martha Behnke ▴ 100

@martha-behnke-3807

Last seen 9.6 years ago

Hello all, I have normalized data from hgu133a2 microarray experiments using a custom algorithm, in the format of a matrix with rowname=probe position and columns the normalized intensity values for each sample (125 samples). I am trying to write the data back out to individual CEL files using affxparser. I successfully created the 'empty' CEL files using createCel. Because I only have the PM probes, I believe I need to use the updateCelUnits function. Following the vignette, I set up my code for the first sample as such (CDF file is in the working directory): pathname <- file.path(getwd(), basename(files[1])) intens=as.list(normed.1[,1]) updateCelUnits(pathname, cdf=NULL, intens) and checked my variables- > pathname [1] "L:/Normed data/CIR_5-D-309.CEL" > intens[1:4] $`129981` [1] 7.566183 $`212118` [1] 8.783549 $`393384` [1] 8.812677 $`84268` [1] 10.39683 However, when I run the function I get the following error: > updateCelUnits(pathname, cdf=NULL, intens) Error in dirname(filename) : a character vector argument expected > is.character(pathname) [1] TRUE > is.vector(pathname) [1] TRUE Since I do appear to be giving it a character vector, I'm stumped. Can someone help? My session info appears below. Thank you very much! Mikki Behnke Doctoral Student Virginia Commonwealth University R version 2.13.0 (2011-04-13) Platform: x86_64-pc-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] affxparser_1.24.0 hgu133a2hsrefseq.db_14.1.0 [3] org.Hs.eg.db_2.5.0 RSQLite_0.9-4 [5] DBI_0.2-5 AnnotationDbi_1.14.1 [7] hgu133a2cdf_2.8.0 simpleaffy_2.28.0 [9] gcrma_2.24.1 genefilter_1.34.0 [11] affy_1.30.0 Biobase_2.12.1 loaded via a namespace (and not attached): [1] affyio_1.20.0 annotate_1.30.0 Biostrings_2.20.0 [4] IRanges_1.10.0 preprocessCore_1.14.0 splines_2.13.0 [7] survival_2.36-9 tools_2.13.0 xtable_1.5-6 > [[alternative HTML version deleted]]

Microarray hgu133a2 affxparser Microarray hgu133a2 affxparser • 1.7k views

ADD COMMENT • link 12.8 years ago Martha Behnke ▴ 100

0

Entering edit mode

Martha Behnke ▴ 100

@martha-behnke-3807

Last seen 9.6 years ago

Oh, thank you! I thought I had solved the CDF problem when I figured out that this package requires a copy of the CDF file to be in the working directory, because the CDF-specific error messages went away. I can now get it to find my cdf file but still doesn't like it, but I will go away and Google some more and try to figure it out. Thanks again for getting me on the right track! Mikki On Tue, Jul 12, 2011 at 12:21 PM, Harris A. Jaffee <hj@jhu.edu> wrote: > On Jul 12, 2011, at 11:59 AM, M Behnke wrote: > >> debug: cdfFile <- findCdf(chipType) >> Browse[2]> >> debug: unitNames <- readCdfUnitNames(cdfFile) >> Browse[2]> >> Error in dirname(filename) : a character vector argument expected >> > > Yes, it's useful! Apparently, cdfFile was NULL, as in this orchestrated > failure? > > > readCdfUnitNames(NULL) > > Error in dirname(filename) : a character vector argument expected > > The problem seems to be with the value of findCdf() rather than your > pathname variable. > I think you need to specify something non-NULL for the 'cdf' argument to > updateCelUnits, > or else incorporate it into the 'data' argument -- or maybe the optional > arguments: > > cdf: A (optional) CDF 'list' structure either with field 'indices' > or fields 'x' and 'y'. If 'NULL', the unit names (and from > there the cell indices) are inferred from the names of the > elements in 'data'. > > data: A 'list' structure in a format similar to what is returned by > 'readCelUnits'() for _a single CEL file only_. > > ...: Optional arguments passed to 'readCdfCellIndices'(), which is > called if 'cdf' is not given. > > > > [[alternative HTML version deleted]]

ADD COMMENT • link 12.8 years ago Martha Behnke ▴ 100

0

Entering edit mode

Henrik Bengtsson ★ 2.4k

@henrik-bengtsson-4333

Last seen 5 days ago

United States

Hi, if you are trying to write a vector of probe intensities to a CEL file, and you have the corresponding probe indices, then use updateCel(), e.g. updateCel(pathname, indices=idxs, intensities=y) This does not require knowing the CDF structure. (FYI, any function with "unit" in its name, such as updateCelUnits(), utilizes a CDF structure for its reading/writing.) FYI, if the aroma.affymetrix package [http://www.aroma-project.org/] provides a higher-level API on top of affxparser for dealing with single or sets of CEL files. It also provides a standard file structure for hold data sets, so you/each person don't have to invent their own each time. ...and more. You might find it useful. If you're developing custom preprocessing methods, they can be made so they plug in transparently. /Henrik On Tue, Jul 12, 2011 at 6:59 AM, M Behnke <mkbehnke at="" gmail.com=""> wrote: > Hello all, > > I have normalized data from hgu133a2 microarray experiments using a custom > algorithm, in the format of a matrix with rowname=probe position and columns > the normalized intensity values for each sample (125 samples). I am trying > to write the data back out to individual CEL files using affxparser. ?I > successfully created the 'empty' CEL files using createCel. ?Because I only > have the PM probes, I believe I need to use the updateCelUnits function. > Following the vignette, I set up my code for the first sample as such (CDF > file is in the working directory): > > pathname <- file.path(getwd(), basename(files[1])) > intens=as.list(normed.1[,1]) > updateCelUnits(pathname, cdf=NULL, intens) > and checked my variables- >> pathname > [1] "L:/Normed data/CIR_5-D-309.CEL" > >> intens[1:4] > $`129981` > [1] 7.566183 > $`212118` > [1] 8.783549 > $`393384` > [1] 8.812677 > $`84268` > [1] 10.39683 > However, when I run the function I get the following error: >> updateCelUnits(pathname, cdf=NULL, intens) > Error in dirname(filename) : a character vector argument expected >> is.character(pathname) > [1] TRUE >> is.vector(pathname) > [1] TRUE > Since I do appear to be giving it a character vector, I'm stumped. ?Can > someone help? > My session info appears below. > > Thank you very much! > Mikki Behnke > Doctoral Student > Virginia Commonwealth University > > R version 2.13.0 (2011-04-13) > Platform: x86_64-pc-mingw32/x64 (64-bit) > locale: > [1] LC_COLLATE=English_United States.1252 > [2] LC_CTYPE=English_United States.1252 > [3] LC_MONETARY=English_United States.1252 > [4] LC_NUMERIC=C > [5] LC_TIME=English_United States.1252 > attached base packages: > [1] stats ? ? graphics ?grDevices utils ? ? datasets ?methods ? base > other attached packages: > ?[1] affxparser_1.24.0 ? ? ? ? ?hgu133a2hsrefseq.db_14.1.0 > ?[3] org.Hs.eg.db_2.5.0 ? ? ? ? RSQLite_0.9-4 > ?[5] DBI_0.2-5 ? ? ? ? ? ? ? ? ?AnnotationDbi_1.14.1 > ?[7] hgu133a2cdf_2.8.0 ? ? ? ? ?simpleaffy_2.28.0 > ?[9] gcrma_2.24.1 ? ? ? ? ? ? ? genefilter_1.34.0 > [11] affy_1.30.0 ? ? ? ? ? ? ? ?Biobase_2.12.1 > loaded via a namespace (and not attached): > [1] affyio_1.20.0 ? ? ? ? annotate_1.30.0 ? ? ? Biostrings_2.20.0 > [4] IRanges_1.10.0 ? ? ? ?preprocessCore_1.14.0 splines_2.13.0 > [7] survival_2.36-9 ? ? ? tools_2.13.0 ? ? ? ? ?xtable_1.5-6 >> > > ? ? ? ?[[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >

ADD COMMENT • link 12.8 years ago Henrik Bengtsson ★ 2.4k

0

Entering edit mode

Martha Behnke ▴ 100

@martha-behnke-3807

Last seen 9.6 years ago

Hi, Thanks for the reply! I didn't think I could use updateCel since I only have the 247,899 PM values. I did try it though and something strange is happening: With this code: files=list.files(pattern="[.](CEL)$") names=colnames(normed.1) pathname <- file.path(getwd(), basename(files[1])) intens=(normed.1[,1]) idxs=rownames(normed.1) updateCel(pathname,indices=idxs,intensities=intens) I get the error Error in updateCel(pathname, indices = idxs, intensities = intens) : Argument 'indices' is out of range [1,535824]: [1000,9999] Using debug, the error is occuring here: debug: if (r[1] < 1 || r[2] > nbrOfCells) { stop("Argument 'indices' is out of range [1,", nbrOfCells, "]: ", "[", r[1], ",", r[2], "]") } When I set up r as in the code with r=range(idxs), I do get > r [1] "1000" "9999" And > min(idxs) [1] "1000" > max(idxs) [1] "9999" BUT, random sampling of idxs shows values outside that range - > idxs[45] [1] "57961" >> idxs[9000] [1] "46169" I may well be making a basic programming error, as I am apparently a much better molecular biologist than I am a programmer! Any help you can give would be much appreciated! I will also check out aroma.affymetrix as you suggest. thanks again, Mikki On Wed, Jul 13, 2011 at 12:31 PM, Henrik Bengtsson <hb@biostat.ucsf.edu>wrote: > Hi, > > if you are trying to write a vector of probe intensities to a CEL > file, and you have the corresponding probe indices, then use > updateCel(), e.g. > > updateCel(pathname, indices=idxs, intensities=y) > > This does not require knowing the CDF structure. (FYI, any function > with "unit" in its name, such as updateCelUnits(), utilizes a CDF > structure for its reading/writing.) > > FYI, if the aroma.affymetrix package [http://www.aroma-project.org/] > provides a higher-level API on top of affxparser for dealing with > single or sets of CEL files. It also provides a standard file > structure for hold data sets, so you/each person don't have to invent > their own each time. ...and more. You might find it useful. If > you're developing custom preprocessing methods, they can be made so > they plug in transparently. > > /Henrik > > On Tue, Jul 12, 2011 at 6:59 AM, M Behnke <mkbehnke@gmail.com> wrote: > > Hello all, > > > > I have normalized data from hgu133a2 microarray experiments using a > custom > > algorithm, in the format of a matrix with rowname=probe position and > columns > > the normalized intensity values for each sample (125 samples). I am > trying > > to write the data back out to individual CEL files using affxparser. I > > successfully created the 'empty' CEL files using createCel. Because I > only > > have the PM probes, I believe I need to use the updateCelUnits function. > > Following the vignette, I set up my code for the first sample as such > (CDF > > file is in the working directory): > > > > pathname <- file.path(getwd(), basename(files[1])) > > intens=as.list(normed.1[,1]) > > updateCelUnits(pathname, cdf=NULL, intens) > > and checked my variables- > >> pathname > > [1] "L:/Normed data/CIR_5-D-309.CEL" > > > >> intens[1:4] > > $`129981` > > [1] 7.566183 > > $`212118` > > [1] 8.783549 > > $`393384` > > [1] 8.812677 > > $`84268` > > [1] 10.39683 > > However, when I run the function I get the following error: > >> updateCelUnits(pathname, cdf=NULL, intens) > > Error in dirname(filename) : a character vector argument expected > >> is.character(pathname) > > [1] TRUE > >> is.vector(pathname) > > [1] TRUE > > Since I do appear to be giving it a character vector, I'm stumped. Can > > someone help? > > My session info appears below. > > > > Thank you very much! > > Mikki Behnke > > Doctoral Student > > Virginia Commonwealth University > > > > R version 2.13.0 (2011-04-13) > > Platform: x86_64-pc-mingw32/x64 (64-bit) > > locale: > > [1] LC_COLLATE=English_United States.1252 > > [2] LC_CTYPE=English_United States.1252 > > [3] LC_MONETARY=English_United States.1252 > > [4] LC_NUMERIC=C > > [5] LC_TIME=English_United States.1252 > > attached base packages: > > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > > [1] affxparser_1.24.0 hgu133a2hsrefseq.db_14.1.0 > > [3] org.Hs.eg.db_2.5.0 RSQLite_0.9-4 > > [5] DBI_0.2-5 AnnotationDbi_1.14.1 > > [7] hgu133a2cdf_2.8.0 simpleaffy_2.28.0 > > [9] gcrma_2.24.1 genefilter_1.34.0 > > [11] affy_1.30.0 Biobase_2.12.1 > > loaded via a namespace (and not attached): > > [1] affyio_1.20.0 annotate_1.30.0 Biostrings_2.20.0 > > [4] IRanges_1.10.0 preprocessCore_1.14.0 splines_2.13.0 > > [7] survival_2.36-9 tools_2.13.0 xtable_1.5-6 > >> > > > > [[alternative HTML version deleted]] > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@r-project.org > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > [[alternative HTML version deleted]]

ADD COMMENT • link 12.8 years ago Martha Behnke ▴ 100

0

Entering edit mode

Hi. On Wed, Jul 13, 2011 at 10:49 AM, M Behnke <mkbehnke at="" gmail.com=""> wrote: > Hi, > Thanks for the reply! ?I didn't think I could use updateCel since I only > have the 247,899 PM values. You are correct that the HG-U133A_2.cdf file specifies 247,899 PM probes; > pathname <- "annotationData/chipTypes/HG-U133A_2/HG-U133A_2.cdf"; > cellsList <- readCdfCellIndices(pathname, stratifyBy="pm"); > cells <- unlist(cellsList, use.names=FALSE); > str(cells); int [1:247899] 315119 315120 315121 315122 315123 ... in addition to MM probes and some other control probes. FYI, in affxparser, we prefer to refer to probes as "cells", hence the above names. >?I did try it though and something strange is > happening: > > With this code: > files=list.files(pattern="[.](CEL)$") > names=colnames(normed.1) > > pathname <- file.path(getwd(), basename(files[1])) > intens=(normed.1[,1]) > idxs=rownames(normed.1) Here 'idxs' is a *character* vector (because you use rownames()), but updateCel() requires a *numeric* vector. > updateCel(pathname,indices=idxs,intensities=intens) > > I get the error > Error in updateCel(pathname, indices = idxs, intensities = intens) : > ?Argument 'indices' is out of range [1,535824]: [1000,9999] So, however odd the error message seems*, it picks up the fact that you pass a character vector and not a numeric vector. (*) We don't validate type of the 'indices' argument in affxparser, but instead trust the user to pass the correct data type (affxparser is designed to be a low-level API for developers, not really end users). We might add it to the todo list. > Using debug, the error is occuring here: > debug: if (r[1] < 1 || r[2] > nbrOfCells) { > ? ?stop("Argument 'indices' is out of range [1,", nbrOfCells, > ? ? ? ?"]: ", "[", r[1], ",", r[2], "]") > } > > When I set up r as in the code with r=range(idxs), I do get >> r > [1] "1000" "9999" > And >> min(idxs) > [1] "1000" >> max(idxs) > [1] "9999" > BUT, random sampling of idxs shows values outside that range - >> idxs[45] > [1] "57961" >>> idxs[9000] > [1] "46169" Yes, this is all because you ask for min and max on *character* vectors, so it is like comparing letters in an alphabet - they are not dealt with as numeric values. If those row-name strings really contains cell (=probe) indices, you can *coerce* them to integers as: idxs <- as.integer(idxs); and it should work. I recommend the following updates: pathnames <- list.files(pattern="[.](CEL)$", full.names=TRUE) sampleNames <- colnames(normed.1) cells <- rownames(normed.1); cells <- as.integer(cells); pathname <- file.path(getwd(), basename(pathname[1])) # or if you wish to update pathnames[1], just use: pathname <- pathnames[1] intensities <- normed.1[,1] updateCel(pathname, indices=cells, intensities=intensities) Things to be careful about: 1. Make sure you do not overwrite your raw/original CEL files, but always create new ones (e.g. by copy the old ones to a new place). 2. Make sure you write to the correct cell indices (probably the same as you used to read the data in the first place). Hope this helps /Henrik > I may well be making a basic programming error, as I am apparently a much > better molecular biologist than I am a programmer! ?Any help you can give > would be much appreciated! > > I will also check out aroma.affymetrix as you suggest. > > thanks again, > Mikki > On Wed, Jul 13, 2011 at 12:31 PM, Henrik Bengtsson <hb at="" biostat.ucsf.edu="">wrote: > >> Hi, >> >> if you are trying to write a vector of probe intensities to a CEL >> file, and you have the corresponding probe indices, then use >> updateCel(), e.g. >> >> ?updateCel(pathname, indices=idxs, intensities=y) >> >> This does not require knowing the CDF structure. ?(FYI, any function >> with "unit" in its name, such as updateCelUnits(), utilizes a CDF >> structure for its reading/writing.) >> >> FYI, if the aroma.affymetrix package [http://www.aroma- project.org/] >> provides a higher-level API on top of affxparser for dealing with >> single or sets of CEL files. ?It also provides a standard file >> structure for hold data sets, so you/each person don't have to invent >> their own each time. ?...and more. ?You might find it useful. ?If >> you're developing custom preprocessing methods, they can be made so >> they plug in transparently. >> >> /Henrik >> >> On Tue, Jul 12, 2011 at 6:59 AM, M Behnke <mkbehnke at="" gmail.com=""> wrote: >> > Hello all, >> > >> > I have normalized data from hgu133a2 microarray experiments using a >> custom >> > algorithm, in the format of a matrix with rowname=probe position and >> columns >> > the normalized intensity values for each sample (125 samples). I am >> trying >> > to write the data back out to individual CEL files using affxparser. ?I >> > successfully created the 'empty' CEL files using createCel. ?Because I >> only >> > have the PM probes, I believe I need to use the updateCelUnits function. >> > Following the vignette, I set up my code for the first sample as such >> (CDF >> > file is in the working directory): >> > >> > pathname <- file.path(getwd(), basename(files[1])) >> > intens=as.list(normed.1[,1]) >> > updateCelUnits(pathname, cdf=NULL, intens) >> > and checked my variables- >> >> pathname >> > [1] "L:/Normed data/CIR_5-D-309.CEL" >> > >> >> intens[1:4] >> > $`129981` >> > [1] 7.566183 >> > $`212118` >> > [1] 8.783549 >> > $`393384` >> > [1] 8.812677 >> > $`84268` >> > [1] 10.39683 >> > However, when I run the function I get the following error: >> >> updateCelUnits(pathname, cdf=NULL, intens) >> > Error in dirname(filename) : a character vector argument expected >> >> is.character(pathname) >> > [1] TRUE >> >> is.vector(pathname) >> > [1] TRUE >> > Since I do appear to be giving it a character vector, I'm stumped. ?Can >> > someone help? >> > My session info appears below. >> > >> > Thank you very much! >> > Mikki Behnke >> > Doctoral Student >> > Virginia Commonwealth University >> > >> > R version 2.13.0 (2011-04-13) >> > Platform: x86_64-pc-mingw32/x64 (64-bit) >> > locale: >> > [1] LC_COLLATE=English_United States.1252 >> > [2] LC_CTYPE=English_United States.1252 >> > [3] LC_MONETARY=English_United States.1252 >> > [4] LC_NUMERIC=C >> > [5] LC_TIME=English_United States.1252 >> > attached base packages: >> > [1] stats ? ? graphics ?grDevices utils ? ? datasets ?methods ? base >> > other attached packages: >> > ?[1] affxparser_1.24.0 ? ? ? ? ?hgu133a2hsrefseq.db_14.1.0 >> > ?[3] org.Hs.eg.db_2.5.0 ? ? ? ? RSQLite_0.9-4 >> > ?[5] DBI_0.2-5 ? ? ? ? ? ? ? ? ?AnnotationDbi_1.14.1 >> > ?[7] hgu133a2cdf_2.8.0 ? ? ? ? ?simpleaffy_2.28.0 >> > ?[9] gcrma_2.24.1 ? ? ? ? ? ? ? genefilter_1.34.0 >> > [11] affy_1.30.0 ? ? ? ? ? ? ? ?Biobase_2.12.1 >> > loaded via a namespace (and not attached): >> > [1] affyio_1.20.0 ? ? ? ? annotate_1.30.0 ? ? ? Biostrings_2.20.0 >> > [4] IRanges_1.10.0 ? ? ? ?preprocessCore_1.14.0 splines_2.13.0 >> > [7] survival_2.36-9 ? ? ? tools_2.13.0 ? ? ? ? ?xtable_1.5-6 >> >> >> > >> > ? ? ? ?[[alternative HTML version deleted]] >> > >> > _______________________________________________ >> > Bioconductor mailing list >> > Bioconductor at r-project.org >> > https://stat.ethz.ch/mailman/listinfo/bioconductor >> > Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > >> > > ? ? ? ?[[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >

ADD REPLY • link 12.8 years ago Henrik Bengtsson ★ 2.4k

0

Entering edit mode

Thank you very much! I have successfully written the data out, now I'm checking to make sure it went to the right places :) I really appreciate your quick help!! On Wed, Jul 13, 2011 at 2:44 PM, Henrik Bengtsson <hb@biostat.ucsf.edu>wrote: > Hi. > > On Wed, Jul 13, 2011 at 10:49 AM, M Behnke <mkbehnke@gmail.com> wrote: > > Hi, > > Thanks for the reply! I didn't think I could use updateCel since I only > > have the 247,899 PM values. > > You are correct that the HG-U133A_2.cdf file specifies 247,899 PM probes; > > > pathname <- "annotationData/chipTypes/HG-U133A_2/HG-U133A_2.cdf"; > > cellsList <- readCdfCellIndices(pathname, stratifyBy="pm"); > > cells <- unlist(cellsList, use.names=FALSE); > > str(cells); > int [1:247899] 315119 315120 315121 315122 315123 ... > > in addition to MM probes and some other control probes. > > FYI, in affxparser, we prefer to refer to probes as "cells", hence the > above names. > > > I did try it though and something strange is > > happening: > > > > With this code: > > files=list.files(pattern="[.](CEL)$") > > names=colnames(normed.1) > > > > pathname <- file.path(getwd(), basename(files[1])) > > intens=(normed.1[,1]) > > idxs=rownames(normed.1) > > Here 'idxs' is a *character* vector (because you use rownames()), but > updateCel() requires a *numeric* vector. > > > updateCel(pathname,indices=idxs,intensities=intens) > > > > I get the error > > Error in updateCel(pathname, indices = idxs, intensities = intens) : > > Argument 'indices' is out of range [1,535824]: [1000,9999] > > So, however odd the error message seems*, it picks up the fact that > you pass a character vector and not a numeric vector. (*) We don't > validate type of the 'indices' argument in affxparser, but instead > trust the user to pass the correct data type (affxparser is designed > to be a low-level API for developers, not really end users). We might > add it to the todo list. > > > Using debug, the error is occuring here: > > debug: if (r[1] < 1 || r[2] > nbrOfCells) { > > stop("Argument 'indices' is out of range [1,", nbrOfCells, > > "]: ", "[", r[1], ",", r[2], "]") > > } > > > > When I set up r as in the code with r=range(idxs), I do get > >> r > > [1] "1000" "9999" > > And > >> min(idxs) > > [1] "1000" > >> max(idxs) > > [1] "9999" > > BUT, random sampling of idxs shows values outside that range - > >> idxs[45] > > [1] "57961" > >>> idxs[9000] > > [1] "46169" > > Yes, this is all because you ask for min and max on *character* > vectors, so it is like comparing letters in an alphabet - they are not > dealt with as numeric values. If those row-name strings really > contains cell (=probe) indices, you can *coerce* them to integers as: > > idxs <- as.integer(idxs); > > and it should work. > > I recommend the following updates: > > pathnames <- list.files(pattern="[.](CEL)$", full.names=TRUE) > sampleNames <- colnames(normed.1) > cells <- rownames(normed.1); > cells <- as.integer(cells); > > pathname <- file.path(getwd(), basename(pathname[1])) > > # or if you wish to update pathnames[1], just use: > pathname <- pathnames[1] > > intensities <- normed.1[,1] > updateCel(pathname, indices=cells, intensities=intensities) > > Things to be careful about: > > 1. Make sure you do not overwrite your raw/original CEL files, but > always create new ones (e.g. by copy the old ones to a new place). > 2. Make sure you write to the correct cell indices (probably the same > as you used to read the data in the first place). > > Hope this helps > /Henrik > > > I may well be making a basic programming error, as I am apparently a much > > better molecular biologist than I am a programmer! Any help you can give > > would be much appreciated! > > > > I will also check out aroma.affymetrix as you suggest. > > > > thanks again, > > Mikki > > On Wed, Jul 13, 2011 at 12:31 PM, Henrik Bengtsson <hb@biostat.ucsf.edu> >wrote: > > > >> Hi, > >> > >> if you are trying to write a vector of probe intensities to a CEL > >> file, and you have the corresponding probe indices, then use > >> updateCel(), e.g. > >> > >> updateCel(pathname, indices=idxs, intensities=y) > >> > >> This does not require knowing the CDF structure. (FYI, any function > >> with "unit" in its name, such as updateCelUnits(), utilizes a CDF > >> structure for its reading/writing.) > >> > >> FYI, if the aroma.affymetrix package [http://www.aroma- project.org/] > >> provides a higher-level API on top of affxparser for dealing with > >> single or sets of CEL files. It also provides a standard file > >> structure for hold data sets, so you/each person don't have to invent > >> their own each time. ...and more. You might find it useful. If > >> you're developing custom preprocessing methods, they can be made so > >> they plug in transparently. > >> > >> /Henrik > >> > >> On Tue, Jul 12, 2011 at 6:59 AM, M Behnke <mkbehnke@gmail.com> wrote: > >> > Hello all, > >> > > >> > I have normalized data from hgu133a2 microarray experiments using a > >> custom > >> > algorithm, in the format of a matrix with rowname=probe position and > >> columns > >> > the normalized intensity values for each sample (125 samples). I am > >> trying > >> > to write the data back out to individual CEL files using affxparser. > I > >> > successfully created the 'empty' CEL files using createCel. Because I > >> only > >> > have the PM probes, I believe I need to use the updateCelUnits > function. > >> > Following the vignette, I set up my code for the first sample as such > >> (CDF > >> > file is in the working directory): > >> > > >> > pathname <- file.path(getwd(), basename(files[1])) > >> > intens=as.list(normed.1[,1]) > >> > updateCelUnits(pathname, cdf=NULL, intens) > >> > and checked my variables- > >> >> pathname > >> > [1] "L:/Normed data/CIR_5-D-309.CEL" > >> > > >> >> intens[1:4] > >> > $`129981` > >> > [1] 7.566183 > >> > $`212118` > >> > [1] 8.783549 > >> > $`393384` > >> > [1] 8.812677 > >> > $`84268` > >> > [1] 10.39683 > >> > However, when I run the function I get the following error: > >> >> updateCelUnits(pathname, cdf=NULL, intens) > >> > Error in dirname(filename) : a character vector argument expected > >> >> is.character(pathname) > >> > [1] TRUE > >> >> is.vector(pathname) > >> > [1] TRUE > >> > Since I do appear to be giving it a character vector, I'm stumped. > Can > >> > someone help? > >> > My session info appears below. > >> > > >> > Thank you very much! > >> > Mikki Behnke > >> > Doctoral Student > >> > Virginia Commonwealth University > >> > > >> > R version 2.13.0 (2011-04-13) > >> > Platform: x86_64-pc-mingw32/x64 (64-bit) > >> > locale: > >> > [1] LC_COLLATE=English_United States.1252 > >> > [2] LC_CTYPE=English_United States.1252 > >> > [3] LC_MONETARY=English_United States.1252 > >> > [4] LC_NUMERIC=C > >> > [5] LC_TIME=English_United States.1252 > >> > attached base packages: > >> > [1] stats graphics grDevices utils datasets methods base > >> > other attached packages: > >> > [1] affxparser_1.24.0 hgu133a2hsrefseq.db_14.1.0 > >> > [3] org.Hs.eg.db_2.5.0 RSQLite_0.9-4 > >> > [5] DBI_0.2-5 AnnotationDbi_1.14.1 > >> > [7] hgu133a2cdf_2.8.0 simpleaffy_2.28.0 > >> > [9] gcrma_2.24.1 genefilter_1.34.0 > >> > [11] affy_1.30.0 Biobase_2.12.1 > >> > loaded via a namespace (and not attached): > >> > [1] affyio_1.20.0 annotate_1.30.0 Biostrings_2.20.0 > >> > [4] IRanges_1.10.0 preprocessCore_1.14.0 splines_2.13.0 > >> > [7] survival_2.36-9 tools_2.13.0 xtable_1.5-6 > >> >> > >> > > >> > [[alternative HTML version deleted]] > >> > > >> > _______________________________________________ > >> > Bioconductor mailing list > >> > Bioconductor@r-project.org > >> > https://stat.ethz.ch/mailman/listinfo/bioconductor > >> > Search the archives: > >> http://news.gmane.org/gmane.science.biology.informatics.conductor > >> > > >> > > > > [[alternative HTML version deleted]] > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@r-project.org > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > [[alternative HTML version deleted]]

ADD REPLY • link 12.8 years ago Martha Behnke ▴ 100

0

Entering edit mode

Your idxs seems to be of type character when it ought to be numeric, at least when passed as the indices argument: indices: A 'numeric' 'vector' of cell (probe) indices specifying which cells to updated. If 'NULL', all indices are considered. This should avoid the error: idxs = as.numeric(rownames(normed.1)) On Jul 13, 2011, at 1:49 PM, M Behnke wrote: > Hi, > Thanks for the reply! I didn't think I could use updateCel since I > only > have the 247,899 PM values. I did try it though and something > strange is > happening: > > With this code: > files=list.files(pattern="[.](CEL)$") > names=colnames(normed.1) > > pathname <- file.path(getwd(), basename(files[1])) > intens=(normed.1[,1]) > idxs=rownames(normed.1) > updateCel(pathname,indices=idxs,intensities=intens) > > I get the error > Error in updateCel(pathname, indices = idxs, intensities = intens) : > Argument 'indices' is out of range [1,535824]: [1000,9999] > Using debug, the error is occuring here: > debug: if (r[1] < 1 || r[2] > nbrOfCells) { > stop("Argument 'indices' is out of range [1,", nbrOfCells, > "]: ", "[", r[1], ",", r[2], "]") > } > > When I set up r as in the code with r=range(idxs), I do get >> r > [1] "1000" "9999" > And >> min(idxs) > [1] "1000" >> max(idxs) > [1] "9999" > BUT, random sampling of idxs shows values outside that range - >> idxs[45] > [1] "57961" >>> idxs[9000] > [1] "46169" > I may well be making a basic programming error, as I am apparently > a much > better molecular biologist than I am a programmer! Any help you > can give > would be much appreciated! > > I will also check out aroma.affymetrix as you suggest. > > thanks again, > Mikki > On Wed, Jul 13, 2011 at 12:31 PM, Henrik Bengtsson > <hb at="" biostat.ucsf.edu="">wrote: > >> Hi, >> >> if you are trying to write a vector of probe intensities to a CEL >> file, and you have the corresponding probe indices, then use >> updateCel(), e.g. >> >> updateCel(pathname, indices=idxs, intensities=y) >> >> This does not require knowing the CDF structure. (FYI, any function >> with "unit" in its name, such as updateCelUnits(), utilizes a CDF >> structure for its reading/writing.) >> >> FYI, if the aroma.affymetrix package [http://www.aroma- project.org/] >> provides a higher-level API on top of affxparser for dealing with >> single or sets of CEL files. It also provides a standard file >> structure for hold data sets, so you/each person don't have to invent >> their own each time. ...and more. You might find it useful. If >> you're developing custom preprocessing methods, they can be made so >> they plug in transparently. >> >> /Henrik >> >> On Tue, Jul 12, 2011 at 6:59 AM, M Behnke <mkbehnke at="" gmail.com=""> wrote: >>> Hello all, >>> >>> I have normalized data from hgu133a2 microarray experiments using a >> custom >>> algorithm, in the format of a matrix with rowname=probe position and >> columns >>> the normalized intensity values for each sample (125 samples). I am >> trying >>> to write the data back out to individual CEL files using >>> affxparser. I >>> successfully created the 'empty' CEL files using createCel. >>> Because I >> only >>> have the PM probes, I believe I need to use the updateCelUnits >>> function. >>> Following the vignette, I set up my code for the first sample as >>> such >> (CDF >>> file is in the working directory): >>> >>> pathname <- file.path(getwd(), basename(files[1])) >>> intens=as.list(normed.1[,1]) >>> updateCelUnits(pathname, cdf=NULL, intens) >>> and checked my variables- >>>> pathname >>> [1] "L:/Normed data/CIR_5-D-309.CEL" >>> >>>> intens[1:4] >>> $`129981` >>> [1] 7.566183 >>> $`212118` >>> [1] 8.783549 >>> $`393384` >>> [1] 8.812677 >>> $`84268` >>> [1] 10.39683 >>> However, when I run the function I get the following error: >>>> updateCelUnits(pathname, cdf=NULL, intens) >>> Error in dirname(filename) : a character vector argument expected >>>> is.character(pathname) >>> [1] TRUE >>>> is.vector(pathname) >>> [1] TRUE >>> Since I do appear to be giving it a character vector, I'm >>> stumped. Can >>> someone help? >>> My session info appears below. >>> >>> Thank you very much! >>> Mikki Behnke >>> Doctoral Student >>> Virginia Commonwealth University >>> >>> R version 2.13.0 (2011-04-13) >>> Platform: x86_64-pc-mingw32/x64 (64-bit) >>> locale: >>> [1] LC_COLLATE=English_United States.1252 >>> [2] LC_CTYPE=English_United States.1252 >>> [3] LC_MONETARY=English_United States.1252 >>> [4] LC_NUMERIC=C >>> [5] LC_TIME=English_United States.1252 >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> other attached packages: >>> [1] affxparser_1.24.0 hgu133a2hsrefseq.db_14.1.0 >>> [3] org.Hs.eg.db_2.5.0 RSQLite_0.9-4 >>> [5] DBI_0.2-5 AnnotationDbi_1.14.1 >>> [7] hgu133a2cdf_2.8.0 simpleaffy_2.28.0 >>> [9] gcrma_2.24.1 genefilter_1.34.0 >>> [11] affy_1.30.0 Biobase_2.12.1 >>> loaded via a namespace (and not attached): >>> [1] affyio_1.20.0 annotate_1.30.0 Biostrings_2.20.0 >>> [4] IRanges_1.10.0 preprocessCore_1.14.0 splines_2.13.0 >>> [7] survival_2.36-9 tools_2.13.0 xtable_1.5-6 >>>> >>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/ > gmane.science.biology.informatics.conductor

ADD REPLY • link 12.8 years ago Harris A. Jaffee ▴ 590

0

Entering edit mode

Martha Behnke ▴ 100

@martha-behnke-3807

Last seen 9.6 years ago

This works -- at least it runs without error. I will do a few and make sure the data looks like data when I read it back in. thank you very much!! On Wed, Jul 13, 2011 at 2:21 PM, Harris A. Jaffee <hj@jhu.edu> wrote: > Your idxs seems to be of type character when it ought to be numeric, > at least when passed as the indices argument: > > indices: A 'numeric' 'vector' of cell (probe) indices specifying which > cells to updated. If 'NULL', all indices are considered. > > This should avoid the error: > > idxs = as.numeric(rownames(normed.1)) > > > On Jul 13, 2011, at 1:49 PM, M Behnke wrote: > > Hi, >> Thanks for the reply! I didn't think I could use updateCel since I only >> have the 247,899 PM values. I did try it though and something strange is >> happening: >> >> With this code: >> files=list.files(pattern="[.](**CEL)$") >> names=colnames(normed.1) >> >> pathname <- file.path(getwd(), basename(files[1])) >> intens=(normed.1[,1]) >> idxs=rownames(normed.1) >> updateCel(pathname,indices=**idxs,intensities=intens) >> >> I get the error >> Error in updateCel(pathname, indices = idxs, intensities = intens) : >> Argument 'indices' is out of range [1,535824]: [1000,9999] >> Using debug, the error is occuring here: >> debug: if (r[1] < 1 || r[2] > nbrOfCells) { >> stop("Argument 'indices' is out of range [1,", nbrOfCells, >> "]: ", "[", r[1], ",", r[2], "]") >> } >> >> When I set up r as in the code with r=range(idxs), I do get >> >>> r >>> >> [1] "1000" "9999" >> And >> >>> min(idxs) >>> >> [1] "1000" >> >>> max(idxs) >>> >> [1] "9999" >> BUT, random sampling of idxs shows values outside that range - >> >>> idxs[45] >>> >> [1] "57961" >> >>> idxs[9000] >>>> >>> [1] "46169" >> I may well be making a basic programming error, as I am apparently a much >> better molecular biologist than I am a programmer! Any help you can give >> would be much appreciated! >> >> I will also check out aroma.affymetrix as you suggest. >> >> thanks again, >> Mikki >> On Wed, Jul 13, 2011 at 12:31 PM, Henrik Bengtsson <hb@biostat.ucsf.edu>> >wrote: >> >> Hi, >>> >>> if you are trying to write a vector of probe intensities to a CEL >>> file, and you have the corresponding probe indices, then use >>> updateCel(), e.g. >>> >>> updateCel(pathname, indices=idxs, intensities=y) >>> >>> This does not require knowing the CDF structure. (FYI, any function >>> with "unit" in its name, such as updateCelUnits(), utilizes a CDF >>> structure for its reading/writing.) >>> >>> FYI, if the aroma.affymetrix package [http://www.aroma- project.org/**] >>> provides a higher-level API on top of affxparser for dealing with >>> single or sets of CEL files. It also provides a standard file >>> structure for hold data sets, so you/each person don't have to invent >>> their own each time. ...and more. You might find it useful. If >>> you're developing custom preprocessing methods, they can be made so >>> they plug in transparently. >>> >>> /Henrik >>> >>> On Tue, Jul 12, 2011 at 6:59 AM, M Behnke <mkbehnke@gmail.com> wrote: >>> >>>> Hello all, >>>> >>>> I have normalized data from hgu133a2 microarray experiments using a >>>> >>> custom >>> >>>> algorithm, in the format of a matrix with rowname=probe position and >>>> >>> columns >>> >>>> the normalized intensity values for each sample (125 samples). I am >>>> >>> trying >>> >>>> to write the data back out to individual CEL files using affxparser. I >>>> successfully created the 'empty' CEL files using createCel. Because I >>>> >>> only >>> >>>> have the PM probes, I believe I need to use the updateCelUnits function. >>>> Following the vignette, I set up my code for the first sample as such >>>> >>> (CDF >>> >>>> file is in the working directory): >>>> >>>> pathname <- file.path(getwd(), basename(files[1])) >>>> intens=as.list(normed.1[,1]) >>>> updateCelUnits(pathname, cdf=NULL, intens) >>>> and checked my variables- >>>> >>>>> pathname >>>>> >>>> [1] "L:/Normed data/CIR_5-D-309.CEL" >>>> >>>> intens[1:4] >>>>> >>>> $`129981` >>>> [1] 7.566183 >>>> $`212118` >>>> [1] 8.783549 >>>> $`393384` >>>> [1] 8.812677 >>>> $`84268` >>>> [1] 10.39683 >>>> However, when I run the function I get the following error: >>>> >>>>> updateCelUnits(pathname, cdf=NULL, intens) >>>>> >>>> Error in dirname(filename) : a character vector argument expected >>>> >>>>> is.character(pathname) >>>>> >>>> [1] TRUE >>>> >>>>> is.vector(pathname) >>>>> >>>> [1] TRUE >>>> Since I do appear to be giving it a character vector, I'm stumped. Can >>>> someone help? >>>> My session info appears below. >>>> >>>> Thank you very much! >>>> Mikki Behnke >>>> Doctoral Student >>>> Virginia Commonwealth University >>>> >>>> R version 2.13.0 (2011-04-13) >>>> Platform: x86_64-pc-mingw32/x64 (64-bit) >>>> locale: >>>> [1] LC_COLLATE=English_United States.1252 >>>> [2] LC_CTYPE=English_United States.1252 >>>> [3] LC_MONETARY=English_United States.1252 >>>> [4] LC_NUMERIC=C >>>> [5] LC_TIME=English_United States.1252 >>>> attached base packages: >>>> [1] stats graphics grDevices utils datasets methods base >>>> other attached packages: >>>> [1] affxparser_1.24.0 hgu133a2hsrefseq.db_14.1.0 >>>> [3] org.Hs.eg.db_2.5.0 RSQLite_0.9-4 >>>> [5] DBI_0.2-5 AnnotationDbi_1.14.1 >>>> [7] hgu133a2cdf_2.8.0 simpleaffy_2.28.0 >>>> [9] gcrma_2.24.1 genefilter_1.34.0 >>>> [11] affy_1.30.0 Biobase_2.12.1 >>>> loaded via a namespace (and not attached): >>>> [1] affyio_1.20.0 annotate_1.30.0 Biostrings_2.20.0 >>>> [4] IRanges_1.10.0 preprocessCore_1.14.0 splines_2.13.0 >>>> [7] survival_2.36-9 tools_2.13.0 xtable_1.5-6 >>>> >>>>> >>>>> >>>> [[alternative HTML version deleted]] >>>> >>>> ______________________________**_________________ >>>> Bioconductor mailing list >>>> Bioconductor@r-project.org >>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat="" .ethz.ch="" mailman="" listinfo="" bioconductor=""> >>>> Search the archives: >>>> >>> http://news.gmane.org/gmane.**science.biology.informatics.**conduc tor<http: news.gmane.org="" gmane.science.biology.informatics.conductor=""> >>> >>>> >>>> >>> >> [[alternative HTML version deleted]] >> >> ______________________________**_________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.e="" thz.ch="" mailman="" listinfo="" bioconductor=""> >> Search the archives: http://news.gmane.org/gmane.** >> science.biology.informatics.**conductor<http: news.gmane.org="" gmane="" .science.biology.informatics.conductor=""> >> > > [[alternative HTML version deleted]]

ADD COMMENT • link 12.8 years ago Martha Behnke ▴ 100

Login before adding your answer.