Question

beadarray readIllumina suggestions



Keith James



I am reading single-channel bead-level data using the readIllumina function. The docs indicate that there is a path parameter:

    path: character string specifying the location of files to be read by the function

Calling the function with a path argument results in an error:

    > readIllumina(path = "/path/to/data", txtType = ".txt")
    Error in strtrim(x, width) : invalid 'width' argument
    > traceback()
    4: strtrim(xyFiles, nchar(xyFiles) - 4)
    3: as.vector(y)
    2: intersect(strtrim(GImages, nchar(GImages) - 8), strtrim(xyFiles, nchar(xyFiles) - 4))
    1: readIllumina(path = "/path/to/data", txtType = ".txt")

I've seen in another post that readIllumina expects the files to be in the working directory, and this is indeed the case, since this line in the function relies on the default path for dir() calls:

    GImages = dir(pattern = "_Grn.tif")

At first I took this to be a documentation bug, but in fact the path argument is honoured when loading the csv files:

    file = csv_files[i]
    if (!is.null(path))
        file = file.path(path, file)

and the annotation (.opa file):

    if (!is.null(path))
        annoFile = file.path(path, annoFile)

but apparently not when loading the metrics file:

    if (metrics) {
        metrics = dir(pattern = metricsFile)

My suggestion is to make the behaviour consistent across all the data, i.e. to honour the path argument for the tif and metrics files as well.

It is probably also worth noting in the docs the assumptions the function makes about the files in the data directory, i.e. that *all* tif images and *all* .txt files there will be loaded. My data directory contained other .tif and .txt files ("targets.txt", "notes.txt"), which caused the function to choke. I think it is optimistic to assume that users will have no other such files present.

In addition, I wonder whether the column names in the csv/txt files vary with the version of the scanner or scanner software. Instead of

    ProbeID G Gb GrnX GrnY

or similar, we have

    Code Grn GrnX GrnY

So, 4 columns rather than 3 or 5.
Finally, we appreciate all the work you've done in enabling us to work with raw Illumina data. Many thanks.

    > sessionInfo()
    R version 2.5.0 (2007-04-23)
    i686-pc-linux-gnu

    locale:
    LC_CTYPE=en_GB;LC_NUMERIC=C;LC_TIME=en_GB;LC_COLLATE=en_GB;LC_MONETARY=en_GB;LC_MESSAGES=en_GB;LC_PAPER=en_GB;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB;LC_IDENTIFICATION=C

    attached base packages:
    [1] "grid"      "tools"     "stats"     "graphics"  "grDevices" "utils"
    [7] "datasets"  "methods"   "base"

    other attached packages:
       beadarray beadarraySNP  quantsmooth      lodplot     quantreg      SparseM
         "1.4.0"      "1.2.0"      "1.2.0"        "1.1"       "4.06"       "0.73"
            affy       affyio  geneplotter      lattice     annotate      Biobase
        "1.14.0"      "1.4.0"     "1.14.0"     "0.15-4"     "1.14.1"     "1.14.0"
           limma
        "2.10.0"

Keith James (kdj at sanger.ac.uk)
Microarray Informatics Group
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
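The traceback most likely arises because the file listings are built with dir() and no path, so they are taken from the working directory and come back empty; nchar(character(0)) - 4 is then a zero-length width, which strtrim() rejects. As a sketch of the behaviour Keith suggests, here is a small base-R helper, find_bead_arrays() (hypothetical, not part of beadarray), that passes path to dir() explicitly and pairs each "<array>_Grn.tif" image with an "<array>.txt" file. That naming scheme is an assumption read off the "- 8" and "- 4" in the traceback (the lengths of "_Grn.tif" and ".txt"). Requiring both files per array also means stray files such as targets.txt or notes.txt are skipped rather than read. It uses endsWith(), so it needs R 3.3.0 or later, not the R 2.5.0 shown above.

```r
# Hypothetical helper, not part of beadarray: list the bead-level arrays in
# `path` by pairing "<array>_Grn.tif" images with "<array>.txt" files.
find_bead_arrays <- function(path = ".", imageSuffix = "_Grn.tif",
                             txtSuffix = ".txt") {
  # Pass `path` to dir() explicitly instead of relying on the working directory.
  files <- dir(path = path)
  images <- files[endsWith(files, imageSuffix)]
  texts <- files[endsWith(files, txtSuffix)]
  # Fail with a clear message instead of letting empty vectors reach strtrim().
  if (length(images) == 0 || length(texts) == 0)
    stop("no '*", imageSuffix, "' images or '*", txtSuffix,
         "' files found in '", path, "'")
  # Strip the suffixes to get array stems; only stems with both an image and a
  # text file are kept, so files like targets.txt or notes.txt are ignored.
  imageStems <- substr(images, 1, nchar(images) - nchar(imageSuffix))
  textStems <- substr(texts, 1, nchar(texts) - nchar(txtSuffix))
  stems <- intersect(imageStems, textStems)
  data.frame(array = stems,
             image = file.path(path, paste0(stems, imageSuffix)),
             text = file.path(path, paste0(stems, txtSuffix)),
             stringsAsFactors = FALSE)
}
```

For example, find_bead_arrays("/path/to/data")$array gives the array stems found in that directory; check ?readIllumina for the exact form its arrayNames argument expects before passing them on.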

Microarray Annotation geneplotter affy affyio beadarray quantsmooth beadarraySNP


Answer 1 · 2007-06-25

Dear Keith,

Thanks for your mail and suggestions. You're right, there is an inconsistency in the code for the path argument. It was fixed a little while ago in the development version of beadarray (1.5.1 from memory), so if you upgrade to the latest development version of the package, the 'path' argument should work as described.

I'd also recommend specifying the arrays you want to read in explicitly, using the 'arrayNames' argument of readIllumina(). I always do this to ensure the arrays are loaded in the same order as they appear in my targets file (which contains the sample information), rather than in the default order.

And yes, the column names of the .txt or .csv files do vary with the version of BeadScan and the type of array (single-channel/two-colour). There is some checking in readIllumina to see how many columns there are, and from memory 4 columns are supported. If your format doesn't read in properly, perhaps you could send us some example files to have a look at.

Best wishes,
Matt

On 25/6/07 14:07, "Keith James" (kdj at sanger.ac.uk) wrote:
> I am reading single channel bead level data using the readIllumina function.
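Matt's advice to pass arrayNames explicitly can be driven from the targets file. A minimal sketch, assuming a tab-delimited targets file with a column named ArrayName (a hypothetical column name; use whatever yours is called) whose entries are the array stems used in the file names. The helper arrays_from_targets() is likewise hypothetical: it returns the names in targets-file order and stops early if an array's image or text file is missing under path, rather than silently falling back to the directory order.

```r
# Hypothetical helper: array names in targets-file order, checked against
# the files actually present in `path`.
arrays_from_targets <- function(targetsFile, path = ".",
                                 column = "ArrayName",
                                 suffixes = c("_Grn.tif", ".txt")) {
  # Read every column as character so numeric-looking array names keep
  # their exact spelling (e.g. leading zeros).
  targets <- read.delim(targetsFile, colClasses = "character")
  if (!column %in% names(targets))
    stop("targets file has no '", column, "' column")
  arrays <- targets[[column]]
  # Every array needs one file per suffix, e.g. "<array>_Grn.tif" and "<array>.txt".
  expected <- outer(arrays, suffixes, paste0)
  missing <- expected[!file.exists(file.path(path, expected))]
  if (length(missing) > 0)
    stop("missing files: ", paste(missing, collapse = ", "))
  arrays
}
```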
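On the column-layout question, one option is to dispatch on the header rather than on the column count alone. The sketch below, read_bead_text(), is hypothetical and is not readIllumina's own check. It assumes tab-delimited files with a header row (pass sep = "," for .csv files) and recognises only the two layouts quoted in this thread, mapping both onto ProbeID, G, GrnX and GrnY; the Gb background column, when present, is simply dropped here.

```r
# Hypothetical helper: read one bead-level text file and normalise its
# columns, dispatching on the header rather than on a fixed column count.
read_bead_text <- function(file, sep = "\t") {
  beads <- read.table(file, header = TRUE, sep = sep, stringsAsFactors = FALSE)
  cols <- names(beads)
  if (all(c("ProbeID", "G", "GrnX", "GrnY") %in% cols)) {
    # Layout such as "ProbeID G Gb GrnX GrnY"; Gb is not kept.
    out <- beads[, c("ProbeID", "G", "GrnX", "GrnY")]
  } else if (identical(cols, c("Code", "Grn", "GrnX", "GrnY"))) {
    # Four-column layout reported in this thread: "Code Grn GrnX GrnY".
    out <- beads[, c("Code", "Grn", "GrnX", "GrnY")]
    names(out) <- c("ProbeID", "G", "GrnX", "GrnY")
  } else {
    stop("unrecognised bead-level columns: ", paste(cols, collapse = " "))
  }
  out
}
```

Failing loudly on an unknown header makes it obvious when a new BeadScan version changes the format, which is the point at which sending example files to the maintainers, as Matt suggests, is most useful.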