limma reading genpix files

0

Entering edit mode

Bryce MacIver ▴ 10

@bryce-maciver-715

Last seen 9.6 years ago

hello all limma exprts, I have started using limma to analyse some spotted microarray data but have run into problems trying to read the data files in. From the limma guide, having specified the "files" , then using the command: RG<-read.maimages(files, source="genepix") I get: Read 7863scan3.gpr Error in read.table(fullname, skip = skip, header = TRUE, sep = sep, as.is = TRUE, : more columns than column names I have 4 .gpr files that were generated from genepix version 3 with 7863scan3 being the first I also tried specifying columns as mentioned in the limma guide: RG<-read.maimages(files, columns=list(Rf="F635 Median", Gf="F532 Median", Rb="B635 Median", Gb ="532 Median")) and got Error in "[.data.frame"(obj, , columns$Gb) : undefined columns selected. I have no idea how to interpret these error messages and have to say that my forays into BioConductor have been a frequent exercise in frustration because of constant unintelligible error messages. Could some one please help me in solving these issues. I'm running R 1.8.0 on MacOS X and recently updated limma (1.3?) thanks Bryce [[alternative HTML version deleted]]

Microarray limma Microarray limma • 1.5k views

ADD COMMENT • link updated 20.1 years ago by Gordon Smyth 50k • written 20.1 years ago by Bryce MacIver ▴ 10

0

Entering edit mode

Heidi Dvinge ▴ 30

@heidi-dvinge-717

Last seen 9.6 years ago

> I also tried specifying columns as mentioned in the limma guide: > > RG<-read.maimages(files, columns=list(Rf="F635 Median", Gf="F532 > Median", Rb="B635 Median", Gb ="532 Median")) > Don't you need Gb ="B532 Median" \Heidi > and got > > Error in "[.data.frame"(obj, , columns$Gb) : > undefined columns selected. > > I have no idea how to interpret these error messages and have to say > that my forays into BioConductor have been a frequent exercise in > frustration because of constant unintelligible error messages. Could > some one please help me in solving these issues. > > I'm running R 1.8.0 on MacOS X and recently updated limma (1.3?) > > thanks > Bryce > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor >

ADD COMMENT • link 20.1 years ago Heidi Dvinge ▴ 30

0

Entering edit mode

Gordon Smyth 50k

@gordon-smyth

Last seen 9 minutes ago

WEHI, Melbourne, Australia

> hello all limma exprts, > > I have started using limma to analyse some spotted microarray data but > have run into problems trying to read the data files in. > From the limma guide, having specified the "files" , then using the > command: > RG<-read.maimages(files, source="genepix") > > I get: > > Read 7863scan3.gpr > Error in read.table(fullname, skip = skip, header = TRUE, sep = sep, > as.is = TRUE, : > more columns than column names Something is wrong with your second data file. Have you looked at it to check? > I have 4 .gpr files that were generated from genepix version 3 with > 7863scan3 being the first > > I also tried specifying columns as mentioned in the limma guide: > > RG<-read.maimages(files, columns=list(Rf="F635 Median", Gf="F532 > Median", Rb="B635 Median", Gb ="532 Median")) > > and got > > Error in "[.data.frame"(obj, , columns$Gb) : > undefined columns selected. The error message seems to me to be not impossible to interpret. It tells you that you've tried to specify a column that doesn't exist and that the offending column is Gb. If you look at your own code you'll see an obvious typo. Gordon > I have no idea how to interpret these error messages and have to say > that my forays into BioConductor have been a frequent exercise in > frustration because of constant unintelligible error messages. Could > some one please help me in solving these issues. > > I'm running R 1.8.0 on MacOS X and recently updated limma (1.3?) > > thanks > Bryce > [[alternative HTML version deleted]]

ADD COMMENT • link 20.1 years ago Gordon Smyth 50k

0

Entering edit mode

Hello all, I've just started plowing thru some two-color arrays and ran across the same bug. The problem I found is due to an efficiency hack that doesn't always work: read.maimages assumes that all the genepix files it reads have the same number of header records. That is, rather than taking the number of header records directly from each .gpr file, it counts the number of lines until the header line in the *first file only*, and then assumes that all the rest of the files have the same number of records. If not, then the skip= parameter is incorrect, read.table() starts in the wrong place, and results are, as they say, unpredictable. The "right" way to do this is to read the first lines of each .gpr file, get the number of header records, and then use read.table with the right number of header records. A quick hack is just to search for the header in each file. I've attached a modification to read.maimages() that does exactly that, at least the three times I've tried it :-) Cheers, Dave Nelson Gordon K Smyth wrote: >>hello all limma exprts, >> >>I have started using limma to analyse some spotted microarray data but >>have run into problems trying to read the data files in. >> From the limma guide, having specified the "files" , then using the >>command: >> RG<-read.maimages(files, source="genepix") >> >>I get: >> >>Read 7863scan3.gpr >>Error in read.table(fullname, skip = skip, header = TRUE, sep = sep, >>as.is = TRUE, : >> more columns than column names > > > Something is wrong with your second data file. Have you looked at it to > check? > > >>I have 4 .gpr files that were generated from genepix version 3 with >>7863scan3 being the first >> >>I also tried specifying columns as mentioned in the limma guide: >> >>RG<-read.maimages(files, columns=list(Rf="F635 Median", Gf="F532 >>Median", Rb="B635 Median", Gb ="532 Median")) >> >>and got >> >>Error in "[.data.frame"(obj, , columns$Gb) : >> undefined columns selected. > > > The error message seems to me to be not impossible to interpret. It tells > you that you've tried to specify a column that doesn't exist and that the > offending column is Gb. If you look at your own code you'll see an > obvious typo. > > Gordon > > >>I have no idea how to interpret these error messages and have to say >>that my forays into BioConductor have been a frequent exercise in >>frustration because of constant unintelligible error messages. Could >>some one please help me in solving these issues. >> >>I'm running R 1.8.0 on MacOS X and recently updated limma (1.3?) >> >>thanks >>Bryce >> [[alternative HTML version deleted]] > > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor > -------------- next part -------------- ### ### A hack to read genepix .gpr files with a differ number of header lines in each file. ### ### Does not work with other formats. SHOULD rewrite to take the header count from the second ### line of the file. ### my.read.maimages <- function (files, source = "spot", path = NULL, ext = NULL, names = NULL, columns = NULL, wt.fun = NULL, verbose = TRUE, sep = "\t", quote = "\"", ...) { if (missing(files)) { if (missing(ext)) stop("Must specify input files") else { extregex <- paste("\\.", ext, "$", sep = "") files <- dir(path = ifelse(is.null(path), ".", path), pattern = extregex) files <- sub(extregex, "", files) } } if (!missing(source) && !missing(columns)) stop("Cannot specify both source and columns") source <- match.arg(source, c("arrayvision", "genepix", "imagene", "quantarray", "smd", "spot", "spot.close.open")) if (source == "imagene") return(read.imagene(files = files, path = path, ext = ext, names = names, columns = columns, wt.fun = wt.fun, verbose = verbose, sep = sep, quote = quote, ...)) slides <- as.vector(as.character(files)) if (!is.null(ext)) slides <- paste(slides, ext, sep = ".") nslides <- length(slides) if (is.null(names)) names <- removeExt(files) if (is.null(columns)) columns <- switch(source, smd = list(Gf = "CH1I_MEAN", Gb = "CH1B_MEDIAN", Rf = "CH2I_MEAN", Rb = "CH2B_MEDIAN"), spot = list(Rf = "Rmean", Gf = "Gmean", Rb = "morphR", Gb = "morphG"), spot.close.open = list(Rf = "Rmean", Gf = "Gmean", Rb = "morphR.close.open", Gb = "morphG.close.open"), genepix = list(Rf = "F635 Mean", Gf = "F532 Mean", Rb = "B635 Median", Gb = "B532 Median"), quantarray = list(Rf = "ch2 Intensity", Gf = "ch1 Intensity", Rb = "ch2 Background", Gb = "ch1 Background")) fullname <- slides[1] if (!is.null(path)) fullname <- file.path(path, fullname) if (source == "quantarray") { firstfield <- scan(fullname, what = "", sep = "\t", flush = TRUE, quiet = TRUE, blank.lines.skip = FALSE, multi.line = FALSE) skip <- grep("Begin Data", firstfield) if (length(skip) == 0) stop("Cannot find \"Begin Data\" in image output file") nspots <- grep("End Data", firstfield) - skip - 2 obj <- read.table(fullname, skip = skip, header = TRUE, sep = sep, quote = quote, as.is = TRUE, check.names = FALSE, comment.char = "", nrows = nspots, ...) } else if (source == "arrayvision") { skip <- 1 cn <- scan(fullname, what = "", sep = sep, quote = quote, skip = 1, nlines = 1, quiet = TRUE) fg <- grep("^Median Dens - RFU", cn) if (length(fg) != 2) stop(paste("Cannot find foreground columns in", fullname)) bg <- grep("Bkgd", cn) if (length(fg) != 2) stop(paste("Cannot find background columns in", fullname)) columns <- list(Rf = fg[1], Rb = bg[1], Gf = fg[2], Gb = bg[2]) obj <- read.table(fullname, skip = skip, header = TRUE, sep = sep, quote = quote, as.is = TRUE, check.names = FALSE, comment.char = "", ...) nspots <- nrow(obj) } else { skip <- grep(columns$Rf, readLines(fullname, n = 80)) - 1 if (length(skip) == 0) stop(paste("Cannot find column heading in image output file", fullname)) else skip <- skip[1] if (verbose) cat("Reading", fullname, "after skipping", skip, "...") obj <- read.table(fullname, skip = skip, header = TRUE, sep = sep, quote = quote, as.is = TRUE, check.names = FALSE, comment.char = "", ...) if (verbose) cat("\tDone.\n") nspots <- nrow(obj) } Y <- matrix(0, nspots, nslides) colnames(Y) <- names RG <- list(R = Y, G = Y, Rb = Y, Gb = Y) if (source == "smd") { anncol <- grep(columns$Gf, colnames(obj)) - 1 if (anncol > 0) RG$genes <- data.frame(obj[, 1:anncol]) } if (!is.null(wt.fun)) RG$weights <- Y for (i in 1:nslides) { if (i > 1) { fullname <- slides[i] if (!is.null(path)) fullname <- file.path(path, fullname) ## ## HACK HACK. Works for genepix files, but others??? ## skip <- grep(columns$Rf, readLines(fullname, n = 80)) - 1 if (length(skip) == 0) stop(paste("Cannot find column heading in image output file", fullname)) else skip <- skip[1] if (verbose) cat("Reading", fullname, "after skipping", skip, "...") obj <- read.table(fullname, skip = skip, header = TRUE, sep = sep, as.is = TRUE, quote = quote, check.names = FALSE, comment.char = "", nrows = nspots, ...) cat("\tDone.\n") } RG$R[, i] <- obj[, columns$Rf] RG$G[, i] <- obj[, columns$Gf] RG$Rb[, i] <- obj[, columns$Rb] RG$Gb[, i] <- obj[, columns$Gb] if (!is.null(wt.fun)) RG$weights[, i] <- wt.fun(obj) # if (verbose) # cat(paste("Read", fullname, "\n")) } new("RGList", RG) }

ADD REPLY • link 20.0 years ago David O. Nelson ▴ 60

0

Entering edit mode

Dear Dave, Thanks for the diagnosis. You are quite correct that read.maimages() does assume that all the gpr files in a batch have headers of the same number of lines, and this assumption should be relaxed. As it happens, the assumption has always been true for genepix data sets that I have seen. Gordon At 03:53 AM 14/04/2004, David Nelson wrote: >Hello all, > >I've just started plowing thru some two-color arrays and ran across the >same bug. > >The problem I found is due to an efficiency hack that doesn't always work: >read.maimages assumes that all the genepix files it reads have the same >number of header records. That is, rather than taking the number of header >records directly from each .gpr file, it counts the number of lines until >the header line in the *first file only*, and then assumes that all the >rest of the files have the same number of records. > >If not, then the skip= parameter is incorrect, read.table() starts in the >wrong place, and results are, as they say, unpredictable. > >The "right" way to do this is to read the first lines of each .gpr file, >get the number of header records, and then use read.table with the right >number of header records. A quick hack is just to search for the header in >each file. > >I've attached a modification to read.maimages() that does exactly that, at >least the three times I've tried it :-) > > >Cheers, > >Dave Nelson

ADD REPLY • link 20.0 years ago Gordon Smyth 50k

Login before adding your answer.