marrayLayout difficulties
@jeremy-gollub-790
Last seen 10.2 years ago
Hi, all -

I'm experiencing very poor performance using the marray package (20 minutes to normalize a single <32,000 spot microarray). Can someone tell me whether this is normal, or what I'm doing wrong?

In the process of hunting down some errors, I also noticed some odd (to me) behavior in the marrayLayout maSub slot assignment method, described below. An attempt to "correct" this results in a much faster normalization (~1 minute), which looks good according to the MA plot but produces different numbers in maM than the slower calculation.

It seems unlikely that either result is correct (I can choose between suspiciously bad performance, or messing with the marrayLayout object's internals).

Thanks for any suggestions - details follow.

I'm using R version 1.9.0 on a sparc system running Solaris 2.9. My marray version is 1.5.14.

I have a text file, "dat.txt," containing the data I want to normalize. 10 columns, all numeric: in order,

FEATURE   spot number 1 - 31736
SECTOR    unnecessary and unused
ROW       "
COL       "
PLATE     ID of printing plate
Gf        green channel foreground
Rf        red channel foreground
Gb        green channel background
Rb        red channel background
W         spot weights, either 0 or 1

Array parameters are: Ngr = 8, Ngc = 4, Nsr = 31, Nsc = 32, Nspots = 31744. Not all spots are printed (ragged ends to each block). Only printed spots are included in the data file, so there are gaps in the FEATURE column sequence but no blank lines in the file.

The session:

> library(marray)
>
> # Read file.
>
> dat <- read.table('dat.txt', header = TRUE)
>
> # Construct maSub: 1 for each printed spot, 0 for absent spots.
>
> seq <- c(1:31744)
> int <- intersect(seq, as.numeric(dat[,1]))
> sub <- rep(0, 31744)
> sub[int] <- 1
>
> # Note contents of sub around the end of the first block and beginning
> # of the second:
>
> print(sub[980:1000])
 [1] 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
> # total of 31488 present spots
> sum(sub)
[1] 31488
>
> # Construct marrayLayout object.
>
> ml <- new("marrayLayout", maNgr = 8, maNgc = 4, maNsr = 31, maNsc = 32,
+     maNspots = 31744)
> maSub(ml) <- sub
> maPlate(ml) <- as.factor(dat[,5])
>
> # Note contents of maSub:
>
> sum(ml@maSub)
[1] 1
> length(ml@maSub)
[1] 31744
> print(sub[1:20])
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
> print(ml@maSub[1:20])
 [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> print(ml@maSub[980:1000])
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>
> # Now meddle with ml@maSub (set it back the way I think it should be).
> # Or don't - see comment on maNormMain step, below.
>
> maSub(ml)[int] <- TRUE
>
> # construct marrayRaw object.
>
> mr <- new("marrayRaw",
+     maGf = matrix(dat[,6], ncol = 1),
+     maRf = matrix(dat[,7], ncol = 1),
+     maGb = matrix(dat[,8], ncol = 1),
+     maRb = matrix(dat[,9], ncol = 1),
+     maW = matrix(dat[,10], ncol = 1),
+     maLayout = ml)
>
> # This step takes about one minute if I do maSub(ml)[int] <- TRUE
> # as indicated above. If I don't, it takes about 20 minutes.
> # The results differ, although the MA plot looks normalized either way.
>
> mn <- maNormMain(mr, f.loc = list(maNormLoess(x="maA", y="maM",
+     z="maPrintTip", w=NULL, subset=TRUE, span = 0.4)),
+     f.scale = list(maNormMAD(x = "maPrintTip", y = "maM",
+     geo = FALSE, subset = TRUE)),
+     Mloc = TRUE, Mscale = TRUE)

--
Jeremy Gollub, Ph.D.
jgollub@genome.stanford.edu
(W) 650/736-0075
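The layout numbers quoted in the post can be cross-checked with a little base R arithmetic. The sketch below assumes that every block is missing exactly its last 8 positions; that is only a guess that happens to match the printed output (8 zeros at the end of the first block, 31488 spots in total), and the variable names are made up for the illustration.

```r
# Layout parameters as given in the post.
n_blocks   <- 8 * 4                  # Ngr * Ngc
block_size <- 31 * 32                # Nsr * Nsc
n_spots    <- n_blocks * block_size  # should equal Nspots

# ASSUMPTION (not stated in the post): each block lacks its last 8 positions.
missing_per_block <- 8

# Position of every spot within its own block, 1..block_size.
pos_in_block <- (seq_len(n_spots) - 1) %% block_size + 1

# Logical mask: TRUE for printed spots, FALSE for the ragged block ends.
printed <- pos_in_block <= block_size - missing_per_block

n_spots                        # 31744
sum(printed)                   # 31488
as.integer(printed[980:1000])  # 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
```

Under that assumption the mask reproduces both the total count and the pattern around spots 980 to 1000 shown in the session, so a correctly stored maSub should look like this rather than a single TRUE.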
@jean-yee-hwa-yang-104
Last seen 10.2 years ago
Hi Jeremy,

That sounds very slow in my experience. Which image analysis software did you get your data from? If you send me an example file off-line, I will take a look at it for you. I need to check whether maSub was set properly, as this makes a big difference in print-tip normalization.

Alternatively, try the latest version, 1.5.17, which is temporarily available at http://arrays.ucsf.edu/software/

maNorm was previously very slow for global lowess normalization with a large number of spots, but in the new version we have sped up the code with sampling. However, I don't think this was your problem.

I would also suggest trying the swirl data within the marray package to see how long that takes on your computer:

data(swirl)
norm <- maNorm(swirl)

If that takes a minute or so, then there is something wrong with your data setup.

Cheers

Jean

On Thu, 30 Sep 2004, Jeremy Gollub wrote:

> [...]
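One way to run the suggested swirl comparison and actually get a number out of it is to wrap the call in base R's system.time(). This is only a sketch: the helper name elapsed_seconds is invented here, and the marray part is skipped when the package is not installed.

```r
# Hypothetical helper: elapsed wall-clock seconds for evaluating an expression.
elapsed_seconds <- function(expr) system.time(expr)[["elapsed"]]

if (requireNamespace("marray", quietly = TRUE)) {
  library(marray)
  data(swirl)
  # Same call as suggested in the reply, just timed.
  t_swirl <- elapsed_seconds(norm <- maNorm(swirl))
  cat("swirl normalization took", t_swirl, "seconds\n")
} else {
  message("marray is not installed; skipping the swirl timing")
}
```

Timing the problem array the same way gives two figures that can be compared directly instead of relying on an impression of "slow".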
Hi, Jean -

One of your suggestions solved the problem, apparently. I'll include details here for the sake of the archive.

I had previously constructed my marrayLayout object as follows:

> dat <- read.table('13998GENEPIX13998.txt', header = TRUE)
> # Construct maSub (1 for printed spots, 0 for missing spots)
> seq <- c(1:31744)
> int <- intersect(seq, as.numeric(dat[,1]))
> sub <- rep(0, 31744)
> sub[int] <- 1
> # Construct marrayLayout object.
> ml <- new("marrayLayout", maNgr = 8, maNgc = 4, maNsr = 31, maNsc = 32,
+     maNspots = 31744)
> maSub(ml) <- sub
> maPlate(ml) <- as.factor(dat[,5])

After this, I get

> table(ml@maSub)

FALSE  TRUE
31743     1

On Jean's advice, I instead tried

> maSub(ml) <- as.logical(sub)
> table(ml@maSub)

FALSE  TRUE
  256 31488

This preserves the correct maSub vector. Performance is now good (~1 minute to normalize), and the results are identical to my previous results with post-setter modification of maSub. I think I'm comfortable assuming that the normalization is correct, since the MA plot looks correct and performance is within reasonable limits.

Does this indicate a problem with the numerical form of the maSub slot assignment method? Or did I mis-use it?

Many thanks for your help,

--
Jeremy Gollub, Ph.D.
jgollub@genome.stanford.edu
(W) 650/736-0075

On Fri, 1 Oct 2004, Jean Yee Hwa Yang wrote:

> [...]
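The two table() results are what plain R indexing produces if a numeric 0/1 vector is read as a list of spot positions rather than as a mask. Whether the marray setter really treats numeric values that way is an assumption here, not something the thread confirms; the sketch below only shows the base R behavior on a toy 10-spot vector standing in for sub.

```r
# Toy stand-in for the 0/1 vector 'sub' from the thread (10 spots, 3 absent).
sub <- c(1, 1, 1, 0, 0, 1, 1, 1, 1, 0)
n   <- length(sub)

# ASSUMPTION: a setter that reads a numeric value as spot indices would do
# something like this. Zeros select nothing and the repeated 1 selects spot 1.
as_indices <- rep(FALSE, n)
as_indices[sub] <- TRUE

# Coercing to logical first treats the same vector as a per-spot mask.
as_mask <- as.logical(sub)

sum(as_indices)  # 1
sum(as_mask)     # 7

# The mask can also be built directly from the printed spot numbers,
# without the intermediate 0/1 vector.
feature <- which(sub == 1)
direct  <- seq_len(n) %in% feature
identical(direct, as_mask)  # TRUE
```

Read this way, a single TRUE in position 1 matches the first table() output, and as.logical(sub) (or a logical vector built with %in%) avoids the ambiguity altogether.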