Question

Question on PROcess package


Davis, Wade ▴ 350

@davis-wade-2803


Hi Farida,

I used to use the PROcess package extensively, but I haven't used it much for the past 2 years. I ran into the same problem that you did, so I wrote a modified version of the rmBaseline function that fixes it and does some other things that you may find handy later on. The parts that should interest you most are the highcut and lowcut options. The normed and rawout options are not used that often. I last ran this code under R 2.6.1, so I am not sure if it will work without a few tweaks.

Good luck,
Wade

    rmBaseline2 <- function(fldr, bseoffrda = NULL, breaks = 200, qntl = 0,
                            method = "loess", lowcut = 0, highcut = 195000,
                            bw = 0.1, rawout = FALSE, normed = FALSE,
                            SpecNames = list.files(fldr, pattern = "\\.*csv\\.*")) {
      ##################################################
      ## modified BATCH function for baseline subtraction
      ##################################################
      # Modified version of rmBaseline function in PROcess package.
      # This version allows you to specify the mass range to consider
      # for baseline removal via the inputs lowcut and highcut.
      # This was written to account for minor differences in the spectra length
      # due to the laser firing for slightly different lengths of time.
      #
      # Use rawout=TRUE if you want all of the spectra read in and
      # stored in a matrix without actually baseline subtracting.
      # (This is useful for taking advantage of plotting routines
      # that were originally written for spectra after they had been baseline subtracted.)
      #
      # The use of normed=T is rarer. It was written as part of an exploratory analysis
      # I did to see if it made a difference if you normalized, then baseline subtracted,
      # rather than the traditional process of baseline subtracting and then normalizing.
      # My analysis showed that it made no appreciable difference, so I am sticking
      # with the status quo.

      # Abbreviated spectrum names (not used further in this function).
      SpecNames.abbrev <- unlist(strsplit(SpecNames, split = " [0-9]{3} "))[seq(2, 2 * length(SpecNames), 2)]

      if (normed == FALSE) {
        # fldr is a folder of spectrum files
        fs <- SpecNames
        n <- length(fs)
        # peek at dimensions to create empty matrix
        ftemp <- read.files(file.path(fldr, paste(SpecNames), fsep = "\\")[1])
        ftemp2 <- ftemp[ftemp[, 1] > lowcut & ftemp[, 1] < highcut, ]
        bseoffM <- matrix(data = 0.0123456, ncol = n, nrow = dim(ftemp2)[1])
        for (j in 1:n) {
          f1 <- read.files(file.path(fldr, paste(SpecNames), fsep = "\\")[j])
          # keep only the m/z range strictly between lowcut and highcut
          fcut <- f1[f1[, 1] > lowcut & f1[, 1] < highcut, ]
          if (rawout == FALSE) {
            bseoffM[, j] <- bslnoff(fcut, breaks = breaks, qntl = qntl, method = method, bw = bw)[, 2]
          }
          if (rawout == TRUE) {
            bseoffM[, j] <- fcut[, 2]
          }
          if (j == 1) {
            # row names are the m/z values of the first spectrum
            rownames(bseoffM) <- signif(bslnoff(fcut, breaks = breaks, qntl = qntl, method = method, bw = bw)[, 1], 6)
          }
        }
        colnames(bseoffM) <- SpecNames
      }

      if (normed == TRUE) {
        # here fldr is a matrix of spectra (m/z values as row names), not a folder
        fs <- fldr
        n <- ncol(fs)
        for (j in 1:n) {
          f1 <- cbind(as.numeric(rownames(fs)), fs[, j])
          fcut <- f1[f1[, 1] > lowcut & f1[, 1] < highcut, ]
          bseoff <- bslnoff(fcut, breaks = breaks, qntl = qntl, method = method, bw = bw)
          if (j > 1) bseoffM <- cbind(bseoffM, bseoff[, 2]) else bseoffM <- bseoff[, 2]
        }
        dimnames(bseoffM) <- list(signif(bseoff[, 1], 6), SpecNames = colnames(fldr))
      }

      # save() takes the object itself (or its name as a string via list=),
      # so the matrix is passed directly rather than as list = bseoffM
      if (!is.null(bseoffrda)) save(bseoffM, file = bseoffrda)
      bseoffM
    }

    ## EXAMPLE
    # rmBaseline2(fldr = seldipath(basedir = "W:\\Master6\\Raw Specta", chiptype = "IMAC", inten = "high"),
    #             breaks = 2,
    #             qntl = 0,
    #             method = "approx",
    #             bw = 0.1,
    #             highcut = 50000)

-----Original Message-----
From: Farida Mostajabi [mailto:f0most01@louisville.edu]
Sent: Monday, November 30, 2009 2:48 PM
To: bioconductor at stat.math.ethz.ch
Subject: [BioC] Question on PROcess package

To whom it may concern,

I am a student at the University of Louisville, USA. I am currently doing some MALDI-TOF MS data analysis research with the PROcess package. I am trying to use the batch functionality of the package to preprocess 286 spectra. The m/z values are not exactly the same across the spectra, which I think is an assumption of the PROcess package. I used the code below to do baseline correction one spectrum at a time:

    B.fs <- list.files(my.B.files, pattern = "\\.*csv\\.*", full.names = TRUE)
    nb.file <- length(B.fs)
    foo <- lapply(seq(nb.file), function(i) read.files(B.fs[i]))
    f0 <- lapply(seq(nb.file), function(i) foo[[i]][foo[[i]][, 1] > 0, ])
    basecorr <- lapply(seq(nb.file), function(i) bslnoff(f0[[i]], method = "loess", bw = 0.1))

I could not use the "rmBaseline" function since the row names of the returned matrix are the m/z values, which in my case are not identical. Would you please give some suggestions on this issue?

Best Regards,
Farida
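For readers whose m/z values differ throughout the spectra rather than only at the ends, one alternative that is not part of the code in this thread is to interpolate each baseline-corrected spectrum onto a shared m/z grid before binding the columns together. The following is only a sketch under stated assumptions: the spectra are simulated two-column matrices standing in for bslnoff() output, the names make_spectrum, spectra, grid and aligned are hypothetical, and the 0.5 grid spacing is arbitrary. It uses base R's approx() for linear interpolation, which slightly smooths peak shapes, so it may or may not suit a given analysis.

```r
# Simulated stand-ins for baseline-corrected spectra (ASSUMPTION: real data
# would come from bslnoff(); these are two-column matrices of m/z, intensity).
set.seed(1)
make_spectrum <- function(start, n) {
  mz <- start + seq(0, by = 0.5, length.out = n)
  cbind(mz = mz, intensity = abs(rnorm(n)))
}
spectra <- list(a = make_spectrum(1000.00, 200),
                b = make_spectrum(1000.25, 198),
                c = make_spectrum(999.75, 203))

# Shared grid restricted to the m/z range covered by every spectrum.
lo <- max(sapply(spectra, function(s) min(s[, 1])))
hi <- min(sapply(spectra, function(s) max(s[, 1])))
grid <- seq(lo, hi, by = 0.5)

# Linearly interpolate each spectrum onto the shared grid; sapply() binds
# the results into a matrix with one column per spectrum.
aligned <- sapply(spectra, function(s) approx(s[, 1], s[, 2], xout = grid)$y)
rownames(aligned) <- signif(grid, 6)
```

The result has one row per grid point and one column per spectrum, which is the same shape that rmBaseline returns, so downstream code that expects m/z values as row names should still apply.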

PROcess PROcess • 963 views

ADD COMMENT • link updated 14.4 years ago by Farida Mostajabi ▴ 20 • written 14.4 years ago by Davis, Wade ▴ 350

score 0 · Answer 1 · 2009-12-10

Hi Wade,

Thank you for the code. My question about the code is this: in the part where you create the empty matrix, the matrix dimension is chosen based on the dimension of the first spectrum.

    ftemp <- read.files(file.path(fldr, paste(SpecNames), fsep = "\\")[1])
    ftemp2 <- ftemp[ftemp[, 1] > lowcut & ftemp[, 1] < highcut, ]
    bseoffM <- matrix(data = 0.0123456, ncol = n, nrow = dim(ftemp2)[1])

What if the dimensions of the other spectra are different, which is the case for our problem? In the next part of the program, when it fills the matrix elements with baseline-corrected values, I receive this error:

    "number of items to replace is not a multiple of replacement length"

How did you approach this issue?

Thanks,
Farida
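The error described above can be reproduced without the PROcess package. The sketch below uses simulated spectra and hypothetical helper names (cut_spectrum, fill_matrix) that mimic only the matrix-filling step of rmBaseline2, not the baseline subtraction. It assumes the two spectra share the same m/z values and differ only in where they start and end; if the m/z values themselves are offset, the row counts can still differ after cutting.

```r
# Two simulated spectra on the same 1 Da grid but with different extents.
s1 <- cbind(mz = seq(500, 1500, by = 1), intensity = 1)  # 1001 points
s2 <- cbind(mz = seq(503, 1498, by = 1), intensity = 2)  # 996 points

# Keep rows strictly between lowcut and highcut, as rmBaseline2 does.
cut_spectrum <- function(s, lowcut, highcut) {
  s[s[, 1] > lowcut & s[, 1] < highcut, , drop = FALSE]
}

# Size the matrix from the first spectrum, then fill column by column.
fill_matrix <- function(spectra, lowcut, highcut) {
  trimmed <- lapply(spectra, cut_spectrum, lowcut = lowcut, highcut = highcut)
  m <- matrix(NA_real_, nrow = nrow(trimmed[[1]]), ncol = length(trimmed))
  for (j in seq_along(trimmed)) m[, j] <- trimmed[[j]][, 2]
  m
}

# With the wide default window nothing is trimmed, the lengths differ,
# and the column assignment fails; the message is captured here.
msg <- tryCatch(fill_matrix(list(s1, s2), 0, 195000),
                error = function(e) conditionMessage(e))

# With a window inside the range shared by both spectra, the lengths agree.
ok <- fill_matrix(list(s1, s2), 700, 1400)
dim(ok)
```

Comparing nrow() of each trimmed spectrum before filling the matrix is a quick way to tell whether the chosen lowcut and highcut are tight enough.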

score 0 · Answer 2 · 2009-12-11

Hi Farida,

The key here is to choose the correct values for the lowcut and/or highcut options. Most MS-TOF experts will tell you that the counts (data) arriving at the detector for the first several hundred Daltons (even 1k-2k Da, depending on the technology, machine settings, and who you ask) are basically trash and shouldn't be used. It is during the very first few milliseconds (e.g. the first hundred Daltons) that most spectra don't agree, so you use lowcut to start your data at the lowest Dalton value of interest, or at the lowest Dalton value that is common to all your spectra.

My function doesn't find these values (lowcut and highcut) for you automatically, because the investigator should know the Dalton range of interest that the MS analyzer was set up to detect. If you don't know that, I'd recommend 700 Da or 1000 Da for low laser-intensity experiments, which I'm assuming you are doing, to keep this email simple.

I will say that if the starting Da (or ending Da) values across ALL your spectra differ by more than 1-3 Da, something is wrong with the experiment, or data are being mixed together that shouldn't be. If that is the case, email me directly and I will try to help you.

Best of luck,
Wade
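The advice above about picking lowcut and highcut, and about checking how far the starting and ending Da values spread, can be sketched in base R. This is only an illustration on simulated spectra: the names spectra, starts, ends, common_range and suspicious are hypothetical, and the 3 Da threshold is simply the upper end of the 1-3 Da rule of thumb from the reply. It does not replace knowing the mass range the instrument was set up to detect.

```r
# Simulated spectra (ASSUMPTION: real ones would be read with read.files()).
spectra <- list(
  cbind(mz = seq(498, 1502, by = 1), intensity = 1),
  cbind(mz = seq(500, 1500, by = 1), intensity = 1),
  cbind(mz = seq(499, 1501, by = 1), intensity = 1)
)

# First and last m/z value of every spectrum.
starts <- sapply(spectra, function(s) min(s[, 1]))
ends   <- sapply(spectra, function(s) max(s[, 1]))

# Range common to all spectra: candidate bounds for lowcut and highcut.
common_range <- c(low = max(starts), high = min(ends))

# Flag the data set if starts or ends spread over more than 3 Da.
start_spread <- diff(range(starts))
end_spread   <- diff(range(ends))
suspicious   <- start_spread > 3 || end_spread > 3
```

Because rmBaseline2 keeps only m/z values strictly between lowcut and highcut, passing the common bounds directly drops the boundary points themselves; a lowcut of interest such as 700 Da would normally sit well inside this range anyway.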