limma reading Agilent One-Color Data

0

Entering edit mode

Gordon Smyth 50k

@gordon-smyth

Last seen 6 hours ago

WEHI, Melbourne, Australia

Dear Peter, I have thought from time to time about adding single-channel capability to limma and read.maimages(). I haven't done it so far because I don't see this sort of data in my own work, so I have been unsure of the special needs here, and also undecided about the data class to read into. I guess it will happen when it is clear in my mind what needs to be done. As a work around, I think you have already figured that out. You could for example use read.maimages() with dummy values for R & Rb and then just ignore them. With this approach you would 1. Use read.maimages() with dummy arguments for R & Rb 2. Background correct as usual using backgroundCorrect() 3. Normalize the RG$G matrix using normalizeBetweenArrays() 4. Use log2(RG$G) as input to lmFit() etc. You would be in much the same position as someone reading in a matrix of expression values. See my recent reply to Lance Palmer. Best wishes Gordon On Mon, May 14, 2007 10:36 pm, Peter White wrote: > Dear Gordon, > > No problem cc'ing the list. I did come up with a function to read the flag > data from Agilent and weight the elements accordingly. Also, I worked out > that the gProcessedSignal in their FE data is, in the case of single color, > just the gMeanSignal - gBGUsed. They use a spatial detrending model to > calculate the element background (it is based on the negative control spots, > as with their arrays the local background is not adding anything to the > signal). > > Thanks, > > Peter > > myFlagFun <- function(x) { > #Weight only strongly positive spots 1, everything else 0 > present <- x$gIsPosAndSignif == 1 > probe <- x$ControlType == 0 > manual <- x$IsManualFlag == 0 > strong <- x$gIsWellAboveBG == 1 > y <- as.numeric(present & probe & manual & strong) > > #Weight weak spots 0.5 > > weak <- strong == FALSE > weak <- (present & probe & manual & weak) > weak <- grep(TRUE,weak) > y[weak] <- 0.5 > > #Weight flagged spots 0.5 > > sat <- x$gIsSaturated == 0 > xdr <- x$gIsLowPMTScaledUp == 0 > featureOL1 <- x$gIsFeatNonUnifOL == 0 > featureOL2 <- x$gIsFeatPopnOL == 0 > flagged <- (sat & xdr & featureOL1 & featureOL2) > flagged <- grep(FALSE, flagged) > good <- grep(TRUE, y==1) > flagged <- intersect(flagged, good) > y[flagged] <- 0.5 > y > } > > G <- read.maimages(targets, > columns = list(G = "gMeanSignal", Gb = "gBGUsed", R = > "gProcessedSignal", > Rb = "gBGMedianSignal"), > annotation= c("Row", "Col", "FeatureNum", "ProbeUID", > "ControlType", > "ProbeName", "GeneName", "SystematicName"), > wt.fun=myFlagFun) > > -----Original Message----- > From: Gordon Smyth [mailto:smyth at wehi.EDU.AU] > Sent: Saturday, May 12, 2007 5:35 AM > To: pwhite at mail.med.upenn.edu > Subject: Re: Agilent One-Color Data > > Dear Peter, > > Do you mind if I cc my reply to the Bioconductor mailing list? > > Best wishes > Gordon > > At 01:27 AM 10/05/2007, Peter White wrote: >>Dear Dr. Smyth, >> >>I've been using Limma for the past few years for >>processing our two-color array data, and have >>developed a number of scripts to automate our >>routine analysis using the Limma software. We >>have recently started to use more and more >>Agilent arrays, which initially were causing us >>some troubles (they include dark corners on >>their arrays which seems to mess up the grid >>dimensions). Although I am very dubious about >>using a two-color platform to generate one-color >>data, we have just done a single color >>experiment with 16 arrays. I have been trying to >>load the data into R but have realized that >>Limma does not appear to yet be setup for >>one-color Agilent arrays. Do you have any plans >>to add this functionality and any tips on how I should proceed? >> >>Thanks again for all your work on developing and >>maintaining - it is a critical component for our research. >> >>Best wishes, >> >>Peter >> >>P.S. Here was my work around for loading the data: >> >>G <- read.maimages(targets, columns = list(G = "gMeanSignal", >> Gb = "gBGUsed", R = "gProcessedSignal", Rb = > "gIsPosAndSignif"), >> annotation= c("Row", "Col", >> "FeatureNum", "ProbeUID", "ControlType", >> "ProbeName", "GeneName", "SystematicName")) >> >>Peter White, Ph.D. >>Technical Director >>Functional Genomics Core >>Department of Genetics >>University of Pennsylvania >>570 Clinical Research Building >>415 Curie Boulevard >>Philadelphia, PA 19104-6145

probe limma probe limma • 1.8k views

ADD COMMENT • link updated 16.9 years ago by Lev Soinov ▴ 470 • written 17.0 years ago by Gordon Smyth 50k

0

Entering edit mode

Lev Soinov ▴ 470

@lev-soinov-2119

Last seen 9.6 years ago

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070518/ 6b18bccd/attachment.pl

ADD COMMENT • link 16.9 years ago Lev Soinov ▴ 470

0

Entering edit mode

Well, one method would be to give lmFit a more informative data object in the first place. E.g., library(Biobase) y <- new("ExpressionSet",exprs=x) fit <- lmFit(y,design) instead of fit <- lmFit(x,design) Indeed, if your data are Affymetrix expression values (all your previous posts have been about Affy data), you would ordinarily end up with a data object in the first place from rma(), gcrma() or vsn(). Apparently you have Affy expression values on a file which are already background corrected and summarised but not normalised. That seems to me an unusual situation, as summarised data obtained from the internet is usually already normalised. Another method would be to compute the Amean component yourself: fit <- lmFit(x,design) fit$Amean <- rowMeans(x,na.rm=TRUE) At 07:47 PM 18/05/2007, Lev Soinov wrote: >Dear Gordon and List, > >Could you, please recommend the >simplest/quickest way to create MAplots using >the plotMA function if the data supplied to lmFit is ?matrix?? >In the last versions of limma you removed the >calculation of Amean and therefore, plotMA >doesn?t work directly on fit objects. >My Data.txt file is an expression matrix with 30 >samples, the first columns represent gene IDs (31 columns in total). >The code that I use is below. > >With kind regards, > > > s<-scan("Data.txt",what='character') >Read 387593 items > > sm<-matrix(s,byrow=TRUE,ncol=31) > > rownames(sm)<-sm[,1] > > sm<-sm[,2:ncol(sm)] > > snn<-apply(sm,2,as.numeric) > > rownames(snn)<-rownames(sm) > > > > signals<-snn A bit easier would be s <- read.delim("Data.txt",sep="") signals <- as.matrix(s[,-1]) rownames(signals) <- s[,1] Best wishes Gordon > > temp<-normalizeBetweenArrays(log2(signals), method="quantile") > > > > design <- model.matrix(~0 > +factor(c(3,2,4,5,1,4,3,6,6,1,2,1,3,5,4,2,5,6,3,4,5,6,1,2,3,4,5,6,1, 2))) > > > > colnames(design) <- c("Untreated","L1","L2","L1A","L2A","A") > > contrast.matrix <- > makeContrasts(L1-Untreated, L2-Untreated, > L1A-Untreated, L2A-Untreated, A-Untreated, levels=design) > > > > fit <- lmFit(temp, design) > > fit2 <- contrasts.fit(fit, contrast.matrix) > > fit2 <- eBayes(fit2) > > plotMA(fit2, array=1) >Error in xy.coords(x, y, xlabel, ylabel, log) : > 'x' and 'y' lengths differ >In addition: Warning messages: >1: is.na() applied to non-(list or vector) in: is.na(x) >2: no non-missing arguments to min; returning Inf >3: no non-missing arguments to max; returning ?Inf > > > > sessionInfo() >R version 2.4.1 (2006-12-18) >i386-pc-mingw32 > >locale: >LC_COLLATE=English_United >Kingdom.1252;LC_CTYPE=English_United >Kingdom.1252;LC_MONETARY=English_United >Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252 > >attached base packages: >[1] >"tools" "stats" "graphics" "grDevices" >"utils" "datasets" "methods" "base" > >other attached packages: > vsn Biobase limma >"1.12.0" "1.12.2" "2.9.8"

ADD REPLY • link 16.9 years ago Gordon Smyth 50k

0

Entering edit mode

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070523/ cd8712ae/attachment.pl

ADD REPLY • link 16.9 years ago Lev Soinov ▴ 470

0

Entering edit mode

At 08:15 PM 23/05/2007, Lev Soinov wrote: >Dear Gordon, > >When using a script that I already asked about (below), I noticed >the following. > >My Data matrix is for 30 samples, the first column represents probe >IDs. I use for the unordered dataset (samples are not ordered in the >Data file according to the treatments) the following script: > > design <- model.matrix(~0 > +factor(c(3,2,4,5,1,4,3,6,6,1,2,1,3,5,4,2,5,6,3,4,5,6,1,2,3,4,5,6,1, 2))) > > colnames(design) <- c("Untreated","L1","L2","L1A","L2A","A") > > contrast.matrix <- makeContrasts(L1-Untreated, L2-Untreated, > L1A-Untreated, L2A-Untreated, A-Untreated, L1A-A, L2A-A, L1A-L1, > L2A-L2, levels=design) > >For the ordered data it should produce the same results in the >topTable as the script: > > design <- model.matrix(~0 > +factor(c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,4,4,4,4,4,5,5,5,5,5,6,6,6,6,6, 6))) > > colnames(design) <- c("Untreated","L1","L2","L1A","L2A","A") > > contrast.matrix <- makeContrasts(L1-Untreated, L2-Untreated, > L1A-Untreated, L2A-Untreated, A-Untreated, L1A-A, L2A-A, L1A-L1, > L2A-L2, levels=design) > >However there are some differences, especially in the AveExpr column: > >1. unordered: > ID logFC AveExpr t P.Value adj.P.Val B >168840 -5.302214 12.51356 -27.83821 3.590400e-09 2.717735e-05 11.23314 >101140 -5.175045 12.51114 -27.67796 3.756415e-09 2.717735e-05 11.20135 >123375 -4.265825 14.04540 -27.65702 3.778740e-09 2.717735e-05 11.19717 >194271 -4.237919 16.31592 -27.05930 4.483564e-09 2.717735e-05 11.07557 >142723 -5.529549 12.89764 -26.63683 5.071066e-09 2.717735e-05 10.98684 >2. ordered: > ID logFC AveExpr t P.Value adj.P.Val B >168840 -5.302013 10.23113 -27.83760 3.589533e-09 2.716231e-05 11.23349 >101140 -5.174833 10.22872 -27.68107 3.751573e-09 2.716231e-05 11.20243 >123375 -4.265788 11.76289 -27.65844 3.775669e-09 2.716231e-05 11.19792 >194271 -4.237924 14.03339 -27.06015 4.480631e-09 2.716231e-05 11.07621 >142723 -5.529383 10.61520 -26.63693 5.068864e-09 2.716231e-05 10.98732 > >Could you please comment on this? Is it essential to order samples >in the data file according to treatments before running lmFit? I >thought that this should not matter as long as one gets the actual >order in the design matrix right. Sample order makes no different to limma. If your results are different, then you've made a mistake. Best wishes Gordon >With kind regards, >Lev. > >My script: >s<-scan("Data.txt",what='character') >sm<-matrix(s,byrow=TRUE,ncol=31) >rownames(sm)<-sm[,1] >sm<-sm[,2:ncol(sm)] >snn<-apply(sm,2,as.numeric) >rownames(snn)<-rownames(sm) >signals<-snn > >temp<-vsn(signals) >exprs(temp)<-log2(exp(exprs(temp))) > >design <- model.matrix(~0 >+factor(c(3,2,4,5,1,4,3,6,6,1,2,1,3,5,4,2,5,6,3,4,5,6,1,2,3,4,5,6,1,2 ))) > >colnames(design) <- c("Untreated","L1","L2","L1A","L2A","A") >contrast.matrix <- makeContrasts(L1-Untreated, L2-Untreated, >L1A-Untreated, L2A-Untreated, A-Untreated, L1A-A, L2A-A, L1A-L1, >L2A-L2, levels=design) > >fit <- lmFit(temp, design) >fit2 <- contrasts.fit(fit, contrast.matrix) >fit2 <- eBayes(fit2) >topTable(fit2, adjust="BH") > > > design unordered: > Untreated L1 L2 L1A L2A A >1 0 0 1 0 0 0 >2 0 1 0 0 0 0 >3 0 0 0 1 0 0 >4 0 0 0 0 1 0 >5 1 0 0 0 0 0 >6 0 0 0 1 0 0 >7 0 0 1 0 0 0 >8 0 0 0 0 0 1 >9 0 0 0 0 0 1 >10 1 0 0 0 0 0 >11 0 1 0 0 0 0 >12 1 0 0 0 0 0 >13 0 0 1 0 0 0 >14 0 0 0 0 1 0 >15 0 0 0 1 0 0 >16 0 1 0 0 0 0 >17 0 0 0 0 1 0 >18 0 0 0 0 0 1 >19 0 0 1 0 0 0 >20 0 0 0 1 0 0 >21 0 0 0 0 1 0 >22 0 0 0 0 0 1 >23 1 0 0 0 0 0 >24 0 1 0 0 0 0 >25 0 0 1 0 0 0 >26 0 0 0 1 0 0 >27 0 0 0 0 1 0 >28 0 0 0 0 0 1 >29 1 0 0 0 0 0 >30 0 1 0 0 0 0 > > > design ordered: > Untreated L1 L2 L1A L2A A >1 1 0 0 0 0 0 >2 1 0 0 0 0 0 >3 1 0 0 0 0 0 >4 1 0 0 0 0 0 >5 1 0 0 0 0 0 >6 0 1 0 0 0 0 >7 0 1 0 0 0 0 >8 0 1 0 0 0 0 >9 0 1 0 0 0 0 >10 0 1 0 0 0 0 >11 0 0 1 0 0 0 >12 0 0 1 0 0 0 >13 0 0 1 0 0 0 >14 0 0 1 0 0 0 >15 0 0 0 1 0 0 >16 0 0 0 1 0 0 >17 0 0 0 1 0 0 >18 0 0 0 1 0 0 >19 0 0 0 1 0 0 >20 0 0 0 0 1 0 >21 0 0 0 0 1 0 >22 0 0 0 0 1 0 >23 0 0 0 0 1 0 >24 0 0 0 0 1 0 >25 0 0 0 0 0 1 >26 0 0 0 0 0 1 >27 0 0 0 0 0 1 >28 0 0 0 0 0 1 >29 0 0 0 0 0 1 >30 0 0 0 0 0 1 > > > > > > sessionInfo() > >R version 2.4.1 (2006-12-18) > >i386-pc-mingw32 > > > >locale: > >LC_COLLATE=English_United > >Kingdom.1252;LC_CTYPE=English_United > >Kingdom.1252;LC_MONETARY=English_United > >Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252 > > > >attached base packages: > >[1] > >"tools" "stats" "graphics" "grDevices" > >"utils" "datasets" "methods" "base" > > > >other attached packages: > > vsn Biobase limma > >"1.12.0" "1.12.2" "2.9.8" > > > >Copy addresses and emails from any email account to Yahoo! Mail - >quick, easy and free. ><http: us.rd.yahoo.com="" mail="" uk="" taglines="" default="" trueswitch="" *http:="" u="" k.docs.yahoo.com="" trueswitch2.html="">Do >it now...

ADD REPLY • link 16.9 years ago Gordon Smyth 50k

0

Entering edit mode

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070523/ 8cf5766f/attachment.pl

ADD REPLY • link 16.9 years ago Lev Soinov ▴ 470

0

Entering edit mode

Dear Lev, > Thank you for the reply. I have checked it and it is something to do with vsn, it changes its output somehow when you change the sample order. Below is an example for the same data (6 samples, treatments 1 and 2) only in the ordered data i have them as 1,1,1,2,2,2 and in the unordered data i have them as 2,1,1,2,2,1. There are two issues: - After ML estimation of the normalization transformation, a constant overall offset is added by vsn to the resulting data in order to ensure that the result for high-intensity features is approximately the same as the log-transformation (i.e. log(x)~glog(x) for large x). This should not affect any differential expression analysis. In "vsn", this offset is calculated for the first array (and applied to all). In "vsn2" (versions 2.x of the package), the offset is called from an 'average array' (and applied to all). This should reduce confusion about order-dependence. So please consider updating your package and using vsn2. Note that since we are on log-scale, adding an overall offset is equivalent to a change of units, like measuring distances in km versus in miles. As long as it's done consistently, it has no significance. - More substantially, the result of vsn is a ML estimate obtained from a numerical optimisation algorithm, and the results of this can vary slightly depending on initial conditions and convergence thresholds. However, the variability introduced by this should be negligible compared to the biological or technical variability. This also seems to apply to your example. However, please let me know if this is not the case. Best wishes Wolfgang ------------------------------------------------------------------ Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber > I would be really grateful for any comments. > Lev. > > > 1. ordered data: > temp_ord<-vsn(signals_ord) > design_ord <- model.matrix(~0 +factor(c(1,1,1,2,2,2))) > colnames(design_ord) <- c("group1", "group2") > contrast.matrix <- makeContrasts(group2-group1, levels=design_ord) > exprs(temp_ord)<- log2(exp(exprs(temp_ord))) > fit_ord <- lmFit(temp_ord, design_ord) > fit2_ord <- contrasts.fit(fit_ord, contrast.matrix) > fit2_ord <- eBayes(fit2_ord) > topTable(fit2_ord, adjust="BH") > > ID logFC AveExpr t P.Value adj.P.Val B > 168840 -5.302013 10.23113 -27.83760 3.589533e-09 2.716231e-05 11.23349 > 101140 -5.174833 10.22872 -27.68107 3.751573e-09 2.716231e-05 11.20243 > 123375 -4.265788 11.76289 -27.65844 3.775669e-09 2.716231e-05 11.19792 > 194271 -4.237924 14.03339 -27.06015 4.480631e-09 2.716231e-05 11.07621 > > 2. unordered data: > temp<-vsn(signals) > design <- model.matrix(~0 +factor(c(2,1,1,2,2,1))) > colnames(design) <- c("group1", "group2") > contrast.matrix <- makeContrasts(group2-group1, levels=design) > exprs(temp)<- log2(exp(exprs(temp))) > fit <- lmFit(temp, design) > fit2 <- contrasts.fit(fit, contrast.matrix) > fit2 <- eBayes(fit2) > topTable(fit2, adjust="BH") > > ID logFC AveExpr t P.Value adj.P.Val B > 168840 -5.302214 12.51356 -27.83821 3.590400e-09 2.717735e-05 11.23314 > 101140 -5.175045 12.51114 -27.67796 3.756415e-09 2.717735e-05 11.20135 > 123375 -4.265825 14.04540 -27.65702 3.778740e-09 2.717735e-05 11.19717 > 194271 -4.237919 16.31592 -27.05930 4.483564e-09 2.717735e-05 11.07557 > > > signals_ord[1:2, 1:6] > 282357 191.85 542.75 644.07 544.29 > 224689 66.95 134.27 127.79 102.97 > > 282357 119.78 936.58 > 224689 72.42 357.10 > > signals[1:2, 1:6] > 282357 936.58 542.75 644.07 544.29 > 224689 357.10 134.27 127.79 102.97 > > 282357 119.78 191.85 > 224689 72.42 66.95 > > exprs(temp_ord)[1:2,1:6] > 282357 8.011335 8.393353 8.384370 8.176880 > 224689 7.058134 7.339350 7.396429 7.254554 > > 282357 7.550265 8.338583 > 224689 7.228878 7.677248 > > exprs(temp)[1:2,1:6] > 282357 10.621015 10.675780 10.666816 10.459286 > 224689 9.959617 9.621677 9.678802 9.536861 > > 282357 9.832487 10.293657 > 224689 9.511012 9.340247 > > > sessionInfo() > R version 2.4.1 (2006-12-18) > i386-pc-mingw32 > > locale: > LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252 > > attached base packages: > [1] "tools" "stats" "graphics" "grDevices" "utils" "datasets" "methods" "base" > > other attached packages: > vsn Biobase limma > "1.12.0" "1.12.2" "2.9.8" > > > > --------------------------------- > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 16.9 years ago Wolfgang Huber ★ 13k

0

Entering edit mode

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070523/ 98daf976/attachment.pl

ADD REPLY • link 16.9 years ago Lev Soinov ▴ 470

0

Entering edit mode

Dear Lev, what you say is correct. In vsn2.x, it even doesn't matter which array is the first if you are interested in Amean values. Best wishes Wolfgang Lev Soinov ha scritto: > Dear Wolfgang, > > Thank you ever so much for the explanation. > So, as I understand it really doesn't matter which array is the first one if we are not interested in Amean values. This is exactly what my example shows I guess, p, B and M values as well as t statistics are almost identical for different orderings. Therefore, it should not affect differntial expression analysis. Please, correct me if I got something wrong. > > With kind regards, > Lev. >

ADD REPLY • link 16.9 years ago Wolfgang Huber ★ 13k

0

Entering edit mode

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070525/ 3a8450d8/attachment.pl

ADD REPLY • link 16.9 years ago Lev Soinov ▴ 470

0

Entering edit mode

Hi Lev, Lev Soinov wrote: > Dear Biocore Data Team, > > I am working with Affy Barley chips at the moment. I have noticed that the order of probes in the .GIN file for these arrays is not the same as their order in the CDF environment, barley1.cdf, provided by Bioconductor. > So, for example, when analysing these arrays I am getting an output such as: > > 16578 EBma05_SQ003_C16_s_at > 4886 Contig1579_s_at > > However, in the barley .GIN file these probes have different order numbers. > Is it OK, or is it a sign that probes were misplaced/misnumbered in the barley1.cdf? > Is it a reason for concern? What exactly is a .GIN file? And what do you mean by order number? We use the .CDF file supplied by Affymetrix to create the cdf packages, utilizing the mapping in that file to decide what the (x,y) location of each probe is. Literally thousands of people have been using these for over five years now, and nobody has claimed they are anything but accurate, so my assumption would be that they truly are accurate. But if you have any data that might indicate something different, I would be interested to see it. Best, Jim > > Thank you in advance. > Looking forward to hearing from you, > Lev. > > > > --------------------------------- > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, MS Biostatistician UMCCC cDNA and Affymetrix Core University of Michigan 1500 E Medical Center Drive 7410 CCGC Ann Arbor MI 48109 734-647-5623 ********************************************************** Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues.

ADD REPLY • link 16.9 years ago James W. MacDonald 65k

Login before adding your answer.