Issue with limma and normalization of Agilent data generated with a 20-bit scan
Peter White ▴ 130
@peter-white-3162
Last seen 7.1 years ago
I have noticed an issue with the limma normalizeWithinArrays function (and also with marray and maNorm). When normalizing two-color data generated with an Agilent 20-bit scanner, it fails to normalize the high-intensity data (i.e. any points with an A value > 16). In our dataset we have in excess of 400 elements with red and green intensities ranging from 65500 to 475100. When we loess normalize the data, any points beyond A=16 appear to be untouched by the normalization. If the attached figures come through this should be clear: when using maNorm and maPlot, the loess line is plotted and you can see it stop at 16.

Is it possible for loess normalization to be extended to this higher-intensity data? Or am I just doing something wrong?

Thanks,
Peter

Peter White, Ph.D.
Director, Biomedical Genomics Core
Research Assistant Professor of Pediatrics
The Research Institute at Nationwide Children's Hospital and The Ohio State University
Web: http://genomics.nchresearch.org/
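For readers unfamiliar with the notation, here is a minimal sketch of how M and A relate to the raw intensities, using the usual definitions M = log2(R) - log2(G) and A = (log2(R) + log2(G))/2. The three spots are made-up values chosen inside the intensity range quoted above; they are not taken from the poster's data.

```r
# Red and green intensities for three illustrative spots (hypothetical values).
R <- c(1000, 65500, 475100)
G <- c(1200, 65500, 400000)

M <- log2(R) - log2(G)        # log-ratio of the two channels
A <- (log2(R) + log2(G)) / 2  # average log-intensity

round(cbind(R, G, M, A), 2)

# A exceeds 16 only when the geometric mean of R and G exceeds 2^16 = 65536,
# which is why A = 16 coincides with the ceiling of a 16-bit scan.
A > 16
```

Only the third spot lands beyond A = 16, so points in that region can only come from intensities that a 16-bit scan could not have recorded.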
Normalization limma marray • 1.0k views
@gordon-smyth
Last seen 2 hours ago
WEHI, Melbourne, Australia

Dear Peter,

You can't send attachments to the Bioconductor mailing list, so I have not seen your plots. However I am not aware of any issue such as you describe. The limma function normalizeWithinArrays includes all spots in the normalization, regardless of how large the A-value is. You haven't shown us any code, or any problem we can reproduce, so we can't tell whether or not you're doing something wrong. We don't know whether you're using probe weights, whether you've filtered control spots, etc etc.

Best wishes
Gordon

Dear Gordon,

The plots are visible in the blog view on gmane.org:

http://permalink.gmane.org/gmane.science.biology.informatics.conductor/27731

I thought you might be on to something with the weights, but I tried it with and without a flag function (I also double-checked the Agilent file, and the high-intensity spots are not flagged). It really does look like the loess is just not fitted for elements with an A value > 16. These 20-bit scans from Agilent are quite new, and I suspect most folks will just use the Agilent normalized data rather than starting with the raw data, so maybe this just hasn't been observed before now?

Thanks,
Peter

Below is the code I used:

    library(limma)
    agilentFiles <- list.files(pattern="U")
    rawObj <- read.maimages(agilentFiles,
        columns = list(G = "gMedianSignal", Gb = "gBGMedianSignal",
                       R = "rMedianSignal", Rb = "rBGMedianSignal"),
        annotation = c("ProbeName", "SystematicName", "ControlType"))

    #Remove spike controls and remove background signals
    bgObj <- rawObj
    posControls <- which(rawObj$genes$ControlType == 1)
    bgObj$G[posControls,] <- NA
    bgObj$R[posControls,] <- NA
    bgObj$Gb <- bgObj$Rb <- NULL

    #Loess normalize
    normObj <- normalizeWithinArrays(bgObj, method="loess", weights=NULL)

    #Plot MvA
    for (i in 1:ncol(normObj)) {
        figureName <- paste(i, " MvA Plots")
        mat <- matrix(c(3,1,2), nrow=3, ncol=1)
        layout(mat, heights=c(1,10,10))
        plotMA(rawObj, array=i, main = "Pre-Normalization MvA",
               ylim=c(-3.5,3.5), zero.weights=TRUE)
        abline(0,0)
        plotMA(normObj, array=i, main = "Normalized MvA",
               ylim=c(-3.5,3.5), zero.weights=TRUE)
        abline(0,0)
        layout(1)
        mtext(figureName, cex=1.25, line=3)
        savePlot(filename=figureName, type=c("png"), device=dev.cur())
    }

    > sessionInfo()
    R version 2.10.1 (2009-12-14)
    i386-pc-mingw32

    locale:
    [1] LC_COLLATE=English_United States.1252
    [2] LC_CTYPE=English_United States.1252
    [3] LC_MONETARY=English_United States.1252
    [4] LC_NUMERIC=C
    [5] LC_TIME=English_United States.1252

    attached base packages:
    [1] grDevices datasets splines graphics stats tcltk utils
    [8] methods base

    other attached packages:
    [1] limma_3.2.2 svSocket_0.9-48 TinnR_1.0.3 R2HTML_1.59-1
    [5] Hmisc_3.7-0 survival_2.35-9

    loaded via a namespace (and not attached):
    [1] cluster_1.12.1 grid_2.10.1 lattice_0.18-3 svMisc_0.9-56 tools_2.10.1

[Attachment scrubbed by the mailing list: 4 MvA Plots.png (image/png, 13173 bytes), https://stat.ethz.ch/pipermail/bioconductor/attachments/20100315/f1820247/attachment.png]
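One way to turn "appear to be untouched" into a number is to compare the log-ratios before and after normalization for the spots above the cutoff. The helper below is a hypothetical sketch in base R (the name `tail_shift` and the toy vectors are made up for illustration). With limma, the unnormalized values could presumably be taken from `normalizeWithinArrays(bgObj, method="none")` and compared column by column with `normObj$M`; that pairing is an assumption about the workflow, not something shown in the thread.

```r
# Summarise how far normalisation moved the log-ratios above an A cutoff.
# M_raw, M_norm and A are numeric vectors for a single array.
tail_shift <- function(M_raw, M_norm, A, cutoff = 16) {
  high <- !is.na(A) & A > cutoff
  d <- M_norm[high] - M_raw[high]
  c(n_high        = sum(high),
    n_changed     = sum(abs(d) > 1e-8, na.rm = TRUE),
    max_abs_shift = if (any(!is.na(d))) max(abs(d), na.rm = TRUE) else NA_real_)
}

# Toy demonstration with made-up numbers: only spots below the cutoff change.
A      <- c(8, 12, 15.5, 16.5, 18)
M_raw  <- c(0.5, 0.4, 0.3, -1.0, -2.0)
M_norm <- c(0.1, 0.0, 0.0, -1.0, -2.0)
res <- tail_shift(M_raw, M_norm, A)
res
```

If `n_changed` is zero while `n_high` is in the hundreds, the high-intensity spots really were passed through unchanged; if it is non-zero, the fitted curve simply may not have been drawn in the plot.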
Dear Peter,

have you tried different (i.e. smaller) values of the "span" parameter for the loess fit?

The data seem badly saturated... I'd prefer to avoid the kind of saturation seen in the data you posted through better scanner settings, rather than by post hoc loess normalisation.

Best wishes
Wolfgang

--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber/contact
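To illustrate the span suggestion, the sketch below fits base R `loess()` curves with two spans to simulated data that have a dense main cloud and a sparse high-intensity tail. Everything here is made up for illustration (the bias shape, the 40 tail points, the two span values); in limma the corresponding knob is the `span` argument of `normalizeWithinArrays`.

```r
set.seed(1)
n <- 2000
A <- c(runif(n, 6, 16), runif(40, 16, 19))   # dense cloud plus a sparse tail
bias <- ifelse(A > 16, -0.8 * (A - 16), 0)   # made-up intensity-dependent dye bias
M <- bias + rnorm(length(A), sd = 0.2)

# Same data, two smoothing windows (fraction of points used in each local fit).
fit_wide   <- loess(M ~ A, span = 0.3, degree = 1)
fit_narrow <- loess(M ~ A, span = 0.1, degree = 1)

# Compare the fitted curves in the main cloud (A = 10) and in the tail (A = 18).
grid <- data.frame(A = c(10, 18))
pred <- cbind(grid,
              wide   = predict(fit_wide, grid),
              narrow = predict(fit_narrow, grid))
pred
```

A wider span borrows more points from the main cloud when estimating the curve in the tail, so it tends to follow a sparse tail less closely than a narrower one; whether that matters for real arrays depends on how many spots sit out there.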
Hi Wolfgang,

With the new scanner from Agilent, these data are not saturated. The scanner went from 16-bit (0-65,535) to 20-bit (0-1,048,575). All of these values are well below the new saturation point, yet they are not being normalized.

Thanks,
Peter
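The arithmetic behind the two bit depths can be checked directly. This is a small sketch assuming plain unsigned integer counts; the value 475100 is the largest intensity quoted in the original post.

```r
bits <- c(16, 20)
max_count <- 2^bits - 1   # largest count an unsigned integer of that width can hold

data.frame(bits      = bits,
           max_count = max_count,
           max_log2  = log2(max_count))

# Largest intensity quoted in the post, as a fraction of the 20-bit ceiling.
frac_of_ceiling <- 475100 / max_count[2]
frac_of_ceiling
```

This only speaks to the nominal digital ceiling of the scanner; it says nothing about where the response of the whole measurement process stops being linear.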
Dear Peter,

what is the "saturation point"? Non-linear response / saturation may occur even well below the nominal maximal value (2^20-1) of the detector, and perhaps it is not even related to the detector, but rather to other steps in the process. How else do you explain the shape of the data before normalisation? (Try also looking at the data in a normal scatterplot.)

Best wishes
Wolfgang
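The point about non-linear response below the nominal maximum can be illustrated with a simulation. The sketch below assumes a made-up response curve (`respond`, linear at low signal and flattening towards a channel-specific level `sat`); neither the curve nor the two saturation levels come from the thread or from any scanner specification.

```r
set.seed(2)
n <- 5000
truth <- 2^runif(n, 6, 19)   # hypothetical true signal, identical in both channels

# Made-up response: roughly linear for x << sat, flattening as x approaches sat.
respond <- function(x, sat) sat * x / (x + sat)

R <- respond(truth, sat = 5e5) * 2^rnorm(n, sd = 0.1)
G <- respond(truth, sat = 2e6) * 2^rnorm(n, sd = 0.1)

M <- log2(R) - log2(G)
A <- (log2(R) + log2(G)) / 2

# Mean log-ratio in a low-intensity and a high-intensity bin.
bin_means <- c(low = mean(M[A < 12]), high = mean(M[A > 16]))
bin_means

# The "normal scatterplot": log2 R against log2 G with the line R = G.
if (interactive()) {
  plot(log2(G), log2(R), pch = ".", main = "log2 R vs log2 G")
  abline(0, 1, col = "red")
}
```

Both simulated channels stay below 2^20-1, yet the log-ratios bend away from zero at high A because the two channels flatten at different rates. A bend like this in real data would be consistent with saturation somewhere in the process, though it does not prove it.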
I think what Wolfgang is saying is that the data are so affected by technical bias at the tail that, even if you could get loess normalisation to straighten that tail, you might not want to believe anything that comes from there, as the data are unreliable.

I have no idea why the ready-built functions don't touch your tail, but loess normalisation isn't *that* complicated a procedure: you should be able to fit a model using the loess() function and do the normalisation yourself.

________________________________________
From: bioconductor-bounces@stat.math.ethz.ch [bioconductor-bounces@stat.math.ethz.ch] On Behalf Of Wolfgang Huber [whuber@embl.de]
Sent: 15 March 2010 22:03
To: White, Peter
Cc: 'Gordon K Smyth'; 'Bioconductor mailing list'
Subject: Re: [BioC] Issue with limma and normalization of Agilent data generated with a 20-bit scan

Dear Peter

what is the "saturation point"?

Non-linear response / saturation may occur even well below the nominal maximal value (2^20-1) of the detector, and perhaps this need not even be related to the detector, but rather to other steps in the process. How else do you explain the shape of the data before normalisation? (Try also looking at the data in the normal scatterplot.)

Best wishes
Wolfgang

On Mar 15, 2010, at 10:47 PM, White, Peter wrote:

Hi Wolfgang,

So with the new scanner from Agilent this data is not saturated. The scanner went from 16-bit (0-65,000) to 20-bit (0-1,048,576). All of these values are well below the new saturation point, yet they are not being normalized.

Thanks,

Peter

> -----Original Message-----
> From: Wolfgang Huber [mailto:whuber at embl.de]
> Sent: Monday, March 15, 2010 5:25 PM
> To: White, Peter
> Cc: 'Gordon K Smyth'; 'Bioconductor mailing list'
> Subject: Re: [BioC] Issue with limma and normalization of Agilent data generated with a 20-bit scan
>
> Dear Peter
>
> have you tried with different (i.e. smaller) values of the "span" parameter for the loess fit?
>
> The data seem badly saturated... I'd prefer avoiding the kind of saturation such as seen in the data you posted by better settings of the scanner, rather than doing post hoc loess normalisation.
>
> Best wishes
> Wolfgang
>
> White, Peter wrote on 15/03/10 15:53:
>> Dear Gordon,
>>
>> The plots are visible in the blog view on gmane.org:
>>
>> http://permalink.gmane.org/gmane.science.biology.informatics.conductor/27731
>>
>> I thought you may be on to something with the weights but I tried it with and without a flag function (also double checked the Agilent file and the high intensity spots are not flagged). It really does look like the loess is just not fitted for elements with an A value > 16??? These 20-bit scans from Agilent are quite new and I suspect most folks will just use the Agilent normalized data rather than starting with the raw data, so maybe this just hasn't been observed before now?
>>
>> Thanks,
>>
>> Peter
>>
>> Below is the code I used:
>>
>> library(limma)
>> agilentFiles <- list.files(pattern="U")
>> rawObj <- read.maimages(agilentFiles,
>>     columns = list(G = "gMedianSignal", Gb = "gBGMedianSignal",
>>                    R = "rMedianSignal", Rb = "rBGMedianSignal"),
>>     annotation= c("ProbeName", "SystematicName","ControlType"))
>> #Remove spike controls and remove background signals
>> bgObj <- rawObj
>> posControls <- grep(T,rawObj$genes$ControlType == 1)
>> bgObj$G[posControls,] <- NA
>> bgObj$R[posControls,] <- NA
>> bgObj$Gb <- bgObj$Rb <- NULL
>> #Loess normalize
>> normObj <- normalizeWithinArrays(bgObj, method="loess", weights=NULL)
>> #Plot MvA
>> for (i in 1:ncol(normObj)) {
>>     figureName <- paste(i, " MvA Plots")
>>     mat <- matrix(c(3,1,2),nrow=3,ncol=1)
>>     layout(mat,heights=c(1,10,10))
>>     plotMA(rawObj, array=i, main = "Pre-Normalization MvA",
>>            ylim=c(-3.5,3.5), zero.weights=TRUE)
>>     abline(0,0)
>>     plotMA(normObj, array=i, main = "Normalized MvA",
>>            ylim=c(-3.5,3.5), zero.weights=TRUE)
>>     abline(0,0)
>>     layout(1)
>>     mtext(figureName, cex=1.25, line=3)
>>     savePlot(filename=figureName, type=c("png"), device=dev.cur())
>> }
>>
>> > sessionInfo()
>> R version 2.10.1 (2009-12-14)
>> i386-pc-mingw32
>>
>> locale:
>> [1] LC_COLLATE=English_United States.1252
>> [2] LC_CTYPE=English_United States.1252
>> [3] LC_MONETARY=English_United States.1252
>> [4] LC_NUMERIC=C
>> [5] LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] grDevices datasets splines graphics stats tcltk utils
>> [8] methods base
>>
>> other attached packages:
>> [1] limma_3.2.2 svSocket_0.9-48 TinnR_1.0.3 R2HTML_1.59-1
>> [5] Hmisc_3.7-0 survival_2.35-9
>>
>> loaded via a namespace (and not attached):
>> [1] cluster_1.12.1 grid_2.10.1 lattice_0.18-3 svMisc_0.9-56 tools_2.10.1
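The do-it-yourself route suggested at the top of this post can be sketched in base R. This is only an illustration on simulated data, not what limma or marray do internally: the intensities, the shape of the dye bias, and all object names are assumptions made up for the example. It fits a loess curve of M on A and subtracts the fitted trend.

```r
# Minimal sketch of a hand-rolled loess normalisation (simulated data only).
set.seed(1)
n <- 5000
A_true <- runif(n, 6, 18)                  # average log2 intensity, up to ~2^18
bias   <- 0.05 * (A_true - 12)^2 - 0.5     # assumed smooth intensity-dependent dye bias
M_true <- rnorm(n, sd = 0.3)               # true log-ratios, centred on zero

# Build red and green intensities that carry the bias
R <- 2^(A_true + (M_true + bias) / 2)
G <- 2^(A_true - (M_true + bias) / 2)

# Standard M and A values
M <- log2(R) - log2(G)
A <- (log2(R) + log2(G)) / 2

# Fit the intensity-dependent trend and remove it.
# family = "symmetric" makes the fit robust to outlying log-ratios.
fit    <- loess(M ~ A, span = 0.3, family = "symmetric")
M_norm <- M - fitted(fit)

# Mean log-ratio of the brightest spots before and after normalisation
c(raw = mean(M[A > 16]), normalised = mean(M_norm[A > 16]))
```

With real data containing NA values, `predict(fit, newdata = ...)` would be needed instead of `fitted(fit)`, because loess() drops incomplete rows before fitting.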
0
Entering edit mode
Yes, I can see what Wolfgang is saying. It definitely is weird that there is such a striking curve in the raw data. I'm not sure how much really is known about the Agilent 20-bit scanning. They claim:

"Agilent improved XDR scanning by increasing the dynamic range of a single scan from 16-bit to 20-bit, thereby realizing a similar dynamic range as with XDR scanning using only a single scan. This is not only faster, it represents a 12-fold increase in dynamic range."

Maybe in theory it does represent a 12-fold increase, but in reality the chemistry is saturating and, as you say Michael, the data become unreliable. Not sure how many folks out there may have data on this - we were using the new Agilent Low-Amp Kit for labeling?

Anyway, attached are three plots comparing different span values with the default, and a figure showing the Agilent Processed signal (this is the data normalized by Agilent's Feature Extraction software).

Thanks so much for your comments.

Peter

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Agilent vs Limma Processed MvA Plots.png
Type: image/png
Size: 11083 bytes
Desc: Agilent vs Limma Processed MvA Plots.png
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20100315/9dc7383b/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Agilent Default vs Span 0.1 Plots.png
Type: image/png
Size: 9663 bytes
Desc: Agilent Default vs Span 0.1 Plots.png
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20100315/9dc7383b/attachment-0001.png>
0
Entering edit mode
One more PNG showing the difference between using span values of 0.1 or 0.01.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Agilent Span 0.1 vs 0.01 MvA Plots.png
Type: image/png
Size: 9854 bytes
Desc: Agilent Span 0.1 vs 0.01 MvA Plots.png
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20100315/2f849782/attachment.png>
0
Entering edit mode

Dear Peter,

The plots on the permalink website do not correspond to the R code in your email.  Apparently the plots are from the marray package, produced by R code which you do not give.  I can't comment on the behaviour of someone else's package.

As I've already told you, normalizeWithinArrays() uses all the points, and having an A-value over 16 is not an issue. However, the loess curve is designed to be quite stiff, and also to ignore outliers, and therefore the curve will not follow a highly localized J-curve at the right end of the range, which is what your MA-plot seems to show.  The loess curve is designed to follow the main trend, not local trends involving small proportions of the points.  If you believe that your platform has a different MvsA relationship for A>16 than for A<16, and this should be removed by the normalization curve, then you can do one of two things. One possibility is simply to make the curve more local by reducing the span parameter:

  normObj <- normalizeWithinArrays(bgObj, method="loess", span=0.1)

Choosing span small enough will certainly remove the J-curve.  However I don't recommend this approach, as it is not specific to A>16.  I recommend instead that you use the modifyWeights() function to give the spots with A>16 increased weights, so the loess curve will follow them more closely. You might find that a combination of slightly reduced span and increased weights will work best.
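As a rough illustration of combining a slightly reduced span with increased weights, here is a sketch on simulated data. The data, the multiplier of 50 and the span of 0.2 are assumptions for the example, not recommended values. modifyWeights() works from a per-probe status vector, so this sketch instead builds the weight matrix directly from each array's raw A-values and passes it through the `weights` argument of normalizeWithinArrays().

```r
library(limma)

# Simulated two-colour data (illustrative only): 4900 ordinary spots plus
# 100 bright spots whose log-ratios drift upwards above A = 16 (a J-curve).
set.seed(2)
n_reg <- 4900; n_high <- 100; n_arrays <- 2
A_sim <- rbind(matrix(runif(n_reg * n_arrays, 6, 16), n_reg, n_arrays),
               matrix(runif(n_high * n_arrays, 16, 18.5), n_high, n_arrays))
M_sim <- matrix(rnorm(length(A_sim), sd = 0.3), nrow(A_sim), n_arrays) +
  pmax(A_sim - 16, 0)
RG <- new("RGList", list(R = 2^(A_sim + M_sim / 2), G = 2^(A_sim - M_sim / 2)))

# Per-array weight matrix: spots with raw A > 16 get a larger weight.
A_raw <- (log2(RG$R) + log2(RG$G)) / 2
high  <- A_raw > 16
w <- matrix(1, nrow(A_raw), ncol(A_raw))
w[high] <- 50   # hypothetical multiplier; tune by inspecting the MA-plots

# Default loess versus slightly reduced span plus increased weights
MA_default  <- normalizeWithinArrays(RG, method = "loess")
MA_weighted <- normalizeWithinArrays(RG, method = "loess", weights = w, span = 0.2)

# Average normalised log-ratio of the bright spots under each setting
c(default = mean(MA_default$M[high]), weighted = mean(MA_weighted$M[high]))
```

Comparing MA-plots of `MA_default` and `MA_weighted` is the sensible check on real data; the summary above only shows whether the bright spots have been pulled back towards zero.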

BTW, I would recommend that you use spottypes to display control spots, and weights to downweight them in the loess normalization, rather than hacking NA values into your data.  It gives you more information and flexibility.
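A sketch of the weights-based alternative to overwriting control spots with NA might look like the following. The data are simulated, and the coding of `ControlType` (0 for ordinary probes, 1 and -1 for controls) is assumed here for illustration. The control spots keep their intensities, but receive zero weight so they do not influence the loess curve.

```r
library(limma)

# Simulated arrays with an annotation column marking control spots (assumed coding).
set.seed(3)
n_spots <- 2000
genes <- data.frame(
  ProbeName   = paste0("probe", seq_len(n_spots)),
  ControlType = sample(c(0, 1, -1), n_spots, replace = TRUE,
                       prob = c(0.9, 0.05, 0.05))
)
A_sim <- matrix(runif(n_spots * 2, 6, 16), n_spots, 2)
M_sim <- matrix(rnorm(n_spots * 2, sd = 0.3), n_spots, 2)
RG <- new("RGList", list(R = 2^(A_sim + M_sim / 2),
                         G = 2^(A_sim - M_sim / 2),
                         genes = genes))

# Start from unit weights and multiply the weights of control spots by zero.
w <- modifyWeights(weights = matrix(1, n_spots, 2),
                   status = RG$genes$ControlType,
                   values = c("1", "-1"),
                   multipliers = 0)

# Controls stay in the data but do not contribute to the fitted curve.
MA <- normalizeWithinArrays(RG, method = "loess", weights = w)
table(weight = w[, 1], control = RG$genes$ControlType)
```

In a real analysis the status vector would more naturally come from a spot-types table via limma's controlStatus(), which also allows the controls to be highlighted in MA-plots.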

Best wishes
Gordon

0
Entering edit mode

Dear Gordon,

Thanks so much for your detailed response. I did try setting the span to 0.1 and it definitely improved the J-curve. Scroll all the way to the bottom of:

I also tried setting the span to 0.01 and it looked even better (lower than that and crazy things started to happen):

As far as I could see, lowering the span had no effect on the elements with an average log2 intensity below 16. I did try adding a weight function so that elements with a median red or green intensity > 60,000 were given a weight of 5, but it made no discernible difference compared with the default normalization.

All the best,

Peter

0
Entering edit mode

With the weights, you could easily increase them to 100 without any likely problems.

Gordon
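To get a feel for why a weight of 5 may barely register while 100 does, here is a small simulation. It uses base R's loess() rather than limma's own fitting routine, so it is only an analogy, and the data and J-curve shape are invented for the example. With 50 bright spots among 5000 ordinary ones, a weight of 5 makes the bright spots count like 250 points, whereas a weight of 100 makes them count like 5000.

```r
# Simulated MA data: 5000 ordinary spots and 50 bright spots on a J-curve.
set.seed(4)
n_reg <- 5000; n_high <- 50
A <- c(runif(n_reg, 6, 16), runif(n_high, 16, 18.5))
M <- rnorm(n_reg + n_high, sd = 0.3) + pmax(A - 16, 0)
high <- A > 16

# Mean residual of the bright spots after subtracting a weighted loess fit.
resid_high <- function(w_high) {
  d <- data.frame(A = A, M = M, w = ifelse(high, w_high, 1))
  fit <- loess(M ~ A, data = d, weights = w, span = 0.3, degree = 1)
  mean((d$M - fitted(fit))[high])
}

# Compare no extra weight, a weight of 5, and a weight of 100
res <- sapply(c(w1 = 1, w5 = 5, w100 = 100), resid_high)
res
```

A residual close to zero means the curve has followed the bright spots; a large positive residual means the J-curve is still present after normalisation.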