aroma.light advice sought

Iain Gallagher ▴ 930

@iain-gallagher-2532

Last seen 8.8 years ago

United Kingdom

Dear List

I was wondering if someone could give me some advice on the workflow for the R/Bioconductor package aroma.light. I have 12 Exiqon arrays, stained with Cy3 and scanned at 3 different PMT settings of 250, 300 and 350 (75% laser power) using a Genepix 4200A Autoloader. These settings are based on an initial scan with PMT set to auto, which scanned each array at ~350.

I have read the papers by H. Bengtsson et al. from 2004 and 2006 describing the scanner offset problem and the solution (as implemented in aroma). I am, however, rather naive with microarray data handling and I am unsure how to proceed in terms of analysis.

Specifically, do I need to carry out the calibrateMultiscan.matrix procedure for each triplicate of arrays, or can I just proceed to affine normalization? Once I have the normalized data, do I back-transform this using the backtransformAffine.matrix procedure, or do I use the data straight after the normalization?

After this step I am fairly confident and can use SAM and limma to investigate differential expression. I would, however, like to use the information gathered from across the scans to maximise my data collection opportunity.

Thanks for any advice.

Iain

Microarray limma aroma.light • 857 views

updated 15.8 years ago by Henrik Bengtsson ★ 2.4k • written 15.8 years ago by Iain Gallagher ▴ 930

Henrik Bengtsson ★ 2.4k

@henrik-bengtsson-4333

Last seen 15 days ago

United States

Hi Iain,

On Wed, Jul 16, 2008 at 1:54 AM, Iain Gallagher <iaingallagher at btopenworld.com> wrote:
> Dear List
>
> I was wondering if someone could give me some advice on the workflow for the R/Bioconductor package aroma.light. I have 12 Exiqon arrays, stained with Cy3 and scanned at 3 different PMT settings of 250, 300 and 350 (75% laser power) using a Genepix 4200A Autoloader. These settings are based on an initial scan with PMT set to auto, which scanned each array at ~350.

So, if I understand it correctly, they were scanned in either the order (350, 300, 250) or (350, 250, 300). Just for clarification, the order does not matter, but many people are more comfortable with having the first scan set to their defaults just in case there is, say, dye bleaching. We didn't find dye bleaching to be a problem.

The only thing to be careful about is not to set the PMT too low or too high. If too low, the scanner noise will take over, and if too high, scanner saturation/censoring takes place. Otherwise, the noise you obtain does indeed scale with the signal, i.e. the exact PMT setting is not critical as long as you're using "decent" settings.

Also, make sure to read help("1. Calibration and Normalization"), especially the suggestion that you should keep as much as possible fixed between scans except the PMT, e.g. avoid washing arrays etc.

> I have read the papers by H. Bengtsson et al. from 2004 and 2006 describing the scanner offset problem and the solution (as implemented in aroma). I am, however, rather naive with microarray data handling and I am unsure how to proceed in terms of analysis.
>
> Specifically, do I need to carry out the calibrateMultiscan.matrix procedure for each triplicate of arrays, or can I just proceed to affine normalization? Once I have the normalized data, do I back-transform this using the backtransformAffine.matrix procedure, or do I use the data straight after the normalization?
Consider multiscan calibration to be a step totally independent of any normalization that follows. Always do multiscan calibration *before anything else*. I prefer to use affine normalization to normalize between channels, but others prefer curve-fit normalization ("loess") or quantile normalization etc. What you choose is independent of the multiscan calibration. In either case, you never have to call backtransformAffine() yourself (that's a low-level method).

Multiscan calibration is a calibration method that is applied to each hybridization and each channel separately. Say you have K=3 scans, each with N signals in both channels. Take the signals across all K scans in the first channel and put them in an NxK matrix 'XR'. Do the same for the other channel(s). Then do:

  XRc <- calibrateMultiscan(XR);
  XGc <- calibrateMultiscan(XG);

Now you have two Nx1 matrices with calibrated signals for the red and the green channels for that hybridization. That's all you need to do to "merge" multiple scans for one hybridization. No parameters to choose, nothing.

If you want to see the parameter estimates, do:

  fit <- attr(XRc, "modelFit");

The scanner offset is e=fit$adiag[1] and the relative scale of each scan (relative to the first scan) is bb=fit$b. These are denoted e_c and bb_c = (1, b_2, ..., b_K) in Bengtsson et al. 2004. It is the scanner offset that causes problems, not the relative scales. We see offsets in the range of 15 to 25 units (out of 65,535). It would be interesting to hear back from you what *scanner offsets* you observe with your scanner and how stable they are across arrays.

Foreground and/or background signals? (This question is typically asked sooner or later.) First of all, if you, like me, prefer to work with foreground signals only, then the answer is simple: use only foreground signals in XR and XG above. The model underlying the multiscan calibration method is based on effects that occur in the scanner, not on the array.
That is, it assumes that every pixel intensity undergoes the same transform regardless of whether it is a pixel, say, inside or outside a spot. Thus, if you add background signals to your 'XR' and 'XG' above, they should increase the precision of your estimate. However, there are typically enough foreground signals to achieve high-precision estimates anyway, so it doesn't really make a difference at the end of the day. There is also a risk in fitting with background signals, namely that the background estimates (there are many different methods out there) might be biased relative to the foreground estimates. We didn't study this, so I don't know if it is a real problem. To summarize: don't worry, and use foreground only if that is what is used downstream, and use foreground & background if that is what is used downstream.

Finally, if you want to see how strong an effect your scanner offset has, you can look at the within-channel log-ratios between the choose(K,2) (=3) scan pairs like this:

  plotMvsAPairs(XR);

If there is a scanner offset, you will find that the M-vs-A data points curve at the lower intensities (cf. Figure 7a in the paper), otherwise not (Figure 7b). Actually, you should see them converge to M=0 at A=log2(e) if you have a scanner offset. To see if the calibration controls for this, try (Figure 7c):

  XRc2 <- calibrateMultiscan(XR, average=NULL);
  plotMvsAPairs(XRc2);

Now all pairs should overlap almost perfectly.

Let me know if you have any other questions.

Henrik

> After this step I am fairly confident and can use SAM and limma to investigate differential expression. I would, however, like to use the information gathered from across the scans to maximise my data collection opportunity.
>
> Thanks for any advice.
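The steps in the answer above can be tried on simulated data before running them on real scans. The following is only a rough sketch: the offset, scale and noise values are invented for illustration, the column names are hypothetical, and the aroma.light calls are the ones named in the answer (calibrateMultiscan(), plotMvsAPairs(), and the "modelFit" attribute), run only if the package is installed. The base-R part computes the within-channel M and A values by hand to show why a scanner offset bends the scan-pair curves towards M=0 at low intensities.

```r
# Simulate one channel of one hybridization scanned at K = 3 PMT settings.
# All numeric values here are assumptions chosen for illustration only.
set.seed(1)
N      <- 5000
offset <- 20                # hypothetical scanner offset (same for all scans)
scales <- c(1, 1.8, 3.2)    # hypothetical gains relative to the first scan
truth  <- 2^runif(N, min = 2, max = 13)   # offset-free signal per spot

# N x K matrix: one row per spot, one column per scan
XR <- sapply(scales, function(b) offset + b * truth * exp(rnorm(N, sd = 0.02)))
colnames(XR) <- c("PMT250", "PMT300", "PMT350")   # hypothetical labels

# Within-channel M (log-ratio) and A (mean log-intensity) for each scan pair
pairs <- combn(ncol(XR), 2)   # choose(K, 2) = 3 pairs: (1,2), (1,3), (2,3)
MA <- lapply(seq_len(ncol(pairs)), function(j) {
  x <- XR[, pairs[1, j]]
  y <- XR[, pairs[2, j]]
  data.frame(M = log2(y / x), A = (log2(x) + log2(y)) / 2)
})

# At low A the offset dominates and M is pulled towards 0;
# at high A, M approaches the log2 ratio of the two scales.
low  <- sapply(MA, function(d) median(d$M[d$A < quantile(d$A, 0.05)]))
high <- sapply(MA, function(d) median(d$M[d$A > quantile(d$A, 0.95)]))
print(rbind(low, high))

# The calibration itself, using the calls from the answer above
if (requireNamespace("aroma.light", quietly = TRUE)) {
  library(aroma.light)
  XRc <- calibrateMultiscan(XR)           # N x 1 matrix: scans merged
  fit <- attr(XRc, "modelFit")
  print(fit$adiag[1])                     # estimated scanner offset
  print(fit$b)                            # estimated relative scales
  XRc2 <- calibrateMultiscan(XR, average = NULL)   # calibrated, not averaged
  plotMvsAPairs(XRc2)                     # pairs should now be nearly flat
}
```

With real data, XR would instead be built by binding the foreground intensities of the three scans of one array column-wise, and the same steps repeated for each array (and each channel, if there is more than one).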

15.8 years ago • Henrik Bengtsson ★ 2.4k