Combining data from scans at different intensities


John Fowler


An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070213/26abd3fe/attachment.pl


John Fowler, 18.8 years ago


Henrik Bengtsson


United States

Hi.

On 2/13/07, John Fowler <fowlerj at science.oregonstate.edu> wrote:
> Hello,
>
> I would like to use data extracted from images scanned at 3 different
> intensities in our GenePix scanner. There are a couple of papers
> that I could find (Lyng et al 04, Piepho et al 06) that describe
> methods to combine these data and thus help deal with problems of
> saturation and signals across the dynamic range of the scanner.
>
> I looked for a way to do this in bioconductor, and found a post from
> Dr. Henrik Bengtsson, indicating that this was possible using the
> aroma.light package in bioconductor. However, he indicated that this
> should be done with data from scans in which the laser intensity =was
> not changed=.
>
> Unfortunately, my scans used two different laser intensities.

So, what were your settings for the three scans? If two scans have the same laser setting, how does the third scan differ? Different PMT settings?

> Does this invalidate using aroma.light for this purpose? Is there
> any other Bioconductor package that could deal with my (apparently
> incorrectly obtained) data?

What we observed from scanning at different sensitivity (= PMT) levels was that the scanner adds an offset to the signals and that this offset is independent of the PMT setting. We also observed that this offset is more or less constant across arrays (and roughly also between channels), indicating that the offset is added either in the PMT (photomultiplier tube) or, more likely, in the analogue-to-digital electronics just after the PMT. We observed this in both scanners we investigated, the Axon GenePix 4000A and the Agilent G2505A.

The multiscan calibration model is applied to each channel separately. Let $c \in \{R, G\}$ index the two channels, and let $e_c$ be the offset in channel $c$. Say you do multiple scans $k = 1, \ldots, K$. Then $y_{c,i}^{(k)}$ denotes the probe signal in channel $c$ for probe $i$ and scan $k$. Let the unknown amount of hybridized sequence in this probe be denoted by $x_{c,i}$, which is independent of scan $k$. To be really precise, $x_{c,i}$ is the amount of light emitted from probe $i$ that enters the PMT. We proposed the model

$$
y_{c,i}^{(k)} = a_c^{(k)} + b_c^{(k)} x_{c,i} + \varepsilon_{c,i}^{(k)} \approx e_c + b_c^{(k)} x_{c,i} + \varepsilon_{c,i}^{(k)} \qquad (*)
$$

where $\varepsilon_{c,i}^{(k)}$ is zero-mean noise. By doing multiscans at various *PMT settings*, we can identify $e_c$ and all of the $b_c^{(k)}$. Even better, we get a good estimate of $x_{c,i}$, the amount of light entering the PMT, so at the end of the day we control for effects in the PMT and in the electronics after it. We strongly believe this is a good model for those effects.

Now, if you adjust the laser power, you effectively also adjust the amount of light emitted from each probe; that is, you can no longer assume that $x_{c,i}$ is constant, but instead you have $x_{c,i}^{(m)}$, where $m = 1, \ldots, M$ indexes the different *laser levels*. You may propose a model similar to (*) for laser-adjusted scans, e.g.

$$
x_{c,i}^{(m)} \approx d_c + g_c^{(m)} z_{c,i} + \xi_{c,i}^{(m)} \qquad (**)
$$

where $z_{c,i}$ is now the amount of label on the hybridized target on probe $i$, and $x_{c,i}^{(m)}$ is the amount of light emitted by this probe at laser level $m$. One open question is whether the "laser offset" $d_c$ is constant or also depends on $m$.

Now, if (**) holds, then combining (*) and (**), which are both so-called *affine* functions, gives another affine function:

$$
\begin{aligned}
y_{c,i}^{(k)} &= e_c + b_c^{(k)} \left( d_c + g_c^{(m)} z_{c,i} + \xi_{c,i}^{(m)} \right) + \varepsilon_{c,i}^{(k)} \\
&= e_c + d_c b_c^{(k)} + h_c^{(k,m)} z_{c,i} + \nu_{c,i}^{(k,m)} \qquad (***)
\end{aligned}
$$

where $h_c^{(k,m)} = b_c^{(k)} g_c^{(m)}$ and $\nu_{c,i}^{(k,m)} = b_c^{(k)} \xi_{c,i}^{(m)} + \varepsilon_{c,i}^{(k)}$ is confounded noise. Compare Models (***) and (*). If $d_c = 0$, then (***) has the same form as (*), and you can use (*) for your data. If $d_c \neq 0$, then the term $d_c b_c^{(k)}$ must be estimated too; since it changes with the PMT gain $b_c^{(k)}$, the total offset is no longer common to all scans.

Y <- calibrateMultiscan(X) in aroma.light is based on Model (*). There is no implementation for Model (***) when $d_c \neq 0$, but I would say give it a try.
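To make the identifiability argument behind Model (*) concrete, here is a minimal Python sketch, assuming noise-free simulated data and no saturated probes. It is not the algorithm used by aroma.light's calibrateMultiscan() (an R function with its own robust fitting); it only shows why scans at different PMT gains pin down $e_c$: under (*), scan $k$ plotted against scan 1 is a line with slope $r_k = b_c^{(k)} / b_c^{(1)}$ and intercept $e_c (1 - r_k)$. The helper names (ols, fit_multiscan, calibrate) and all numbers are hypothetical.

```python
def ols(x, y):
    """Ordinary least-squares line y ~ a + r*x; returns (a, r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    r = sxy / sxx
    return my - r * mx, r


def fit_multiscan(scans):
    """Estimate the common offset e and relative gains r_k = b_k / b_1.

    Under Model (*), y^(k) = e + b_k * x, so regressing scan k on scan 1
    gives slope r_k and intercept e * (1 - r_k); hence e = a_k / (1 - r_k).
    This requires r_k != 1, i.e. scans taken at different PMT settings.
    """
    ref = scans[0]
    fits = [ols(ref, y) for y in scans[1:]]
    offsets = [a / (1.0 - r) for a, r in fits]
    e = sum(offsets) / len(offsets)
    return e, [1.0] + [r for _, r in fits]


def calibrate(scans, e, gains):
    """Offset-corrected signal per probe on the scale of scan 1 (b_1 * x),
    averaged over the scans."""
    k = len(scans)
    return [sum((y[i] - e) / g for y, g in zip(scans, gains)) / k
            for i in range(len(scans[0]))]


# Noise-free simulation of one channel: offset 20, three PMT gains.
x_true = [10, 50, 100, 500, 1000, 5000]
e_true, b = 20.0, [1.0, 2.0, 4.0]
scans = [[e_true + bk * x for x in x_true] for bk in b]

e_hat, gains = fit_multiscan(scans)
x_hat = calibrate(scans, e_hat, gains)
print(round(e_hat, 6), [round(g, 6) for g in gains])  # 20.0 [1.0, 2.0, 4.0]
```

With real data, plain least squares is a poor choice: both scans are noisy (regression dilution biases the slope) and saturated probes bend the line, which is why a robust, symmetric fit is preferable.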
If you want to, I can have a look at your multiscan data for a typical array. If so, we'll have to figure out a way to transfer the three GPR files.

Best,

Henrik

> many thanks!
> John
>
> --
> John Fowler, Associate Professor
> Botany and Plant Pathology (BPP) Dept.
> 2082 Cordley Hall                 Phone: (541) 737-5307
> Oregon State University           FAX: (541) 737-3573
> Corvallis, OR 97331-2902 USA      Email: fowlerj at science.oregonstate.edu
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

Henrik Bengtsson, 18.8 years ago


John Fowler


Hi Henrik,

Thank you very much for the rapid reply!

My three scans are something like this (I don't have the exact numbers right now):

'low' scan: 80% laser power, PMT at ~350
'medium' scan: 80% laser power, PMT at ~400
'high' scan: 90% laser power, PMT at ~400

In retrospect, I am quietly cursing myself for changing two variables...

Anyway, after noting your post, I went back and checked the papers by Lyng et al 04 and Piepho et al 06 that I had seen previously, and saw that in both cases they also kept the laser power constant and changed the PMT.

I actually scanned some of these slides just this morning, and so have the opportunity to go back and re-scan them. Does it sound like this might be the best approach? Unfortunately, there are some older slides in this experiment for which this is not an option.
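Applied to these approximate settings, the distinction Henrik draws can be checked mechanically. The hypothetical helper below (groups_varying_only is not part of any package) groups the scans by one setting and keeps the groups in which only the other setting changes. It suggests that 'low' and 'medium' differ only in PMT, so they fit Model (*) directly, while 'medium' and 'high' differ only in laser power.

```python
from collections import defaultdict

# Approximate settings from the post: label -> (laser %, PMT voltage).
SETTINGS = {"low": (80, 350), "medium": (80, 400), "high": (90, 400)}


def groups_varying_only(settings, vary):
    """Return groups of scans in which only `vary` ('pmt' or 'laser') changes.

    Scans are grouped by the *other* setting; a group is kept when it
    contains at least two distinct values of `vary`.
    """
    i_vary = {"laser": 0, "pmt": 1}[vary]
    i_fixed = 1 - i_vary
    groups = defaultdict(list)
    for label, setting in settings.items():
        groups[setting[i_fixed]].append(label)
    return {
        key: labels
        for key, labels in groups.items()
        if len({settings[lab][i_vary] for lab in labels}) >= 2
    }


print(groups_varying_only(SETTINGS, "pmt"))    # {80: ['low', 'medium']}
print(groups_varying_only(SETTINGS, "laser"))  # {400: ['medium', 'high']}
```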
Also, I must admit that I don't follow the details of the statistical solutions you explained, although I think I grasp the gist of it. If I try using calibrateMultiscan(X) with my data, how would I know whether it was giving me invalid output?

Have you looked at either of the papers I referenced above, and do you think the approaches used in those papers would work better for my situation?

Thank you for your responses. If it seems worthwhile for me to get you my .gpr files, and you can take the time to look at them, I should be able to figure out a way to post them someplace where you could download them.

Again, thanks for your help!
John
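If the .gpr files are shared, a first step is to pull one channel from each scan into a probe-by-scan table, which is the shape of input a multiscan calibration works on. The sketch below is a minimal, hypothetical reader (read_gpr_column and multiscan_rows are illustrative names) that assumes the ATF text layout used by GenePix: an "ATF 1.0" line, a line whose first field counts the optional header records, those records, then a tab-separated table with quoted column names such as "F635 Median". It assumes every scan lists the features in the same order and does no flag filtering; the file contents in the example are made up.

```python
import csv
import io


def read_gpr_column(text, column="F635 Median"):
    """Extract one numeric column from the text of a GenePix Results file.

    Assumes the ATF layout: line 1 is "ATF<TAB>1.0", line 2 starts with the
    number of optional header records, those records follow, and then comes
    a tab-separated row of (quoted) column names followed by the data rows.
    """
    lines = text.splitlines()
    n_header = int(lines[1].split("\t")[0])
    body = "\n".join(lines[2 + n_header:])
    reader = csv.DictReader(io.StringIO(body), delimiter="\t")
    return [float(row[column]) for row in reader]


def multiscan_rows(gpr_texts, column="F635 Median"):
    """Combine one channel from K scans into N rows of K values (one per probe).

    Assumes all files list the same features in the same order.
    """
    columns = [read_gpr_column(t, column) for t in gpr_texts]
    if len({len(c) for c in columns}) != 1:
        raise ValueError("scans have different numbers of features")
    return [list(row) for row in zip(*columns)]


def fake_gpr(values):
    """Build a tiny, made-up GPR-like text with one header record."""
    rows = "\n".join(f"1\t{i + 1}\t{v}\t0" for i, v in enumerate(values))
    return ('ATF\t1.0\n1\t4\n"Type=GenePix Results 3"\n'
            '"Block"\t"Column"\t"F635 Median"\t"F532 Median"\n' + rows)


scans = [fake_gpr([120, 95, 4000]), fake_gpr([220, 170, 7980])]
print(multiscan_rows(scans))  # [[120.0, 220.0], [95.0, 170.0], [4000.0, 7980.0]]
```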

John Fowler, 18.8 years ago


Hi.

On 2/13/07, John Fowler <fowlerj at science.oregonstate.edu> wrote:
> Also, I must admit that I don't follow the details of the statistical solutions
> you explained. However, I think I grasp the gist of it. If I try using
> calibrateMultiscan(X) with my data, how would I know that it was giving me an
> invalid output?

Sorry about all those details. The summary is that we found the scanners to be very linear in their measurements, except for a small added offset, which we believe is added on purpose to avoid nonsense negative signals. By doing "PMT" scans we can identify and correct for this offset. If it is not corrected for, it will add artifacts to your data.

By doing "laser-power" scans we would be able to identify another type of offset in the scanner, which cannot be detected by PMT scans. This other offset may or may not be there. If it is not there, or is much smaller than the "PMT offset", you can safely use calibrateMultiscan(). But if the "laser offset" is of the same order as, or larger than, the "PMT offset", a specially designed calibration method is required. I do not know of any available methods that deal with this.

Having said this, I still suggested that you give it a try. The reason is that you will probably get better results than by not calibrating the data at all. I think you can do even better, though; I have to see the data in order to be more precise.
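One way to approach the "how would I know?" question is sketched below, assuming noise-free data generated from Model (***) with hypothetical parameters: $e_c = 20$, PMT gains $b = 1$ and $2$, laser gains $g = 1$ and $1.5$, and a laser offset $d_c$ of either 0 or 5. The pairwise-intercept estimate of the offset from the PMT-only pair ('low' vs 'medium') recovers $e_c$ whatever $d_c$ is, whereas the laser-only pair ('medium' vs 'high') returns $e_c + d_c b_c^{(k)}$. A clear disagreement between the two estimates on real data would therefore hint at a non-negligible laser offset; this is an illustration of the argument, not a validated diagnostic.

```python
def ols(x, y):
    """Ordinary least-squares line y ~ a + r*x; returns (a, r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    r = sxy / sxx
    return my - r * mx, r


def pair_offset(y_ref, y):
    """Offset implied by a common-offset affine model for one pair of scans."""
    a, r = ols(y_ref, y)
    return a / (1.0 - r)


# Hypothetical parameters for Model (***).
E, Z = 20.0, [10, 50, 100, 500, 1000]   # scanner offset e_c, label amounts z
B = {350: 1.0, 400: 2.0}                # PMT gains b^(k)
G = {80: 1.0, 90: 1.5}                  # laser gains g^(m)
SETTINGS = {"low": (80, 350), "medium": (80, 400), "high": (90, 400)}


def simulate(d):
    """Noise-free scans y = e + b * (d + g * z) with laser offset d."""
    return {lab: [E + B[pmt] * (d + G[laser] * z) for z in Z]
            for lab, (laser, pmt) in SETTINGS.items()}


for d in (0.0, 5.0):
    y = simulate(d)
    pmt_pair = pair_offset(y["low"], y["medium"])     # laser fixed at 80%
    laser_pair = pair_offset(y["medium"], y["high"])  # PMT fixed at 400
    print(d, round(pmt_pair, 6), round(laser_pair, 6))
# 0.0 20.0 20.0
# 5.0 20.0 30.0
```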
> Have you looked at either of the papers I referenced above, to see what you
> think of them, and whether the approaches used in those papers would work better
> for my situation?

If you look at the graphs for the GenePix scanner in Lyng et al (2004), you see that they also observe an offset in the scanner. In their paper they report that the "intensities measured ...approaching a constant value of about 20". (For the ScanArray scanner they observe a negative offset, which is interesting.) This is in the same range as the offset we detected. They then conclude that "At spot intensities ... below 200 (scan 2) the relationships deviated from linearity". The main reason for this is that there is an offset of about 20 in the signals, and below 200 that offset has a serious impact on ratios and on the log scale, e.g. compare M = log2((20+50)/(20+100)) = -0.78 to Mcalib = log2(50/100) = -1. However, if you do calibrate for the scanner offset, I claim that you will get a linear relationship between the amount of DNA in the spots and what you measure, down to much weaker signals. From what I read in Lyng et al, I believe they would agree with this too.

To deal with saturated probe signals, I agree with the authors that the median rather than the mean pixel intensity should be used for the probe signal. Saturation is taken care of by our multiscan method in the sense that the estimates are robust (I can give more arguments, but that would mean even more details). Lyng et al do not correct for the scanner offset, which is what you most likely have in your GenePix data.

I am less familiar with the details of Piepho et al (2006); they target the problem of saturation, and only mention in the discussion that offsets could be modeled too. I cannot remember the reference, but there is another paper on how to correct saturated signals by using the relationship between the mean and the median pixel intensities.
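The log-ratio example above can be reproduced directly, and extending it to brighter spots shows why an uncorrected offset mostly distorts weak signals. A small Python sketch; the offset of 20 is the approximate value discussed in the thread, and the intensity pairs are made up (true ratio 1:2 throughout).

```python
import math

OFFSET = 20.0  # approximate scanner offset discussed above


def log_ratio(r, g, offset=0.0):
    """M = log2(R/G) for true signals r, g observed with an additive offset."""
    return math.log2((r + offset) / (g + offset))


for scale in (1, 10, 100):
    r, g = 50 * scale, 100 * scale
    print(r, g, round(log_ratio(r, g, OFFSET), 2), log_ratio(r, g))
# 50 100 -0.78 -1.0
# 500 1000 -0.97 -1.0
# 5000 10000 -1.0 -1.0
```

The same additive offset pulls M toward 0 at low intensities and becomes negligible for bright spots, which matches the deviation from linearity reported below roughly 200.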
As long as you have some scans where your spots are not saturated, I would worry less about the saturated spots than about the scanner offset. There are a few other papers on how to combine two or more scans, but I do not know of any dealing with the case where both laser and PMT have been adjusted.

Finally, the people making scanners really know what they are doing; parts such as the PMTs have been used in other technologies for many years and have been optimized for a long time. I think we can trust that the scanners are very "linear" and have a large dynamic range (much larger than the rest of the microarray process). However, the scanner offset is there and must be corrected for, and to the best of my knowledge it is added on purpose by the scanner manufacturer in order to avoid nonsense (censored) negative signals due to noise. The reason why saturated spot signals curve off as we approach the upper limit of the scanner is most likely not that the scanner is non-linear there, but that we are taking the average (mean or median) of many pixels per spot; Figure 1 in Piepho et al illustrates this nicely.

> Thank you for your responses, if it seems like it would be worthwhile for me to
> get you my .gpr files, and you can take the time to look at them, I think I
> should be able to figure out a way to post them someplace where you could
> download them.

Yes, it would be great if I could have a look at your laser-adjusted scans, so I don't have to guess about the effects.

Cheers,

Henrik
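The averaging argument about saturated spots can be illustrated with a toy spot of five pixels, assuming a hypothetical within-spot profile (pixel brightness from 0.5 to 1.5 times the spot level) and a 16-bit ceiling of 65535. Each pixel responds linearly until it hits the ceiling, yet the spot mean already falls below the true level once the brightest pixels saturate, while the median stays on the line until the middle pixel saturates. This is only a cartoon of the effect, not a reproduction of Figure 1 in Piepho et al.

```python
SAT = 65535  # assumed 16-bit scanner ceiling

# Hypothetical relative brightness of the five pixels in one spot (mean 1.0).
PROFILE = [0.5, 0.75, 1.0, 1.25, 1.5]


def spot_summaries(level, profile=PROFILE, sat=SAT):
    """Return (mean, median) of the spot's pixels after clipping at `sat`."""
    pixels = sorted(min(level * p, sat) for p in profile)
    mean = sum(pixels) / len(pixels)
    median = pixels[len(pixels) // 2]
    return mean, median


for level in (40000, 50000, 60000):
    mean, median = spot_summaries(level)
    print(level, mean, median)
# 40000 40000.0 40000.0
# 50000 48107.0 50000.0
# 60000 53214.0 60000.0
```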

Henrik Bengtsson, 18.8 years ago
