Development of GCRMA-like methods


Matthew Hannah ▴ 940

@matthew-hannah-621

Last seen 11.5 years ago

Hi,

I've been reading a few papers and I'm looking for some other opinions on a couple of questions/ideas I have. Basically this is directed towards a discussion on how methods such as GCRMA will develop.

The main points of reference are -
(1) A model based BG adjustment for Oligo arrays. Wu, Irizarry et al. 2003. Unpublished?
(2) Solving the riddle of the bright MM's. Naef & Magnasco. 2003. Phys Rev E 68, 011906.

My understanding is that-
The first of these papers shows that MM intensity is related to GC content, and weights MM values towards the average distribution of the binding of MM with similar GC contents.

The second proposes that most MM>PM occur because when the middle PM A/G is changed to MM T/C the smaller size of the substituted pyrimidine (C or T) allows room for the label on the target RNA (U or C) which would otherwise interfere with the binding to the PM.

Are there plans to combine these ideas and would there be any benefit from doing so? Would sub-setting the MM based on both GC content and the middle base provide more accurate distributions to weight the MM's to? The majority of the MM>PM would have C (or T) as their middle base and 'averaging' them must surely distort things for the MM's with A or G?

Finally Fig 3 in ref (2) shows nice fits of the positional effect due to having individual bases at different positions (1-25) in the PM probe. What would such fits look like for the MM probes, would it be similar or random/distorted due to non-specific binding? And would it help in answering why C has a smaller effect than G on intensity - or is this already known?

Cheers
Matt

Dr. Matt Hannah
Max-Planck Institute of Molecular Plant Physiology
Am Mühlenburg 1
14476 Golm
Germany

+ 49 (331) 567 8255 (phone)
+ 49 (331) 567 8250 (fax)
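The sub-setting idea raised in this post (grouping MM probes by both GC content and middle base) could be sketched roughly as follows. This is only an illustration, not how GCRMA does it: the function names, the input format (pairs of 25-mer sequence and log-intensity) and the use of a plain per-stratum mean are all assumptions made for the example. The one fixed fact it relies on is that an MM probe is the PM probe with its middle (13th) base complemented.

```python
from collections import defaultdict

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
MIDDLE = 12  # 0-based index of the 13th base in a 25-mer probe


def mm_from_pm(pm):
    """Return the MM sequence: the PM with its middle (13th) base complemented."""
    if len(pm) != 25:
        raise ValueError("expected a 25-mer probe sequence")
    return pm[:MIDDLE] + COMPLEMENT[pm[MIDDLE]] + pm[MIDDLE + 1:]


def gc_count(seq):
    """Number of G or C bases in the probe sequence."""
    return sum(base in "GC" for base in seq)


def stratify_mm(mm_probes):
    """Group MM log-intensities by (GC count, middle base).

    mm_probes is an iterable of (sequence, log_intensity) pairs
    (a hypothetical input format chosen for this sketch).
    """
    strata = defaultdict(list)
    for seq, log_intensity in mm_probes:
        strata[(gc_count(seq), seq[MIDDLE])].append(log_intensity)
    return dict(strata)


def stratum_means(strata):
    """Mean log-intensity within each (GC count, middle base) stratum."""
    return {key: sum(values) / len(values) for key, values in strata.items()}
```

Comparing the stratum means for middle base C or T against those for A or G at the same GC count would be one simple way to check whether pooling over the middle base distorts the reference distribution, as the post suspects.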

gcrma oligo

ADD COMMENT • link 22.0 years ago Matthew Hannah ▴ 940


Rafael A. Irizarry ★ 2.3k

@rafael-a-irizarry-205

Last seen 11.5 years ago

already done. it's implemented in the latest version of gcrma (1.0.2) and described (not fully) in this paper:

Wu, Z and Irizarry, RA. Stochastic Models Inspired by Hybridization Theory for Short Oligonucleotide Arrays. Proceedings of RECOMB 2004 (to appear).
http://www.biostat.jhsph.edu/~ririzarr/papers/p177-irizarry.pdf

the reference you cite is outdated. we have been waiting (since July 2003!) for referee reports from the Journal of the American Statistical Association (JASA) to update it all in one go. apologies for this.

-r

On Thu, 5 Feb 2004, Matthew Hannah wrote:
> Are there plans to combine these ideas and would there be any benefit from doing so? Would sub-setting the MM based on both GC content and the middle base provide more accurate distributions to weight the MM's to?

ADD COMMENT • link 22.0 years ago Rafael A. Irizarry ★ 2.3k


Hey. Just a couple of notes on your questions ...

> My understanding is that-
> The first of these papers shows that MM intensity is related to GC content, and weights MM values towards the average distribution of the binding of MM with similar GC contents.

The signal intensity is more a function of the CT content. The lights attach to the back of the Gs and As on the target cDNA. Remarkably, this is synergistic: the more Cs and Ts you have lined up together, the stronger the signal. The location of the Cs and Ts is also important. They are stronger in the middle.

> The second proposes that most MM>PM occur because when the middle PM A/G is changed to MM T/C the smaller size of the substituted pyrimidine (C or T) allows room for the label on the target RNA (U or C) which would otherwise interfere with the binding to the PM.

Label gets put on the cDNA. RNA is converted to cDNA at the last step.

Cross hybridization comes from everywhere. The Gs and As overwhelm the correct hybridization at low levels of expression. At higher levels, the PM goes above MM. It's not just that middle base, it's what's around it, too. The more Cs and Ts surrounding the mismatch spot, the stronger the signal.

> Are there plans to combine these ideas and would there be any benefit from doing so? Would sub-setting the MM based on both GC content and the middle base provide more accurate distributions to weight the MM's to? The majority of the MM>PM would have C (or T) as their middle base and 'averaging' them must surely distort things for the MM's with A or G?
>
> Finally Fig 3 in ref (2) shows nice fits of the positional effect due to having individual bases at different positions (1-25) in the PM probe. What would such fits look like for the MM probes, would it be similar or random/distorted due to non-specific binding? And would it help in answering why C has a smaller effect than G on intensity - or is this already known?

I don't have the Fig., but MM should look the same with a dip at the mismatch spot.

Nota Bene: The signal strength is also a function of the probability that the target RNA will fold and a function of the distance from the 3' end.

ADD REPLY • link 22.0 years ago Richard Finney ▴ 180


although all the below statements may be true in theory, in practice we see something slightly different.

say you want to predict intensities for a probe using its sequence information? we have tried prediction using models such as those based on the nearest neighbor models, and they don't work nearly as well as statistically based models (that use training data) such as the one suggested by Naef. Naef's idea of simply modeling the log of the sequence effect as an additive linear model using the position/base as predictors (with the across-position effect for fixed bases modeled with a smooth function of position) works much better at prediction.

the data demonstrate that having a C near the middle results in high intensities and having an A near the middle in low intensities. the closer to the middle, the larger the effect. adding interactions (to account for nearest neighbor effects) does not seem to help prediction at all. both Naef and Jean Wu find this. the G and T don't appear to have much of an effect, although there is some. to see the plots you can look at Figure 3 in Naef and Magnasco's paper "Solving the riddle of the ..." Physical Review E, 68:011906, 2003.

we demonstrate that Naef's simple additive model also works for predicting intensities in arrays where one expects only non-specific binding (NSB) (http://www.biostat.jhsph.edu/~ririzarr/papers/p177-irizarry.pdf), and you can see the same ATGC effect for NSB on page 66 of http://www.biostat.jhsph.edu/~ririzarr/Talks/jnj-affy.pdf

if anybody has empirical evidence (in microarray data) demonstrating some of the below statements i would be interested in seeing it.

On Thu, 5 Feb 2004, Richard Finney wrote:
> The signal intensity is more a function of the CT content. The lights attach to the back of the Gs and As on the target cDNA. Remarkably, this is synergistic: the more Cs and Ts you have lined up together, the stronger the signal. The location of the Cs and Ts is also important. They are stronger in the middle.
>
> Label gets put on the cDNA. RNA is converted to cDNA at the last step.
>
> Cross hybridization comes from everywhere. The Gs and As overwhelm the correct hybridization at low levels of expression. At higher levels, the PM goes above MM. It's not just that middle base, it's what's around it, too. The more Cs and Ts surrounding the mismatch spot, the stronger the signal.
>
> I don't have the Fig., but MM should look the same with a dip at the mismatch spot.
>
> Nota Bene: The signal strength is also a function of the probability that the target RNA will fold and a function of the distance from the 3' end.
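The additive position/base model described in this reply can be written down in a few lines. The sketch below is a hedged illustration only: it assumes the smooth function of position is a low-degree polynomial in a rescaled position, and the coefficient values are made up purely so that the shape matches the description (C near the middle raises the prediction, A near the middle lowers it, G and T contribute nothing here). They are not fitted values from any paper or from gcrma, and all names are hypothetical.

```python
BASES = "ACGT"
PROBE_LENGTH = 25


def smooth_effect(coeffs, position):
    """Polynomial effect of one base at a 0-based position.

    The position is rescaled to x in [-1, 1], so x = 0 is the middle base.
    """
    x = 2.0 * position / (PROBE_LENGTH - 1) - 1.0
    return sum(c * x ** power for power, c in enumerate(coeffs))


def predicted_log_affinity(seq, base_coeffs):
    """Additive model: sum over positions of the effect of the base found there.

    No interaction (nearest-neighbour) terms are included.
    """
    if len(seq) != PROBE_LENGTH:
        raise ValueError("expected a 25-mer probe sequence")
    return sum(smooth_effect(base_coeffs[base], k) for k, base in enumerate(seq))


# Hypothetical coefficients (constant, linear, quadratic), NOT fitted values.
EXAMPLE_COEFFS = {
    "A": (-0.10, 0.0, 0.10),   # most negative in the middle, zero at the ends
    "C": (0.10, 0.0, -0.10),   # most positive in the middle, zero at the ends
    "G": (0.0,),
    "T": (0.0,),
}
```

In practice the coefficients would be estimated from training data, for example by least squares on observed log-intensities, which is what makes this a statistical model rather than one derived from nearest-neighbour hybridization theory.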

ADD REPLY • link 22.0 years ago Rafael A. Irizarry ★ 2.3k


Matthew Hannah ▴ 940

@matthew-hannah-621

Last seen 11.5 years ago

Thanks,

I guess I now need R-devel, as the 1.0.2 version is a developmental package.

I've seen some discussions on here about computing time for GCRMA and saw that you could separate the GC background from the expression measurements, but how would this be done in practice?

I'm also interested in whether anyone has experience of the reproducibility of actual experimental treatments being improved using GCRMA vs. RMA (i.e. % of overlapping genes in multiple replicates). Basically, do these methods work well for more complex changes than are found in dilution and spike-in data sets?

Also, for us R novices using Windows, is there likely to be a GCRMAexpress?

Matt
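The "% of overlapping genes in multiple replicates" comparison mentioned here could be computed along these lines. The post does not define the measure, so the choices below are assumptions for illustration: each replicate contributes a list of called gene identifiers, overlap is measured relative to the smaller list, and replicates are compared pairwise and averaged. All function names are hypothetical.

```python
from itertools import combinations


def percent_overlap(list_a, list_b):
    """Percentage of genes shared by two lists, relative to the smaller list."""
    a, b = set(list_a), set(list_b)
    if not a or not b:
        return 0.0
    return 100.0 * len(a & b) / min(len(a), len(b))


def mean_pairwise_overlap(gene_lists):
    """Average percent overlap over all pairs of replicate gene lists."""
    pairs = list(combinations(gene_lists, 2))
    if not pairs:
        raise ValueError("need at least two gene lists")
    return sum(percent_overlap(a, b) for a, b in pairs) / len(pairs)
```

Running the same function on the gene lists obtained after RMA and after GCRMA preprocessing would give two numbers that can be compared directly, provided the lists are produced with the same selection rule in both cases.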

ADD COMMENT • link 22.0 years ago Matthew Hannah ▴ 940


Matthew Hannah ▴ 940

@matthew-hannah-621

Last seen 11.5 years ago

>The signal intensity is more a function of the CT content.
>The lights attach to the back of the Gs and As on the target cDNA.
>Remarkably, this is synergistic: the more Cs and Ts you have lined
>up together, the stronger the signal. The location of the Cs and Ts is
>also important. They are stronger in the middle.

>Label gets put on the cDNA. RNA is converted to
>cDNA at the last step.

I think there's some confusion as to target and probe? My understanding was that RNA > cDNA > labelled cRNA (I missed the 'c' in the last post). The labelled nucleotides used are U and C.

I understand your reply as saying that Cs & Ts (on the probe) increase the signal because they bind the labelled Gs & As. But surely it is the reverse, with the binding of Gs & As (probe) to Cs & Us (target) being depressed due to interference from the biotin labels?

Cheers,
Matt

ADD COMMENT • link 22.0 years ago Matthew Hannah ▴ 940
