annotation for ath1121501 (?)


Rhonda DeCook ▴ 90

@rhonda-decook-1033

Last seen 11.3 years ago

Recent posts have made me aware of the 'geneplotter' package. This is exactly what I've been wanting to do, but for the ath1121501 chip (Arabidopsis) which does not have an annotation package available at the metadata page yet. Past posts suggest folks are working on getting the annotation for this chip built... https://stat.ethz.ch/pipermail/bioconductor/2004-August/005818.html I was just wondering if anyone had new information on the progress of building this annotation package. Rhonda

Annotation ath1121501 • 1.5k views

ADD COMMENT • link updated 20.9 years ago by Pita ▴ 120 • written 20.9 years ago by Rhonda DeCook ▴ 90


Pita ▴ 120

@pita-1011

Last seen 11.3 years ago

I have 3 questions:

How are duplicate spots treated when performing normalizeWithinArrays()? Are they treated separately as far as the regression is concerned? How are duplicate spots treated when performing normalizeBetweenArrays()? Are they somehow treated together in the scale normalization, or independently? For quantile normalization, is the number of quantiles the total number of spots on the chip or, in the case of duplicate spotting, is the number of quantiles n-spots/2, with each pair adjusted together in some way?

For the case of duplicate spotting, what is the significance of merging the raw channels separately prior to creating MA values with the loess normalization, followed by between-chip scaling?

How many spots on a chip would be required to run quantile normalization vs scale normalization when using normalizeBetweenArrays()?

Thanks for any insight into this. I am not a statistician, so I am unfamiliar with the ramifications of duplicate treatment in regression and in normalization.

Peter W.

At 12:16 PM 2/7/2005, Rhonda DeCook wrote:
> Recent posts have made me aware of the 'geneplotter' package. This is exactly what I've been wanting to do, but for the ath1121501 chip (Arabidopsis) which does not have an annotation package available at the metadata page yet.
>
> Past posts suggest folks are working on getting the annotation for this chip built...
> https://stat.ethz.ch/pipermail/bioconductor/2004-August/005818.html
>
> I was just wondering if anyone had new information on the progress of building this annotation package.
>
> Rhonda

ADD COMMENT • link 20.9 years ago Pita ▴ 120


Hi Peter,

> I have 3 questions:
>
> How are duplicate spots treated when performing normalizeWithinArrays()? Are they treated separately as far as the regression is concerned? How are duplicate spots treated when performing normalizeBetweenArrays()? Are they somehow treated together in the scale normalization, or independently? For quantile normalization, is the number of quantiles the total number of spots on the chip or, in the case of duplicate spotting, is the number of quantiles n-spots/2, with each pair adjusted together in some way?

The duplicate spots are kept separate in the normalization functions normalizeWithinArrays() and normalizeBetweenArrays(). Normalization is intended to remove systematic biases from the data. Some effects might vary locally on the array, affecting one of the duplicates but not the other, so it would seem reasonable to keep the duplicates separate so that the normalization has a chance to correct such a bias.

The information from the duplicate spots can be summarised using lmFit() with the appropriate arguments. The approach taken in limma is to assume that the duplicate spots are correlated by being on the same array, a fixed distance apart (the function duplicateCorrelation() is used to estimate this correlation). An alternative approach would be to average the duplicate log-ratios prior to fitting the linear model.

> For the case of duplicate spotting, what is the significance of merging the raw channels separately prior to creating MA values with the loess normalization, followed by between-chip scaling?

I'm not sure what you mean here. There are usually two channels per array for two-colour microarrays. Do you mean create 4 channels per array, one for each duplicate set in each channel? I'm not sure that this would be helpful.

> How many spots on a chip would be required to run quantile normalization vs scale normalization when using normalizeBetweenArrays()?

The lower limit for quantile normalization is 2 spots, and for scale normalization it's 1 spot. Normalization is probably not such a big deal with so few spots though ;)

> Thanks for any insight into this. I am not a statistician, so I am unfamiliar with the ramifications of duplicate treatment in regression and in normalization.
>
> Peter W.

Best wishes,
Matt Ritchie
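To make the "number of quantiles" point concrete, here is a minimal Python sketch of quantile normalization in which every spot, duplicate or not, gets its own rank. It is an illustration only and not limma's implementation: the function name `quantile_normalize` and the toy values are assumptions, and ties are broken by spot position rather than handled the way a production routine would handle them.

```python
def quantile_normalize(columns):
    """Quantile-normalize several arrays (one list of spot values per array).

    Hypothetical helper for illustration; every spot is ranked on its own,
    so an array with n spots has n quantiles even if spots are duplicates.
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all arrays must have the same number of spots")

    # Sort each array, then average across arrays at each rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[i] for col in sorted_cols) / len(columns)
                  for i in range(n)]

    normalized = []
    for col in columns:
        # Spot indices ordered from smallest to largest value.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, spot in enumerate(order):
            out[spot] = rank_means[rank]  # replace value by the rank mean
        normalized.append(out)
    return normalized


# Two toy arrays with 4 spots each (think of 2 probes spotted in duplicate).
arrays = [[2.0, 4.0, 6.0, 8.0], [3.0, 1.0, 9.0, 5.0]]
normalized = quantile_normalize(arrays)
```

After normalization both arrays contain exactly the same set of values, and two duplicates of one probe can still end up with different values because each is ranked separately.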

ADD REPLY • link 20.9 years ago Matthew Ritchie ▴ 1000


At 08:05 PM 2/7/2005, Matthew Ritchie wrote:
> The information from the duplicate spots can be summarised using lmFit() with the appropriate arguments. The approach taken in limma is to assume that the duplicate spots are correlated by being on the same array, a fixed distance apart (the function duplicateCorrelation() is used to estimate this correlation). An alternative approach would be to average the duplicate log-ratios prior to fitting the linear model.
>
>> For the case of duplicate spotting, what is the significance of merging the raw channels separately prior to creating MA values with the loess normalization, followed by between-chip scaling?
>
> I'm not sure what you mean here. There are usually two channels per array for two-colour microarrays. Do you mean create 4 channels per array, one for each duplicate set in each channel? I'm not sure that this would be helpful.

Actually, my bad. I meant merging the duplicate spots WITHIN each raw channel separately PRIOR to calculating the log-ratios (M-values). The duplicate spots on our arrays correlate very, very well, to the point where I think that spotting probes twice seems wasteful (it would be better if the duplicate spots were randomly distributed for duplicate spotting to be meaningful IMHO, but the spotting technology is not capable of doing this).

I like the idea of using quantile scaling between chips; assuming n-spots for m-genes, that will be fine. However, when there are duplicate spots for each probe, each probe is adjusted independently, and when I compared the M-values with the raw R and G channel duplicates, the correlation between the duplicate M-values was quite poor. I suspect this is because the quantile normalization handles each duplicate spot separately.

So my question is: do I gain or lose by merging the raw duplicate values within the R and G channels separately prior to calculating the M-values? I am not enough of an expert in statistics to say whether or not this is acceptable.

>> How many spots on a chip would be required to run quantile normalization vs scale normalization when using normalizeBetweenArrays()?
>
> The lower limit for quantile normalization is 2 spots, and for scale normalization it's 1 spot. Normalization is probably not such a big deal with so few spots though ;)

Yes, if I only had 2 good spots I would generally be unhappy with the microarray. But it seems that I need to use scale normalisation for small chips, like 300 spots, and quantile for large arrays like 19k, because with such a large number of points, scale normalization may force more genes into the tails of the distribution of M-values, if you were looking at the box-plots.

Thanks for the help,
Peter
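For comparison with quantile normalization, the following Python sketch shows the general idea of scale normalization between arrays: each array's M-values are divided by a single factor so that the median absolute M-value is the same on every array. This is a sketch under assumptions, not limma's code: the function name `scale_normalize` is hypothetical, and the choice of the geometric mean as the common target reflects my understanding of the usual approach rather than anything stated in this thread.

```python
import math
from statistics import median


def scale_normalize(m_columns):
    """Rescale each array's M-values to a common median absolute value.

    Hypothetical helper for illustration. Fails if an array's median
    absolute M-value is zero, because its logarithm is undefined.
    """
    med_abs = [median(abs(m) for m in col) for col in m_columns]
    log_meds = [math.log(v) for v in med_abs]
    mean_log = sum(log_meds) / len(log_meds)
    # One factor per array, relative to the geometric mean of the medians.
    factors = [math.exp(v - mean_log) for v in log_meds]
    return [[m / f for m in col] for col, f in zip(m_columns, factors)]


# Two toy arrays of M-values; the second is 4 times as spread out.
m_values = [[-1.0, 0.5, 2.0, -0.5], [-4.0, 2.0, 8.0, -2.0]]
scaled = scale_normalize(m_values)
```

Because only one factor per array is estimated, the shape of each array's distribution, tails included, is left as it was; quantile normalization instead forces every array onto the same distribution.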

ADD REPLY • link 20.9 years ago Pita ▴ 120


Hi Peter,

> Actually, my bad. I meant merging the duplicate spots WITHIN each raw channel separately PRIOR to calculating the log-ratios (M-values). The duplicate spots on our arrays correlate very, very well, to the point where I think that spotting probes twice seems wasteful (it would be better if the duplicate spots were randomly distributed for duplicate spotting to be meaningful IMHO, but the spotting technology is not capable of doing this).
>
> I like the idea of using quantile scaling between chips; assuming n-spots for m-genes, that will be fine. However, when there are duplicate spots for each probe, each probe is adjusted independently, and when I compared the M-values with the raw R and G channel duplicates, the correlation between the duplicate M-values was quite poor. I suspect this is because the quantile normalization handles each duplicate spot separately.
>
> So my question is: do I gain or lose by merging the raw duplicate values within the R and G channels separately prior to calculating the M-values? I am not enough of an expert in statistics to say whether or not this is acceptable.

I'm not aware of any careful study that assesses whether it is better to 'merge' (I assume you mean average?) the raw R and G intensities from duplicate spots or keep them separate (this might be a research question for you). Obviously, if you merge them you won't be able to make use of the method I described in my previous email, where the duplicate correlation is used in the linear model. This approach has been studied and can offer improvements over averaging if you are assessing differential expression. For the reference see

Smyth, G. K., Michaud, J., and Scott, H. (2005). The use of within-array replicate spots for assessing differential expression in microarray experiments. Bioinformatics 21, to appear. (available from http://www.statsci.org/smyth/pubs/dupcor.pdf)

Best wishes,
Matt Ritchie
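As a small arithmetic illustration of the two averaging orders discussed in this thread, the Python sketch below compares averaging the raw R and G intensities of a duplicate pair before forming one M-value with averaging the duplicate M-values themselves. The function names and intensities are made up for the example, and the sketch says nothing about which order is statistically preferable.

```python
import math


def m_value(r, g):
    """Log-ratio M = log2(R / G) for a single spot."""
    return math.log2(r / g)


def merge_then_ratio(r_dups, g_dups):
    """Average raw duplicates within each channel, then form one M-value."""
    r_mean = sum(r_dups) / len(r_dups)
    g_mean = sum(g_dups) / len(g_dups)
    return m_value(r_mean, g_mean)


def ratio_then_average(r_dups, g_dups):
    """Form an M-value per duplicate spot, then average the M-values."""
    ms = [m_value(r, g) for r, g in zip(r_dups, g_dups)]
    return sum(ms) / len(ms)


# Duplicates that disagree: the two orders give different answers.
r_dups, g_dups = [400.0, 1600.0], [100.0, 1600.0]
merged_first = merge_then_ratio(r_dups, g_dups)      # log2(1000 / 850)
averaged_last = ratio_then_average(r_dups, g_dups)   # mean of log2(4), log2(1)

# Duplicates that agree exactly: both orders give the same M-value.
same_a = merge_then_ratio([200.0, 200.0], [100.0, 100.0])
same_b = ratio_then_average([200.0, 200.0], [100.0, 100.0])
```

Averaging raw intensities is an arithmetic mean on the intensity scale, while averaging M-values corresponds to a geometric mean of the ratios, so the two only coincide when the duplicates agree closely.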

ADD REPLY • link 20.9 years ago Matthew Ritchie ▴ 1000
