Question: Array Set - Multiple Testing Problem
10.3 years ago by
Tefina Paloma220
Tefina Paloma220 wrote:
Dear all,

unfortunately I did not get any reply to my post, so I am asking again, assuming that lots of people have already come across this problem.

Working with an array set (cDNA or any single-color platform) just means that the probes you are interested in are spread out over more than one array (usually due to space limitations). So: the same samples, but different features.

The way the probes are separated across the arrays is, however, rather random. The question arises at which level of the analysis the arrays should be aggregated.

I think the normalization and also the model fitting should be done separately for each array.

But since, for a given contrast, we want to look at the results of all arrays at the same time rather than only within each array of the set, the p-values must somehow be adjusted for this array effect.

Doing this in a "global" manner, similar to the "global" method of limma's decideTests, will probably be overly conservative.

Any suggestions?

Best,
Tefina
normalization • 522 views
modified 10.3 years ago by Sean Davis21k • written 10.3 years ago by Tefina Paloma220
Answer: Array Set - Multiple Testing Problem
10.3 years ago by
Sean Davis21k
United States
Sean Davis21k wrote:
On Fri, Sep 11, 2009 at 8:58 AM, Tefina Paloma <tefina.paloma@gmail.com> wrote:
> I think the normalization and also the model fitting should be done separately.
>
> Any suggestions?

Why not just normalize each array in the set separately and then combine the normalized data for analysis? I'm not sure I see why the arrays would need to be treated independently for analysis, assuming the technology is the same for each array in the set.

Sean
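The suggestion to normalize each array of the set separately and then combine the data can be sketched as follows. This is only an illustration under stated assumptions: the thread does not name a normalization method, so quantile normalization across samples is used here as a stand-in, and the matrices `array_a` and `array_b` are simulated, hypothetical data (probes in rows, the same samples in columns).

```python
import numpy as np

def quantile_normalize(x):
    """Give every column (sample) of x the same empirical distribution.

    Assumption: quantile normalization is a stand-in here; the thread
    does not say which normalization should be used. Ties are ignored.
    """
    order = np.argsort(x, axis=0)      # per-sample sort order
    ranks = np.argsort(order, axis=0)  # rank of each value within its sample
    mean_quantiles = np.sort(x, axis=0).mean(axis=1)  # mean of the sorted columns
    return mean_quantiles[ranks]

def normalize_and_combine(array_set):
    """Normalize each array of the set on its own, then stack the probes."""
    return np.vstack([quantile_normalize(x) for x in array_set])

rng = np.random.default_rng(0)
# Hypothetical array set: arrays A and B, same 4 samples, different probes.
array_a = rng.normal(8.0, 1.0, size=(100, 4))
array_b = rng.normal(9.0, 1.5, size=(60, 4))

combined = normalize_and_combine([array_a, array_b])
print(combined.shape)
```

The combined matrix can then be handed to a single probe-wise analysis, which is reasonable only if the probe-level variances are comparable across the arrays of the set.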
To be able to fit the same model to all arrays, an additional between-array normalization would be necessary to make all the arrays really comparable, and I don't want to over-normalize the data either.

Therefore I just thought of a sensible p-value adjustment.
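One way a p-value adjustment spanning the whole array set might look is to pool the p-values from every array and adjust the combined list once, instead of adjusting within each array. The sketch below assumes the Benjamini-Hochberg procedure, which the thread does not specify; the function names and the example p-values are hypothetical.

```python
import numpy as np

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed method, not from the thread)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted

def adjust_across_arrays(pvalue_lists):
    """Adjust all p-values of the array set as one list, then split them back."""
    sizes = [len(p) for p in pvalue_lists]
    pooled = bh_adjust(np.concatenate(pvalue_lists))
    return np.split(pooled, np.cumsum(sizes)[:-1])

# Hypothetical p-values for one contrast on two arrays of the set.
p_array_a = [0.001, 0.02, 0.5]
p_array_b = [0.004, 0.03, 0.8]

adj_a, adj_b = adjust_across_arrays([p_array_a, p_array_b])
per_array_a = bh_adjust(p_array_a)  # adjustment within array A only
print(adj_a, adj_b, per_array_a)
```

In this toy example the pooled adjustment treats the set as six tests rather than two families of three, so the adjusted values for array A are larger than with the within-array adjustment.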
On Fri, Sep 11, 2009 at 9:47 AM, Tefina Paloma <tefina.paloma@gmail.com> wrote:
> therefore I just thought of an sensible p value adjustment

As an alternative, you can adjust the entire list of p-values from all arrays together, if you like. However, assuming that the arrays are of the same technology, the probe-level variances should be similar, so you could also combine the normalized data. I'm not sure what "model" you mean, as each test is done within a probe and therefore would not cross arrays. But I may have misunderstood what you are trying to do.

Sean

_______________________________________________
Bioconductor mailing list
Bioconductor@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
On Fri, Sep 11, 2009 at 11:20 AM, Sean Davis <seandavi@gmail.com> wrote:
> However, assuming that the arrays are of the same technology, the probe-level variances should be similar, so you could also combine the normalized data.

I made a further assumption above, which I should probably make explicit. While the array technology is important in determining the variance, the biologic behavior of the probes on the array also contributes. If the biologic behavior of probes on one array is expected to be "different" in some way, then the assumption of approximately equal variance will be violated. In that case I agree that doing the analysis "within array" is the best way to go.

Sean
Hi Sean,

On Sep 11, 2009, at 11:44 AM, Sean Davis wrote:
> I made a further assumption above, which I should probably make explicit.
> While the array technology is important in determining the variance, the
> biologic behavior of the probes on the array contributes, also.

Sorry if this is too noob-ish a question, but I'm curious about your choice of words. Could you explain this point a bit further? It sounds like you are referring to the actual probes that are synthesized onto the array, no?

What biologic behavior do you expect these probes to have? Are you referring to them forming some secondary structure or something? If so, why would one expect explicitly differing behavior between the same probes on different arrays (assuming no array impurities and that the arrays were run using the same protocol, or whatever)?

Just curious, thanks ...

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
On Fri, Sep 11, 2009 at 11:56 AM, Steve Lianoglou <mailinglist.honeypot@gmail.com> wrote:
> What biologic behavior do you expect these probes to have?

The classic example that I can think of is the hgu133a and hgu133b arrays. The probes on the A array were "refseq-based" and so represented well-validated genes, while the probes on the B array were generally ESTs and, being less "qualified" as probesets, had much different error qualities than those on the A array. If you use something like limma or SAM, which has some sort of "variance pooling", the variances will be inflated in one array of the set and decreased in the other.

I hope that helps. I have done a particularly bad job of explaining myself above. Sorry about the confusion.

Sean
Howdy,

On Sep 11, 2009, at 12:01 PM, Sean Davis wrote:
> The classic example that I can think of is the hgu133a and b where
> the probes on the a array were "refseq-based" and so represented
> well-validated genes while the probes on the b array were generally
> ESTs and, being less "qualified" as probesets, had much different
> error qualities than those on the a array. If using something like
> limma or SAM that has some sort of "variance pooling", the variances
> will be inflated in one array of the set and decreased in the other
> array of the set.

Wow ... I never used them, but I didn't know that part of hgu133*'s history ... thanks for the lesson!

> I hope that helps. I have done a particularly bad job of explaining
> myself above--sorry about confusion.

Sure it helped, thanks.

-steve
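The variance-pooling effect described for the hgu133a/b set can be illustrated with a toy calculation. This is a minimal sketch, not limma's or SAM's actual procedure: it uses the general form of a moderated variance (a weighted average of the probe-wise variance and a prior variance), fixes the prior degrees of freedom by hand, and takes the prior variance as a plain mean. The simulated variances, and the choice of which array is noisier, are assumptions made only for illustration.

```python
import numpy as np

def shrink_variances(sample_var, df, prior_var, prior_df):
    """Moderated variance: weighted average of probe-wise and prior variance."""
    return (prior_df * prior_var + df * sample_var) / (prior_df + df)

rng = np.random.default_rng(1)
df = 3          # residual degrees of freedom per probe (hypothetical)
prior_df = 4.0  # prior degrees of freedom, fixed by hand for illustration

# Hypothetical probe-wise variances: array A is quiet, array B is noisy.
var_a = 0.05 * rng.chisquare(df, size=1000) / df
var_b = 0.50 * rng.chisquare(df, size=1000) / df

# Pooling across the whole set: one prior variance shared by both arrays.
prior_pooled = np.mean(np.concatenate([var_a, var_b]))
pooled_a = shrink_variances(var_a, df, prior_pooled, prior_df)
pooled_b = shrink_variances(var_b, df, prior_pooled, prior_df)

# Analysis within each array: each array gets its own prior variance.
within_a = shrink_variances(var_a, df, np.mean(var_a), prior_df)
within_b = shrink_variances(var_b, df, np.mean(var_b), prior_df)

print(pooled_a.mean() > within_a.mean())  # variances inflated on the quiet array
print(pooled_b.mean() < within_b.mean())  # variances decreased on the noisy array
```

Because the shared prior variance lies between the typical variances of the two arrays, it pulls the quiet array's estimates up and the noisy array's estimates down, which is the argument for analyzing such a set within array.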