Array Set - Multiple Testing Problem

Tefina Paloma ▴ 220

Dear all,

Unfortunately I did not get any reply to my earlier post, so I am asking again, assuming that many people have already come across this problem.

Working with an array set (cDNA or any single-color platform) just means that the probes you are interested in are spread out over more than one array, usually due to space limitations. So: same samples, but different features.

The separation of the probes across the arrays is essentially random. The question is at which level of the analysis the arrays should be aggregated.

I think the normalization and also the model fitting should be done separately for each array.

However, since we do not only consider contrasts within each array of the array set, but want to look at the results of all arrays at the same time for a given contrast, the p-values must be adjusted somehow for this array effect.

Doing this in a "global" manner, similar to the "global" method of decide.tests, will probably be overly conservative.

Any suggestions?

Best,
Tefina

Updated 14.6 years ago by Sean Davis • written 14.6 years ago by Tefina Paloma

Sean Davis 21k

On Fri, Sep 11, 2009 at 8:58 AM, Tefina Paloma wrote:

> Any suggestions?

Why not just normalize each array in the set separately and then combine the normalized data for analysis? I'm not sure I see why the arrays would need to be treated independently for analysis, assuming the technology was the same for each array in the set.

Sean

Answer posted 14.6 years ago by Sean Davis

To be able to fit the same model to all arrays, an additional between-array normalization would be necessary to make all the arrays really comparable, and I don't want to over-normalize the data either.

Therefore I just thought of a sensible p-value adjustment.

Reply posted 14.6 years ago by Tefina Paloma

On Fri, Sep 11, 2009 at 9:47 AM, Tefina Paloma wrote:

> therefore I just thought of a sensible p value adjustment

You can adjust the entire list of p-values from all lists, if you like, as an alternative. However, assuming that the arrays are of the same technology, the probe-level variances should be similar, so you could also combine the normalized data. I'm not sure what "model" you mean, as each test is done within a probe and, therefore, would not cross arrays. But I may have misunderstood what you are trying to do.

Sean

Reply posted 14.6 years ago by Sean Davis

On Fri, Sep 11, 2009 at 11:20 AM, Sean Davis wrote:

> However, assuming that the arrays are of the same technology, the probe-level variances should be similar, so you could also combine the normalized data.

I made a further assumption above, which I should probably make explicit. While the array technology is important in determining the variance, the biologic behavior of the probes on the array contributes, also. If the biologic behavior of probes on one array is expected to be "different" in some way, then the assumption of approximately equal variance will be violated. In that case I agree that doing an analysis "within array" is the best way to go.

Sean

Reply posted 14.6 years ago by Sean Davis

Hi Sean,

On Sep 11, 2009, at 11:44 AM, Sean Davis wrote:

> I made a further assumption above, which I should probably make explicit. While the array technology is important in determining the variance, the biologic behavior of the probes on the array contributes, also.

Sorry if this is too noob-ish of a question, but I'm curious about your choice of words. Could you explain this point a bit further? It sounds like you are referring to the actual probes that are synthesized onto the array, no?

What biologic behavior do you expect these probes to have? Are you referring to them forming some secondary structure or something? If so, why would one expect some explicitly differing behavior between the same probes on different arrays (assuming no array impurities and the arrays were performed using the same protocol, or whatever)?

Just curious, thanks ...

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

Reply posted 14.6 years ago by Steve Lianoglou

On Fri, Sep 11, 2009 at 11:56 AM, Steve Lianoglou wrote:

> What biologic behavior do you expect these probes to have?

The classic example that I can think of is the hgu133a and b, where the probes on the a array were "refseq-based" and so represented well-validated genes, while the probes on the b array were generally ESTs and, being less "qualified" as probesets, had much different error qualities than those on the a array. If you use something like limma or SAM that has some sort of "variance pooling", the variances will be inflated in one array of the set and decreased in the other array of the set.

I hope that helps. I have done a particularly bad job of explaining myself above; sorry about the confusion.

Sean

Reply posted 14.6 years ago by Sean Davis
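The inflation and shrinkage described in the reply above can be illustrated with a small sketch. It uses the general form of a moderated variance, a weighted average of each probe's own variance and a prior variance taken from the pool of probes. Everything concrete here is an assumption made for illustration: the variances are made up, the prior is simply the mean of the pool, and the prior degrees of freedom are fixed by hand, whereas limma estimates both from the data. It is not limma's actual estimator, only a demonstration of why the choice of pool matters.

```python
def moderate_variances(sample_vars, residual_df, prior_df):
    """Shrink each probe variance toward the mean variance of its pool.

    Simplification: the prior variance is the plain mean of the pool and
    prior_df is supplied by hand (limma estimates both from the data).
    """
    prior_var = sum(sample_vars) / len(sample_vars)
    return [
        (prior_df * prior_var + residual_df * s2) / (prior_df + residual_df)
        for s2 in sample_vars
    ]


# Hypothetical per-probe sample variances: array A carries well-behaved
# probes, array B carries noisier ones.
vars_a = [0.04, 0.05, 0.06]
vars_b = [0.40, 0.50, 0.60]
residual_df = 4  # assumed residual degrees of freedom per probe
prior_df = 4     # assumed prior degrees of freedom

# Pooling within each array keeps the two variance scales apart.
within_a = moderate_variances(vars_a, residual_df, prior_df)
within_b = moderate_variances(vars_b, residual_df, prior_df)

# Pooling across the combined set pulls both arrays toward a common value.
combined = moderate_variances(vars_a + vars_b, residual_df, prior_df)
combined_a, combined_b = combined[:len(vars_a)], combined[len(vars_a):]

for name, within, comb in (("A", within_a, combined_a), ("B", within_b, combined_b)):
    print(name, "within:", within, "combined:", comb)
```

With these made-up numbers, every probe on array A ends up with a larger moderated variance under the combined pool than under its own pool, and every probe on array B with a smaller one, which is the direction of the effect described for the hgu133a/b set.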

Howdy,

On Sep 11, 2009, at 12:01 PM, Sean Davis wrote:

> The classic example that I can think of is the hgu133a and b where the probes on the a array were "refseq-based" and so represented well-validated genes while the probes on the b array were generally ESTs [...]

Wow ... I never used them, and I didn't know that part of hgu133*'s history ... thanks for the lesson!

> I hope that helps. I have done a particularly bad job of explaining myself above--sorry about confusion.

Sure it helped, thanks.

-steve

Reply posted 14.6 years ago by Steve Lianoglou

Hello,

I might have misunderstood something, but assuming such an array set consists of k arrays, wouldn't it be easiest to perform k normalizations and analyses, which give you k lists of p-values for k (non-overlapping, I assume!?) sets of genes?

To adjust for multiple testing, you need to bind those k lists together into one long vector of p-values and apply p.adjust or whatever function is your favourite one.

This saves you from normalizing between arrays that have different genes on them etc. and seems very easy to do.

Claus

Reply posted 14.6 years ago by Mayer, Claus-Dieter
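The "bind the k lists and adjust once" suggestion above can be sketched in a few lines. The thread itself points to R's p.adjust; the version below is a plain-Python stand-in that implements the Benjamini-Hochberg procedure (available in p.adjust as method = "BH") so that the steps are visible. The function names and the p-values are hypothetical, and the choice of Benjamini-Hochberg is an assumption, since the post leaves the adjustment method open.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    # Walk from the largest p-value down, keeping a running minimum so the
    # adjusted values stay monotone in the raw p-values.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def adjust_array_set(pvalue_lists):
    """Pool k per-array p-value lists, adjust once, split back per array."""
    pooled = [p for plist in pvalue_lists for p in plist]
    adjusted = bh_adjust(pooled)
    result, start = [], 0
    for plist in pvalue_lists:
        result.append(adjusted[start:start + len(plist)])
        start += len(plist)
    return result


# Hypothetical p-values from k = 3 separately normalized and analysed arrays.
array_pvalues = [
    [0.001, 0.20, 0.03],
    [0.004, 0.50],
    [0.01, 0.80, 0.04],
]
adjusted_per_array = adjust_array_set(array_pvalues)
for k, adj in enumerate(adjusted_per_array, start=1):
    print("array", k, [round(a, 4) for a in adj])
```

Because the adjustment sees all eight p-values at once, the number of tests m is the total across the array set rather than the size of any single array, which is the point of pooling before adjusting. No between-array normalization is involved at any step.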

Hello,

first of all, thanks for all the answers.

Unfortunately I do not have information about the exact behaviour of the probes. The library was selected in-house and surely does not represent a uniformly behaving set of genes (like in the example of the hgu133a and b chips).

In my particular case we are talking about cDNA arrays. Unfortunately the quality is not very consistent and the arrays behave very differently. So I do have doubts about combining the data after normalization; the quality is just not good enough.

Another issue is that in two arrays of the array set about a quarter of the spots are the same, but only in these two arrays. I still have to sort out how to deal with this.

I think the "safest approach" would be adjusting all p-values from all lists together. I am curious whether this will work, that is, whether this approach will leave me with some significant p-values (assuming that there is an effect), or whether the number of tests will just be too large.

Best,
Tefina

Reply posted 14.6 years ago by Tefina Paloma
