Normalized microarray data and meta-analysis
@mcmahon-kevin-3198
Last seen 10.2 years ago
Hello Bioconductor-inos,

I have more of a statistical/philosophical question regarding using raw vs. normalized data in a microarray meta-analysis. I've looked through the Bioconductor archives and have found some discussion of this issue, but not exactly what I'm concerned with. I don't mean to waste anyone's time, but I was hoping I could get some help here.

I've performed a meta-analysis using the downloaded data from 3 different GEO data sets (GDS). It is my understanding that these are normalized data from the various microarray experiments. It seems to me that if the data from those normalized results are normally distributed, those three experiments are perfectly comparable (if you think the authors' respective normalization approaches were reasonable). All you need to do is calculate some sort of effect size, p-value, etc. for all genes in the experimental conditions of interest and then combine these statistics across the different experiments. However, I consistently read things like "raw data are required for a microarray meta-analysis." Does this mean that normalized data are not directly comparable with each other? If so, then why does GEO even host such data?

Any help would be wonderful!

Wyatt

K. Wyatt McMahon, Ph.D.
Texas Tech University Health Sciences Center
Department of Internal Medicine
3601 4th St.
Lubbock, TX - 79430
806-743-4072

"It's been a good year in the lab when three things work. . . and one of those is the lights." - Tom Maniatis
Microarray Normalization • 2.3k views
@sean-davis-490
Last seen 3 months ago
United States
It depends entirely on what you want to do with the data. However, I think that many people like to have the raw data not only for normalization purposes but also for quality control.

Sean
@thomas-hampton-2820
Last seen 10.2 years ago
The question, I think, has to do with what sort of comparisons you plan to make. When people normalize using RMA, each slide ends up with a common distribution -- the only variable being how the elements of the distribution map to probes on any given slide. This is already some pretty hairy normalization, but it seems to work OK for lining up arrays done by the same people at the same time and place, so that you can meaningfully compare expression values head to head, calculate averages, and do significance tests.

With or without raw data, the idea of a meaningful direct comparison between, say, an expression value of 7.5 in one lab and an expression value of 8.3 in another seems very optimistic to me.

Saying something like "gene X was in the top 1% in expression in both cases" seems pretty reasonable...

Tom

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
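To make the "common distribution" point concrete, here is a minimal sketch of the quantile-normalization step that RMA applies across arrays, together with a within-array percentile rank of the kind Tom suggests comparing between labs. It is an illustration only: the function names and the toy expression values are hypothetical, ties are ignored, and RMA's background correction and probe-set summarization are left out.

```python
def quantile_normalize(arrays):
    """Force every array onto one shared distribution.

    arrays: list of equal-length lists of expression values (one list per array).
    Simplification: ties are not handled specially.
    """
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [
        sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n_probes)
    ]
    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest expression on this array.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
        normalized.append(out)
    return normalized


def percentile_rank(values, index):
    """Fraction of probes on one array with expression <= the probe at `index`."""
    return sum(v <= values[index] for v in values) / len(values)


# Hypothetical toy data: two arrays, four probes each.
array_a = [5.0, 2.0, 3.0, 9.0]
array_b = [1.0, 6.0, 2.0, 12.0]
norm_a, norm_b = quantile_normalize([array_a, array_b])
```

After this step both arrays contain exactly the same set of values; only the assignment of values to probes differs. That is why a rank statement ("probe 3 is the highest on both arrays") carries over between studies more safely than a raw value comparison.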
Sean and Thomas,

First of all, thanks so much for your prompt replies! I'm sorry I didn't include more details about precisely what I was trying to do.

What I'm trying to do is find those genes that are "consistently differentially expressed" in my experimental condition of interest. To do this, I'm largely following the approach of Mulligan et al., 2006, PNAS 103(16), 6368-73. They calculated an effect size (Cohen's d-statistic, which is the t-statistic for the untreated vs. treated comparison times 2, divided by the square root of the degrees of freedom) for all genes in multiple different experiments, then took the average d-statistic across all experiments and used a z-test to determine whether the mean effect size was different from 0. Following multiple testing adjustment, genes with a p-value of <0.05 were considered consistently differentially expressed.

Do you think I need raw data for this? Unfortunately, one of the groups whose experiment I'm trying to use has lost their raw data, so I cannot get the raw data for all experiments. I understand that I'm making assumptions about the quality of the arrays; but apart from that, do you think this is a reasonable approach?

Thanks again in advance,

Wyatt
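The effect-size procedure described above can be sketched roughly as follows. Only the conversion d = 2t / sqrt(df) and the overall shape (average d, z-test, multiple testing adjustment) come from the post. The variance approximation for d, the way the standard error of the mean is formed, and the choice of Benjamini-Hochberg adjustment are assumptions made for illustration and may differ from what Mulligan et al. actually did.

```python
import math
from statistics import NormalDist


def cohens_d_from_t(t, df):
    """Cohen's d from a two-group t-statistic: d = 2t / sqrt(df)."""
    return 2.0 * t / math.sqrt(df)


def d_variance(d, n1, n2):
    """Large-sample approximation to the variance of d (assumed, not from the thread)."""
    return (n1 + n2) / (n1 * n2) + d * d / (2.0 * (n1 + n2))


def mean_effect_z_test(studies):
    """studies: list of (t, n1, n2) for ONE gene, one tuple per study.

    Returns (mean_d, z, two_sided_p) for H0: mean effect size == 0.
    """
    ds, variances = [], []
    for t, n1, n2 in studies:
        d = cohens_d_from_t(t, n1 + n2 - 2)
        ds.append(d)
        variances.append(d_variance(d, n1, n2))
    k = len(ds)
    mean_d = sum(ds) / k
    se = math.sqrt(sum(variances)) / k  # SE of an unweighted mean of k independent d's
    z = mean_d / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return mean_d, z, p


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Note that everything here needs only each study's t-statistic and group sizes per gene, which is why this style of meta-analysis can in principle be run from normalized data or summary statistics.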
@mayer-claus-dieter-3184
Last seen 10.2 years ago
Dear Kevin,

That is a difficult question indeed. I am not sure what type of microarrays we are talking about here, but if they are Affy arrays, then normalisation methods like RMA or GCRMA perform an "across array" normalisation step, i.e. the normalised data from the same study will be more similar to each other than to data from different studies. So for better comparability across studies it seems better to normalise the raw arrays from all studies together.

Having said that, even if you are able to do this, you will typically find that the data from each study cluster together, i.e. the normalisation is not able to remove all the differences between studies. So any proper meta-analysis must somehow take this study effect into account (and there is a growing amount of literature on how to do that). The importance of having the raw data depends on which approach you take: if you use a p-value combination approach like Stouffer's method, for example, it shouldn't matter much, but if you try to put all data into one big analysis it might very well matter.

Best Wishes

Claus

The University of Aberdeen is a charity registered in Scotland, No SC013683.
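As an illustration of the p-value combination route, here is a minimal sketch of Stouffer's method for one gene across independent studies. The function name and the optional weights argument are hypothetical choices for this example; the inputs are assumed to be one-sided p-values testing the same direction of change, each strictly between 0 and 1.

```python
import math
from statistics import NormalDist


def stouffer_combine(pvalues, weights=None):
    """Combine one-sided p-values from independent studies (Stouffer's Z).

    Returns (combined_z, combined_one_sided_p).
    Each p-value must lie strictly between 0 and 1.
    """
    norm = NormalDist()
    if weights is None:
        weights = [1.0] * len(pvalues)
    # Convert each p-value to a z-score; small p gives large positive z.
    z_scores = [norm.inv_cdf(1.0 - p) for p in pvalues]
    z = sum(w * zi for w, zi in zip(weights, z_scores)) / math.sqrt(
        sum(w * w for w in weights)
    )
    return z, 1.0 - norm.cdf(z)


# Three studies, each only borderline on its own.
combined_z, combined_p = stouffer_combine([0.05, 0.05, 0.05])
```

Because the method only consumes per-study p-values, it does not care how each study was normalized, which is the sense in which raw data "shouldn't matter much" here. Weights (for example, the square root of each study's sample size) are a common optional refinement.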
Excellent, Claus! I appreciate your input! That was my idea as well: if you are trying to do one big experiment, then you'd definitely need to adjust for any "study effects," but if you are just trying to combine p-values (I'm using an average effect size, which converts directly to p-values), then it's less important.

Are there any other ideas on this subject?

Thanks to everyone so far,

Wyatt
Paul Leo ▴ 970
@paul-leo-2092
Last seen 10.2 years ago
No, you don't need the raw data. However, you do need to check that p-values were calculated the same way between experiments (they will be consistent if you use GEO processed data). What if one group did a multiple testing correction and the other did not? Perhaps this is already accounted for in the method you mentioned?

You may wish to consider whether you will combine p-values at the gene level or the probe level. Most favour the probe level due to splice variants etc.

If you are comparing across array platforms then you need to be very careful; a conservative approach is to BLAST probe-to-probe across array platforms to get the correspondence. Illumina provides "pre-BLASTed" probe sets on their FTP site for Illumina-Affy comparisons.

Best of luck.

Cheers
Paul
I feel that p-values, corrected or otherwise, may be unsatisfactory for detecting concordance between experiments. For example, an experiment with higher N will show lower p-values for the same gene, even under conditions that are otherwise precisely the same. So we can't directly compare p-values head to head across multiple experiments. Simple simulations show that straight fold change can be more predictive of future behavior (say, in somebody else's study) than statistics which place a high premium on within-group consistency.

Check this out:

Shi et al. The balance of reproducibility, sensitivity, and specificity of lists of differentially expressed genes in microarray studies. BMC Bioinformatics. 2008; 9(Suppl 9): S10. Published online 2008 August 12. doi: 10.1186/1471-2105-9-S9-S10. PMCID: PMC2537561. Copyright © 2008 Shi et al; licensee BioMed Central Ltd.

Cheers

Tom
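The dependence of p-values on N can be seen without any simulation. The sketch below holds the fold change (mean difference on the log scale) and the within-group standard deviation fixed and varies only the number of arrays per group. For simplicity it uses a two-sample z-test with a known, equal SD rather than a t-test; that simplification, the function name, and the numbers are assumptions for illustration.

```python
import math
from statistics import NormalDist


def two_sided_p(mean_diff, sd, n_per_group):
    """Two-sided p-value for a two-sample z-test with known, equal SD."""
    se = sd * math.sqrt(2.0 / n_per_group)  # SE of the difference in group means
    z = mean_diff / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Same gene, same log fold change (1.0), same SD (1.0); only N differs.
for n in (3, 10, 30):
    print(n, round(two_sided_p(1.0, 1.0, n), 4))
```

The fold change is identical in all three hypothetical studies, yet the p-value shrinks as N grows, so a gene can look "discordant" by p-value while being perfectly concordant by fold change.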
I'm very excited about this discussion, and I appreciate everyone's input. Thanks especially to Thomas, who pointed to the paper indicating that fold change is the most reproducible statistic between groups.

Overall, it appears that which method should be used depends largely on the goals of the project. I agree with Thomas that direct comparison of p-values is often problematic for precisely the reason he mentioned: a higher N will give lower p-values. However, if your different studies do happen to have at least similar N, then p-values might be a good approach.

Additionally, the use of an effect size or other statistic often favors reproducibility of that statistic rather than actual biological significance. You might call a gene significantly differentially expressed if its statistic in three different studies is 0.25, 0.25, and 0.25, even though that's not a very large statistic. On the other hand, a gene with statistics of 0.25, 1, and 3 would not be considered significant, simply because of the variance of the statistic between studies. We have chosen this approach in spite of this drawback because our question was specifically, "which genes are consistently differentially expressed?" Because we're looking for "consistency," we have chosen to accept slight but reproducible changes.

Finally, at least according to the paper Thomas noted, fold change appears to be the most reproducible statistic between laboratories. This makes sense, since a small difference in fold change can have a large effect on the p-value, so that when you compare between groups the differences in fold change are relatively small while the differences in p-value can be large. However, comparing fold change between experiments would absolutely necessitate similar normalization schemes, as opposed to a common statistic or p-value combining method, which relies more on the authors' own interpretation of their data.

Thanks to everyone who helped with this discussion. It is, of course, still open.

Wyatt
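The 0.25/0.25/0.25 versus 0.25/1/3 example can be checked with a small sketch. It treats the per-study statistics as three observations and computes a one-sample t-statistic across studies, which is one simple way to let between-study variance drive the result. This is an assumption chosen for illustration; it is not necessarily the exact z-test used in the analysis described earlier in the thread.

```python
import math
from statistics import mean, stdev

# Two-sided 5% critical value of Student's t with 2 degrees of freedom (3 studies).
T_CRIT_DF2 = 4.303


def across_study_t(effects):
    """One-sample t-statistic for H0: mean effect across studies == 0."""
    se = stdev(effects) / math.sqrt(len(effects))
    if se == 0.0:
        # Perfectly identical effects: no between-study variance at all.
        return math.inf if mean(effects) != 0 else 0.0
    return mean(effects) / se


consistent = across_study_t([0.25, 0.25, 0.25])  # small but perfectly reproducible
variable = across_study_t([0.25, 1.0, 3.0])      # larger on average, but inconsistent

print("consistent gene significant:", abs(consistent) > T_CRIT_DF2)
print("variable gene significant:  ", abs(variable) > T_CRIT_DF2)
```

Under this kind of test the small, perfectly reproducible effect is called significant while the larger, more variable one is not, which matches the trade-off described above: the procedure rewards consistency rather than magnitude.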
Dear Tom,

That is an interesting point you make (and an interesting paper you refer to), but in my view the main aim of a meta-analysis is not to find the concordance between the individual studies but to summarize these studies in such a way that you have higher power/sensitivity than any of the individual studies. You could, for example, get 100% concordance between studies by not using any statistics at all and simply listing the genes in alphabetical order. If you take, say, the top 100 of that list for each study you will get the same genes each time, but unfortunately most of them will be false positives.

Also, that BMC Bioinformatics paper doesn't suggest abandoning p-values completely, but rather using them as an additional filter on the gene list ranked by fold change. So to apply that advice in a meta-analysis one would have to find some way of coming up with both an overall fold change and an overall p-value for each gene. And the original question remains: would one need the raw data for that, or is it good enough to have the normalized data, or even just summary statistics like the average fold change and p-value for each gene in each study? (My very short answer would be: you do not necessarily need the raw data, but they might help!)

Best Wishes

Claus

P.S.: Apologies for misusing the list for a discussion which is not strictly about Bioconductor software, but I guess that meta-analysis will be an issue that many here might be interested in.
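The "rank by fold change, filter by p-value" idea mentioned above can be sketched in a few lines. The gene names, values, and the 0.05 cutoff are hypothetical; in a meta-analysis the fold change and p-value for each gene would themselves be overall summaries across studies, however one chooses to compute them.

```python
def rank_genes(results, p_cutoff=0.05):
    """Rank genes by absolute log fold change, keeping only those passing a p-value filter.

    results: dict mapping gene name -> (log2_fold_change, p_value).
    Returns a list of (gene, log2_fold_change, p_value), largest |fold change| first.
    """
    passing = [
        (gene, lfc, p) for gene, (lfc, p) in results.items() if p < p_cutoff
    ]
    return sorted(passing, key=lambda row: abs(row[1]), reverse=True)


# Hypothetical per-gene summaries.
example = {
    "geneA": (2.1, 0.01),
    "geneB": (3.0, 0.20),    # largest fold change, but fails the p-value filter
    "geneC": (-2.5, 0.001),
    "geneD": (0.3, 0.0001),  # tiny fold change, so it ranks last despite its p-value
}
ranked = rank_genes(example)
```

The p-value acts only as a gate here; the ordering of the surviving genes is determined entirely by the size of the fold change.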