multiple comparisons followed by multiple tests


Richard Friedman ★ 2.0k

@richard-friedman-513

Dear Bioconductor Users,

I have an experimental design in which I have several samples that I wish to compare in several ways (necessitating multiple comparisons) and, of course, several thousand genes (necessitating multiple tests). My general strategy for analyzing these experiments is to:

1. Obtain p-values for the different comparisons for each gene, corrected for multiple comparisons.
2. Correct the p-values for each test for multiple tests.

Is this correct?

I haven't mentioned particular software so far, because I wanted to first see if the overall approach is correct. I am planning to use SAS because I am more comfortable with it than I am with R at this point (I do my normalization and exploratory analysis with Bioconductor and my statistical analysis with SAS). I am planning on using the SAS general linear model and correcting for multiple comparisons between means with the Tukey method or a randomization method. Then I am planning on correcting for multiple tests with the Benjamini-Hochberg false discovery rate. Does this sound like a reasonable way to proceed? (I am planning on switching to R for statistical analyses eventually.)

Finally, the experimentalist gave me between 1 and 3 technical replicates of each sample. I seem to remember someone on the list recommending that fold changes rather than statistics be used for samples with few replicates. Are there systematic studies to back up this contention? Am I therefore wasting my time doing statistics altogether, and should I merely rank fold changes? This seems counterintuitive, but I need to make sure.

Thanks and best wishes,
Rich

------------------------------------------------------------
Richard A. Friedman, PhD
Associate Research Scientist
Herbert Irving Comprehensive Cancer Center
Oncoinformatics Core
Lecturer
Department of Biomedical Informatics
Box 95, Room 130BB or P&S 1-420C
Columbia University Medical Center
630 W. 168th St.
New York, NY 10032
(212)305-6901 (5-6901) (voice)
friedman@cancercenter.columbia.edu
http://cancercenter.columbia.edu/~friedman/

"What is the breakfast all those people ate on Bloomsday?"
-Rose Friedman, age 8

In Memoriam, Tim O'Connor

Normalization Cancer • 860 views

updated 19.8 years ago by Matthew Hannah ▴ 940 • written 19.8 years ago by Richard Friedman ★ 2.0k


Matthew Hannah ▴ 940

@matthew-hannah-621

I would look at the limma help pages, as limma allows linear model fitting, the specification of multiple comparisons, and also p-value correction by FDR (although I think this is only after the eBayes moderation of the t-statistics?).

As for the replication, if you have fewer than 3 reps per treatment then you are obviously wasting your time. Also, if they are just technical reps rather than true biological reps, then any statistical analysis will be misleading due to the underestimate of biological variability.

You also don't mention what type of data (Affy or cDNA) or the general design, which may allow people to offer more detailed advice. Nor do you mention the starting-point data: for example, if it's Affy data, are the signal values from MAS5, or have you looked into using RMA or GCRMA as a low-level normalisation?

HTH,
Matt
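For readers unfamiliar with the FDR correction mentioned in this answer, here is a minimal sketch of the Benjamini-Hochberg step-up adjustment in plain Python. It is not limma's or SAS's implementation; the function name `benjamini_hochberg` and the example p-values are made up for illustration. The result should correspond to what R's `p.adjust(p, method = "BH")` computes.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that the
    # adjusted values stay monotone in the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five genes (illustration only).
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(raw)
print(adj)  # approximately [0.005, 0.02, 0.05125, 0.05125, 0.6]
```

Genes whose adjusted value falls below the chosen FDR level (say 0.05) would be called significant; in this toy input that is only the first two.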

19.8 years ago Matthew Hannah ▴ 940


Dear Matt and other Bioconductor Users:

On Jul 21, 2004, at 9:56 AM, Matthew Hannah wrote:

> I would look at the Limma help pages as this allows Lm fitting and the
> specification of multiple comparisons and also P value correction by fdr
> (although I think this is only after ebayes mod of t-stats?).
>
> As for the replication if you have less than 3 reps per treatment then
> you are obviously wasting your time. Also if they are just technical reps
> rather than true biological reps then any statistical analysis will be
> misleading due to the underestimate of biological variability. You also
> don't mention what type of data (affy or cDNA) or the general design
> which may allow people to offer more detailed advice.
> You also don't mention the starting point data - for example if it's affy
> data are the signal values from MAS5 or have you looked into using RMA or
> GCRMA as a low-level normalisation?

Thank you for your help. I phrased my questions generally, but will now be more specific, in case that affects the answers.

The data is Affy data. I normalized it with RMA. I agree with you that between 1 and 3 technical replicates is not optimal. I didn't design the experiments; I was just given them to analyze after they were performed. The experimentalist with whom I am working is prepared to perform a limited number of PCR confirmatory experiments. I will encourage him to do more experiments, but he wants to see what can be learned from the present dataset first.

The experiment is to detect the effect of a knockout on the ability of cells to respond to different mutagens. So I am planning on comparing:

1. wild-type exposed to mutagen (1 technical replicate) vs. wild-type no treatment (3 technical replicates).
2. knockout no treatment (2 technical replicates) vs. wild-type no treatment (3 technical replicates).
3. knockout exposed to mutagen (1 technical replicate) - wild-type exposed to mutagen (1 technical replicate).
4. (knockout exposed to mutagen (1 technical replicate) - wild-type exposed to mutagen (1 technical replicate)) - (knockout no treatment (2 technical replicates) - wild-type no treatment (3 technical replicates)).

My question is: given the small number of replicates, should I ignore statistical analysis altogether and merely proceed with fold changes?

Thanks and best wishes,
Rich
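The four comparisons listed in this post can be written as differences of group means on the log2 scale (RMA values are log2 expression). The sketch below does this for a single probe set. The expression values and the group labels (`WT_control` and so on) are hypothetical; only the chip counts per group follow the post. It yields fold changes only and makes no attempt at a test statistic.

```python
from statistics import mean

# Hypothetical RMA (log2) values for ONE probe set; the number of chips
# per group matches the design described in the post.
log2_expr = {
    "WT_control": [7.1, 7.0, 7.2],  # 3 technical replicates
    "WT_mutagen": [8.3],            # 1 technical replicate
    "KO_control": [7.4, 7.6],       # 2 technical replicates
    "KO_mutagen": [7.7],            # 1 technical replicate
}

# Group means on the log2 scale.
m = {group: mean(values) for group, values in log2_expr.items()}

# Comparisons 1-3: simple differences of group means (log2 fold changes).
contrasts = {
    "1_WT_mutagen_vs_WT_control": m["WT_mutagen"] - m["WT_control"],
    "2_KO_control_vs_WT_control": m["KO_control"] - m["WT_control"],
    "3_KO_mutagen_vs_WT_mutagen": m["KO_mutagen"] - m["WT_mutagen"],
}
# Comparison 4: the interaction, i.e. comparison 3 minus comparison 2.
contrasts["4_interaction"] = (
    contrasts["3_KO_mutagen_vs_WT_mutagen"]
    - contrasts["2_KO_control_vs_WT_control"]
)

for name, log2_fc in contrasts.items():
    # 2 ** log2_fc converts a log2 difference back to a ratio.
    print(f"{name}: log2FC = {log2_fc:+.2f}, ratio = {2 ** log2_fc:.2f}")
```

With these made-up numbers, comparison 1 comes out at a log2 fold change of about 1.2 (a ratio of roughly 2.3) and the interaction at about -1.0. Note that the two single-chip groups contribute a mean but no within-group variance estimate.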

19.8 years ago Richard Friedman ★ 2.0k


Matthew Hannah ▴ 940

@matthew-hannah-621

Rich,

So you have

WT Mut - 1 chip
WT Control - 3 chips
KO Mut - 1 chip
KO Control - 2 chips

Odd design; I'd have used 1 more chip and done 2 biological reps of all treatments. But that's irrelevant as it's not your experiment and it's after the event anyway.

The only statistical comparison that could be made is WT - KO, but then the treatment effects may get in the way, and I guess the primary interest is differences in response to the treatment. Also remember the comment that if these are tech reps then stats is dodgy ground anyway.

I would find out whether the reps are really tech reps, i.e. the same RNA hybridised to 3 chips, or biological reps within or between experiments. If it's the first then he probably wasted $3000; the 2nd and 3rd may allow you to estimate the variability between samples and what kind of fold changes you may get by chance. You will also need to find out if there was any pooling done and consider this (see Rafael's post on pooling paper above).

To cut to the point, the only option is to filter on fold change in some way. The bigger the changes the better, and confirm with RT-PCR, I guess.

Oh, and tell them to ask *before* next time ;-)

HTH,
Matt
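One way to "filter on fold change", sketched under assumptions: rank genes by absolute log2 fold change and keep those above a cutoff, then pick RT-PCR candidates from the top. The gene names, the values and the default cutoff of 1.0 (a 2-fold change) are all hypothetical, and the function name `rank_by_fold_change` is made up for this example; nothing in the thread fixes a particular threshold.

```python
def rank_by_fold_change(log2_fc, min_abs_log2_fc=1.0):
    """Keep genes with |log2 fold change| >= cutoff, largest change first."""
    kept = [
        (gene, lfc)
        for gene, lfc in log2_fc.items()
        if abs(lfc) >= min_abs_log2_fc
    ]
    # Sort on the magnitude so strong up- and down-regulation rank together.
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical log2 fold changes for one comparison (illustration only).
log2_fc = {"geneA": 2.3, "geneB": -0.4, "geneC": -1.8, "geneD": 0.9, "geneE": 1.1}

candidates = rank_by_fold_change(log2_fc)
for gene, lfc in candidates:
    print(f"{gene}: log2FC = {lfc:+.1f}")
```

Here geneA, geneC and geneE survive the cutoff, in that order. If the replicates turn out to be biological rather than technical, the spread among the WT control chips could give a rough idea of how large a fold change can arise by chance, which would be a more defensible basis for the cutoff than an arbitrary 2-fold.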

19.8 years ago Matthew Hannah ▴ 940
