multiple testing with 54000 genes


Dipl.-Ing. Johannes Rainer ▴ 430


Hi,

I wanted to ask whether anyone has experience with multiple testing across a large number of genes.

I have 24 Affymetrix chips (hgu133plus2) in total: 12 patients, each with a 0-hour sample and a sample taken 6 hours after treatment. I calculated p-values by permutation (the mt.maxT function with test="pairt") and corrected for multiple testing using the Benjamini-Hochberg method. The problem is that with so many tests (54675 genes and therefore 54675 tests), no gene shows a "significant" difference after the p-values are adjusted.

I will now reduce the number of genes tested in order to get some results. Has anyone experienced similar problems?

Thanks, jo


ADD COMMENT • link 19.3 years ago Dipl.-Ing. Johannes Rainer ▴ 430


James W. MacDonald 65k


You probably don't have enough samples to use a permuted null distribution. I believe the smallest p-value you can get with a permuted null is going to be ~0.00024, which may not be small enough to survive a multiplicity correction with that many genes. I would imagine you would get better results if you used a parametric null (e.g., using the limma package).

Best,

Jim
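A rough sketch of the arithmetic behind this point, written in Python purely for illustration (it is not part of the original exchange). It assumes that the permutation floor is 1/2^12 for 12 paired samples and that Benjamini-Hochberg multiplies the k-th smallest p-value by (number of tests)/k. The function names are hypothetical.

```python
import math


def min_permutation_p(n_pairs: int) -> float:
    """Smallest p-value attainable when all 2**n_pairs sign patterns are enumerated."""
    return 1.0 / (2 ** n_pairs)


def ties_needed_for_bh(p_min: float, n_tests: int, alpha: float = 0.05) -> int:
    """Number of tests that must share p_min for their BH-adjusted value to reach alpha.

    BH multiplies the k-th smallest p-value by n_tests / k, so we need
    p_min * n_tests / k <= alpha, i.e. k >= p_min * n_tests / alpha.
    """
    return math.ceil(p_min * n_tests / alpha)


p_min = min_permutation_p(12)            # 1 / 4096
k = ties_needed_for_bh(p_min, 54675)     # number of tests quoted in the question
print(f"smallest permutation p-value: {p_min:.7f}")
print(f"tests that must sit at that floor for an adjusted p <= 0.05: {k}")
```

Under these assumptions, about 267 probe sets would have to reach the smallest possible permutation p-value before any of them could pass a 0.05 cut-off after adjustment, which is why the floor matters so much with this many tests.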

ADD COMMENT • link 19.3 years ago James W. MacDonald 65k


Exactly! The smallest p-value I get is exactly 0.00024 (0.0002442, to be more precise :) ). I thought 12 samples would be enough to calculate p-values by permutation. What do you mean by a parametric null? Something like a t-test? I'm sorry for my questions, but I am not a statistician (not yet...).

Thanks!

ADD REPLY • link 19.3 years ago Dipl.-Ing. Johannes Rainer ▴ 430


Sachin Mathur ▴ 60


Jo,

The number of probes significantly influences the result of a multiple testing correction. Benjamini and Hochberg is one of the least stringent corrections. It works in the following way:

1. The p-values of the probes from the t-test are ranked. The largest p-value remains as it is, and testing starts from there.
2. The second largest p-value is multiplied by p/(p-1), where p is the number of probes, and the result is compared with 0.05. The third largest p-value is multiplied by p/(p-2) and compared with 0.05, and so on down to the smallest p-value, which is multiplied by p/1.

So if a large number of probes is selected for the test, the factors n/(n-1), n/(n-2) and so on reach larger values for the smallest p-values (the last factor is n itself). Selecting fewer probes should therefore give you a better result.

Sachin
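The ranking-and-scaling procedure described above can be sketched in a few lines of Python. This is an illustrative implementation under the usual definition of the Benjamini-Hochberg adjustment, not code from the thread; `bh_adjust` is a hypothetical name. One step goes beyond the description in the post: a running minimum keeps the adjusted values monotone, which is part of the standard step-up procedure.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values, ordered from the largest to the smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position                # rank m = largest, rank 1 = smallest
        scaled = pvalues[idx] * m / rank   # factors m/m, m/(m-1), ..., m/1
        running_min = min(running_min, scaled)
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
print(bh_adjust(raw))
```

The factor applied to the smallest p-value is the number of tests itself, so the same raw p-value is penalised far more heavily among 54675 tests than among a few thousand.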

ADD COMMENT • link 19.3 years ago Sachin Mathur ▴ 60


Sachin Mathur ▴ 60


Sorry, in the earlier email I used n instead of p. It is p/(p-1), p/(p-2) and so on.

Thanks,
Sachin

ADD COMMENT • link 19.3 years ago Sachin Mathur ▴ 60


Dipl.-Ing. Johannes Rainer ▴ 430


Good hint. I reduced the list of genes to test to those that have an M value larger than 0.7 (or smaller than -0.7) in more than 2 comparisons (each comparison being the 6-hour sample against the 0-hour sample of one patient). That leaves me with about 4000 probe sets...

Quoting "Oosting, J. (PATH)":

> Hi,
>
> There are a number of ways to reduce the number of genes before the actual analysis. Just beware that your filtering method should not rely on any differences between the groups you test, in order to prevent dependency from creeping into your data.
>
> - Remove genes that do not fulfill quality checks (i.e. expression below or above a certain level).
> - Genes that have low overall variance are probably not regulated.
> - Only test genes that are relevant to your experimental question. (You should not do this if your question is "Which genes are differentially expressed between the timepoints?". It is also silly to have 2 lists, 1 for the general question and 1 for the specific question.)
>
> Regards,
>
> Jan
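As an illustration of a filter that does not look at the group labels, here is a minimal Python sketch of the "low overall variance" idea from the quoted advice. The function name `variance_filter`, the `keep_fraction` parameter and the toy probe-set IDs are all hypothetical, and the cut-off is arbitrary.

```python
from statistics import variance


def variance_filter(expression, keep_fraction=0.5):
    """Keep the probe sets with the highest overall variance.

    `expression` maps a probe-set ID to its values across ALL arrays.
    No group labels (0 h vs 6 h) are used, so the filter never sees
    the contrast that is tested afterwards.
    """
    ranked = sorted(expression, key=lambda pid: variance(expression[pid]), reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:n_keep]


# Toy data: three hypothetical probe sets measured on four arrays.
toy = {
    "probe_a": [7.0, 7.1, 6.9, 7.0],   # nearly flat
    "probe_b": [5.0, 9.0, 4.5, 9.5],   # highly variable
    "probe_c": [8.0, 8.4, 7.6, 8.1],
}
print(variance_filter(toy, keep_fraction=0.34))
```

A filter on the per-patient M values, by contrast, uses exactly the 6-hour versus 0-hour difference that is tested afterwards, which is the situation the quoted advice warns about.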

ADD COMMENT • link 19.3 years ago Dipl.-Ing. Johannes Rainer ▴ 430


Dipl.-Ing. Johannes Rainer ▴ 430


Jim, thank you for your excellent explanation! I will check the limma package. Otherwise I will reduce the number of genes included in the further analysis by some cut-off level (as we are interested in genes that show differential expression between the 0-hour and 6-hour samples in most patients, I will restrict the analysis to those that have, for example, an M value larger than 0.5 in more than two patients).

Thanks, jo

Quoting James MacDonald:

> OK, a bit of background. The idea behind a t-test is quite simple: if you take two random samples from the same Normal distribution and compare them using the t-test, the t-statistic you generate will follow a t distribution. This means that we know what to expect from a t-test if there really isn't a difference between the two samples, so if we get a t-statistic that is much larger than we expect by chance, we can assume that there is a difference in the means of the two populations we are comparing. This is called a parametric test because we are using the two parameters of the Normal distribution (the mean and variance) to compare two sets of data that we are assuming come from a Normal distribution.
>
> There are some assumptions we are making here. The main assumption is that the two samples come from Normal distributions, and the only possible difference between the groups is the mean (we assume that the variance is the same). There have been some modifications proposed over the years to account for different variances, and it has been shown that you don't really need Normally distributed data; as long as the data are 'hump' shaped you should be OK. However, if the underlying distribution of the data is seriously non-Normal, then the t-test starts to fail.
>
> In this case, the failure occurs because we are no longer using the correct null distribution, so we have to use non-parametric methods. One such method is the Wilcoxon rank sum (or Mann-Whitney) test, where you use the rank of the data rather than the values themselves. This test has its own set of assumptions that are actually fairly strict. We can also attempt to figure out what the null distribution should look like for our data using permutation methods. The problem with permutation methods is that the smallest p-value will be equal to 1/(number of permutations). In your case, there are only 2^12 = 4096 possible permutations (combinations, actually), so the smallest p-value will be 0.0002442...
>
> So, long story short, you will probably get better results using the limma package, which uses a conventional null distribution.
>
> HTH,
>
> Jim
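To make the 2^12 limit concrete, here is a small Python sketch of an exact permutation test for paired data. It is a simplification and not what mt.maxT does internally: it uses the mean paired difference as the statistic instead of the paired t-statistic, it is one-sided, and it handles a single gene. The function name and the example log-ratios are invented for illustration.

```python
from itertools import product


def paired_sign_flip_p(differences):
    """One-sided exact permutation p-value for the mean paired difference.

    Under the null hypothesis each within-patient difference is equally
    likely to be positive or negative, so every sign pattern is enumerated.
    """
    n = len(differences)
    observed = sum(differences) / n
    count = 0
    total = 0
    for signs in product((1, -1), repeat=n):
        permuted = sum(s * d for s, d in zip(signs, differences)) / n
        if permuted >= observed:   # at least as extreme as the observed mean
            count += 1
        total += 1
    return count / total


# Hypothetical log-ratios (6 h minus 0 h) for 12 patients, all positive.
diffs = [0.8, 1.1, 0.9, 1.3, 0.7, 1.0, 1.2, 0.6, 0.9, 1.1, 0.8, 1.0]
print(paired_sign_flip_p(diffs), 1 / 2 ** 12)
```

Even in this most favourable case, where every patient changes in the same direction, only the observed sign pattern is as extreme as the data, so the p-value cannot fall below 1/4096 however large the differences are.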

ADD COMMENT • link 19.3 years ago Dipl.-Ing. Johannes Rainer ▴ 430
