help with multiple testing
@efthimios-motakis-4986
Hi all,

My name is Mike and I am a post-doctoral fellow in Bioinformatics. I have a question about p-value adjustment for multiple testing and I wonder if someone could give me a piece of advice.

I have 8,256 gene pairs, made up of all possible combinations of 129 genes. For each pair A-B (A different from B) four values are recorded: number of tumors found in both A and B (TT), number of tumors found only in A (TF), number of tumors found only in B (FT), and number of tumors found in neither A nor B (FF). The data are in the form of 2x2 contingency tables, e.g.

Gene 1  Gene 2  TT  TF  FT  FF
g1      g2       5   1   1  27
g1      g3       4   1   1  28
g2      g3       4   2   0  28
...

Notice that each gene is paired with all the others, so it appears 128 times in this list. I want to find which of the 8,256 gene pairs (tests) show a significant association between rows (in A, not in A) and columns (in B, not in B) using the Fisher or Barnard test. Subsequently I have to adjust the p-values for multiple testing.

At the 5% level I find approximately 500 significant gene pairs but, naturally, all the p-value adjustment procedures I tried (for independent tests: BH, q-value; for dependent tests: BY, adaptiveBH and BlaRoq from package "multtest") produce adjusted p-values > 0.3. I think the problem is that the highly dependent nature of the data (50% of the genes have a very small number of mutations, which gives high p-values for all the pairs they generate) dramatically affects the adjustment procedure.

Is there a better way (method) to run the p-value adjustment?

Do you think it would be an appropriate solution to create multiple lists of gene pairs, where each gene is represented only once, and then estimate q-values within each list (giving multiple q-values for each pair)?

Thank you,
Mike
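For readers who want to reproduce the setup, here is a minimal sketch of the workflow described in the question: a two-sided Fisher exact test per 2x2 table, followed by Benjamini-Hochberg (BH) adjustment. It is written in plain Python rather than with the R packages named in the post, and the function names (`hypergeom_probs`, `fisher_two_sided`, `bh_adjust`, `min_achievable_p`) are illustrative, not taken from any library. The total of 34 tumors is an assumption read off the example rows, each of which sums to 34.

```python
from math import comb

def hypergeom_probs(row1, col1, n):
    """Probabilities of every feasible top-left cell for fixed margins."""
    row2 = n - row1
    lo, hi = max(0, col1 - row2), min(row1, col1)
    denom = comb(n, col1)
    probs = [comb(row1, x) * comb(row2, col1 - x) / denom
             for x in range(lo, hi + 1)]
    return lo, probs

def two_sided_p(probs, p_obs):
    """Sum the probabilities of all tables no more likely than the observed one."""
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))

def fisher_two_sided(tt, tf, ft, ff):
    """Two-sided Fisher exact p-value for the table [[tt, tf], [ft, ff]]."""
    lo, probs = hypergeom_probs(tt + tf, tt + ft, tt + tf + ft + ff)
    return two_sided_p(probs, probs[tt - lo])

def min_achievable_p(n_a, n_b, n):
    """Smallest two-sided p-value any table with these margins can give."""
    _, probs = hypergeom_probs(n_a, n_b, n)
    return min(two_sided_p(probs, p) for p in probs)

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# The three example rows from the post: (TT, TF, FT, FF)
tables = [(5, 1, 1, 27), (4, 1, 1, 28), (4, 2, 0, 28)]
pvals = [fisher_two_sided(*t) for t in tables]
for t, p, q in zip(tables, pvals, bh_adjust(pvals)):
    print(t, f"p={p:.2e}", f"BH={q:.2e}")

# Assumed n = 34 tumors: genes found in 1 and 2 tumors respectively
print("min achievable p:", round(min_achievable_p(1, 2, 34), 4))
```

In a real analysis the adjustment would be applied to all 8,256 p-values at once, not to three. The last line illustrates the point about sparsely mutated genes: with margins that small, no table can reach p < 0.05 even before adjustment, yet the pair still counts toward the number of tests.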
yao chen
@yao-chen-5205
Hi Mike,

I think another reason is the small sample size combined with the large number of gene pairs, so many pairs would be expected to reach significance purely by chance, which results in a high FDR. I don't know if there is a better solution. I would choose the top-ranking genes with a big fold change and a small p-value.

Jack

(in reply to efthimiosm, 2012/6/25)
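The point about chance findings can be made concrete with back-of-the-envelope arithmetic. The sketch below uses only the numbers given in the thread (129 genes, a 5% threshold, roughly 500 unadjusted hits) and assumes, as a worst case, that every null hypothesis is true and the p-values are uniform. Fisher's test on small counts is conservative, so the real number of chance hits is likely lower than this bound.

```python
n_genes = 129
n_tests = n_genes * (n_genes - 1) // 2   # all unordered gene pairs
alpha = 0.05
observed_hits = 500                      # approximate figure from the post

# Upper bound on hits expected by chance if all nulls were true
expected_by_chance = alpha * n_tests

# Crude estimate of the fraction of hits that could be false positives
naive_fdr = min(1.0, expected_by_chance / observed_hits)

print(n_tests, round(expected_by_chance, 1), round(naive_fdr, 2))
```

Under these assumptions about 413 of the roughly 500 hits could arise by chance alone, which is in line with adjusted p-values that stay well above 0.05.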
@wolfgang-huber-3550
EMBL European Molecular Biology Laborat…
Dear Mike,

I'd be surprised if this problem were cracked by a brute-force, purely 'statistical' approach. You could try to reduce the number of tests by first grouping the genes into 'pathways' or functional modules. With a lot of luck, the data may then just be large enough.

Best wishes
Wolfgang

(in reply to efthimiosm, Jun/25/12 1:15 PM)

--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber
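One way the grouping idea could look in practice is sketched below. Everything in it is hypothetical: the tumor labels, the gene-to-module mapping, the rule that a module counts as hit when any of its genes is hit, and the figure of 15 modules. It only shows how collapsing genes into modules shrinks the number of pairwise tests and how the 2x2 tables would be rebuilt at module level.

```python
from itertools import combinations
from math import comb

# Hypothetical calls: tumor -> set of genes found in that tumor
mutations = {
    "t1": {"g1", "g2"},
    "t2": {"g1"},
    "t3": {"g3"},
    "t4": set(),
    "t5": {"g2", "g4"},
}

# Hypothetical gene -> module (pathway) assignment
module_of = {"g1": "modA", "g2": "modA", "g3": "modB", "g4": "modC"}

def module_calls(mutations, module_of):
    """A module is called in a tumor if any of its genes is (an assumption)."""
    return {tumor: {module_of[g] for g in genes if g in module_of}
            for tumor, genes in mutations.items()}

def contingency(calls, a, b):
    """Return (TT, TF, FT, FF) counts for modules a and b across tumors."""
    tt = tf = ft = ff = 0
    for mods in calls.values():
        in_a, in_b = a in mods, b in mods
        if in_a and in_b:
            tt += 1
        elif in_a:
            tf += 1
        elif in_b:
            ft += 1
        else:
            ff += 1
    return tt, tf, ft, ff

calls = module_calls(mutations, module_of)
modules = sorted(set(module_of.values()))
tables = {(a, b): contingency(calls, a, b)
          for a, b in combinations(modules, 2)}
for pair, table in tables.items():
    print(pair, table)

# Number of pairwise tests: 129 genes versus, say, 15 modules
print(comb(129, 2), "gene pairs vs", comb(15, 2), "module pairs")
```

The module-level tables can then go through the same test and adjustment as the gene-level ones, with far fewer hypotheses and larger counts per cell.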