Question

question/issue with multtest mt.maxT

0

Entering edit mode

Les Dethlefsen ▴ 20

@les-dethlefsen-4668

Last seen 9.7 years ago

Hi All, I'm a biologist, not a statistician, and a bit of a newbie with R, so please forgive me if this is a stupid question, and thanks for your help. My data are from shotgun metagenomic sequencing of multiple samples, but it's essentially similar to microarray data. I have a matrix of 3312 functional gene categories by 24 samples, with real-valued entries indicating the proportion of interpretable gene sequence data for each sample that is assigned to each functional category. There are a number of ways to bin the samples into categories, based on some other analysis I'm most interested in a breakdown into two subjects and 5 temporal categories within each subject, i.e. a total of 10 categories. Two of the categories have 4 samples each, the remaining 8 categories have 2 samples each. When I ask mt.maxT to carry out a 2-sided F test on the 10 categories, I see something I don't understand: the adjusted p values are not a monotonic transformation of the raw p values. They are a monotonic with respect to the test statistic...as the test statistic falls, the adjusted p values rise, as expected. But the raw p values are bouncing around quite a bit, with the lowest raw p values of 0.0001 having a range of adjusted p values, all much larger than the minimum adjusted p value. There are 244 raw p values lower than that which corresponds to the minimum adjusted p value. My (perhaps mistaken) understanding was that permutations were used to generate the raw p values (which I definitely want, since I don't think the distributions of values for the different functional gene categories will follow any regular distribution or be similar to each other), and then some sort of adjustment was made to the raw values to adjust for multiple testing considerations, but the adjustment itself was not based on permutations. I would expect the permutations to be fairly noisy in my case with only 2 samples in most of my sample categories, but wouldn't that only affect the reliability of the raw p values? How can the adjusted p values have a different rank order than the raw p values? I did run the same command with the samples binned only to subject, i.e. only 2 categories of 12 subject each. While the range of p values is much lower (not surprisingly) there is still the same pattern with a monotonic relationship between the test statistic and the adjusted p values, but not between the raw p values and adjusted p values. Hence, it doesn't seem that this behavior is due to the messiness of 10 unequally sized categories for only 24 samples. I'm including some snips of my R session below; if anyone is interested in playing with the original data there's a compressed file at ftp://asiago.stanford.edu/multtest_issue.tgz that has the data matrix and the class labels. I'm running R 2.13.0 (on Mac OSX with the R.app GUI) with newly- installed BioConductor, so everything should be up to date. Thanks for any help or insight! Les Les Dethlefsen Relman Lab Microbiology & Immunology Stanford University R version 2.13.0 (2011-04-13) Copyright (C) 2011 The R Foundation for Statistical Computing ISBN 3-900051-07-0 Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) [R.app GUI 1.40 (5751) x86_64-apple-darwin9.8.0] > ko.data.df=as.data.frame(read.table("K57.fvn.ko.txt",header=TRUE,sep ="",row.names=1)) > class.label.df=as.data.frame(t(read.table("ClassLabel2.txt",header=F ALSE,sep="",row.names=1))) > class.label.df Sub CpNon Cp12Non CpPIP Cp12PIP Cp12PELP SubCp12PIP V2 0 0 0 0 0 0 0 V3 0 0 0 0 0 0 0 V4 0 1 1 1 1 1 1 V5 0 1 1 1 1 1 1 V6 0 0 0 2 2 2 2 V7 0 0 0 2 2 2 2 V8 0 0 0 2 2 3 2 V9 0 0 0 2 2 3 2 V10 0 1 2 1 3 4 3 V11 0 1 2 1 3 4 3 V12 0 0 0 3 4 5 4 V13 0 0 0 3 4 5 4 V14 1 0 0 0 0 0 5 V15 1 0 0 0 0 0 5 V16 1 1 1 1 1 1 6 V17 1 1 1 1 1 1 6 V18 1 0 0 2 2 2 7 V19 1 0 0 2 2 2 7 V20 1 0 0 2 2 3 7 V21 1 0 0 2 2 3 7 V22 1 1 2 1 3 4 8 V23 1 1 2 1 3 4 8 V24 1 0 0 3 4 5 9 V25 1 0 0 3 4 5 9 > ko.SubCp12PIP.maxT.F=mt.maxT(ko.data.df,classlabel=class.label.df$Su bCp12PIP,test="f",side="abs",nonpara="n") b=100 b=200 b=300 b=400 b=500 b=600 b=700 b=800 b=900 b=1000 b=1100 b=1200 b=1300 b=1400 b=1500 b=1600 b=1700 b=1800 b=1900 b=2000 b=2100 b=2200 b=2300 b=2400 b=2500 b=2600 b=2700 b=2800 b=2900 b=3000 b=3100 b=3200 b=3300 b=3400 b=3500 b=3600 b=3700 b=3800 b=3900 b=4000 b=4100 b=4200 b=4300 b=4400 b=4500 b=4600 b=4700 b=4800 b=4900 b=5000 b=5100 b=5200 b=5300 b=5400 b=5500 b=5600 b=5700 b=5800 b=5900 b=6000 b=6100 b=6200 b=6300 b=6400 b=6500 b=6600 b=6700 b=6800 b=6900 b=7000 b=7100 b=7200 b=7300 b=7400 b=7500 b=7600 b=7700 b=7800 b=7900 b=8000 b=8100 b=8200 b=8300 b=8400 b=8500 b=8600 b=8700 b=8800 b=8900 b=9000 b=9100 b=9200 b=9300 b=9400 b=9500 b=9600 b=9700 b=9800 b=9900 b=10000 > ko.SubCp12PIP.maxT.F index teststat rawp adjp K05394 1975 4.536904e+03 0.0278000 0.0896 K01455 648 4.746965e+02 0.0238000 0.4072 K02394 1154 2.566388e+02 0.0243000 0.6114 K06990 2301 5.749225e+01 0.0297000 0.8393 K05795 2021 5.399053e+01 0.0157000 0.8511 K06641 2223 4.468233e+01 0.0265000 0.8778 K05916 2059 3.936887e+01 0.0296000 0.9033 K08321 2678 3.068437e+01 0.0001000 0.9339 K02074 1047 2.780696e+01 0.0044000 0.9476 K07783 2616 2.692854e+01 0.0023000 0.9512 K07318 2472 2.594598e+01 0.0297000 0.9572 K08137 2639 2.563818e+01 0.0297000 0.9596 K02479 1207 2.561986e+01 0.0297000 0.9596 K02082 1052 2.484221e+01 0.0252000 0.9614 K02771 1312 2.433006e+01 0.0265000 0.9667 K03446 1580 2.314993e+01 0.0028000 0.9694 K07214 2423 2.232653e+01 0.0001000 0.9757 K03325 1529 1.881246e+01 0.0001000 0.9850 K09124 2747 1.813063e+01 0.0002000 0.9858 K03762 1790 1.802233e+01 0.0303000 0.9866 K05791 2018 1.781049e+01 0.0309000 0.9876 K03739 1776 1.685187e+01 0.0265000 0.9894 #################### SNIP ################################ > ko.Sub.maxT.F=mt.maxT(ko.data.df,classlabel=class.label.df$Sub,test= "f",side="abs",nonpara="n") b=100 b=200 b=300 b=400 b=500 b=600 b=700 b=800 b=900 b=1000 b=1100 b=1200 b=1300 b=1400 b=1500 b=1600 b=1700 b=1800 b=1900 b=2000 b=2100 b=2200 b=2300 b=2400 b=2500 b=2600 b=2700 b=2800 b=2900 b=3000 b=3100 b=3200 b=3300 b=3400 b=3500 b=3600 b=3700 b=3800 b=3900 b=4000 b=4100 b=4200 b=4300 b=4400 b=4500 b=4600 b=4700 b=4800 b=4900 b=5000 b=5100 b=5200 b=5300 b=5400 b=5500 b=5600 b=5700 b=5800 b=5900 b=6000 b=6100 b=6200 b=6300 b=6400 b=6500 b=6600 b=6700 b=6800 b=6900 b=7000 b=7100 b=7200 b=7300 b=7400 b=7500 b=7600 b=7700 b=7800 b=7900 b=8000 b=8100 b=8200 b=8300 b=8400 b=8500 b=8600 b=8700 b=8800 b=8900 b=9000 b=9100 b=9200 b=9300 b=9400 b=9500 b=9600 b=9700 b=9800 b=9900 b=10000 > ko.Sub.maxT.F index teststat rawp adjp K07214 2423 4.388760e+01 0.0001 0.0017 K01812 867 3.582776e+01 0.0001 0.0072 K01209 554 3.493056e+01 0.0001 0.0087 K03111 1446 3.368798e+01 0.0001 0.0110 K03313 1519 3.286514e+01 0.0001 0.0129 K05878 2049 3.030097e+01 0.0001 0.0212 K01188 539 2.861773e+01 0.0001 0.0285 K07053 2349 2.625740e+01 0.0001 0.0451 K09705 2793 2.596417e+01 0.0001 0.0476 K07277 2451 2.584051e+01 0.0001 0.0492 K03543 1633 2.551410e+01 0.0003 0.0526 K10852 2989 2.493226e+01 0.0003 0.0594 K01191 541 2.465159e+01 0.0001 0.0634 K02238 1108 2.458557e+01 0.0003 0.0651 K03701 1751 2.365698e+01 0.0001 0.0812 K06998 2305 2.344949e+01 0.0001 0.0856 K12340 3131 2.284225e+01 0.0001 0.0983 K01678 779 2.280762e+01 0.0004 0.0996 K03771 1796 2.255402e+01 0.0006 0.1042 K03325 1529 2.252884e+01 0.0001 0.1047 K00647 262 2.246549e+01 0.0001 0.1061 K00282 135 2.227595e+01 0.0002 0.1099 K03657 1726 2.220074e+01 0.0001 0.1122 K01551 698 2.161330e+01 0.0001 0.1278 K05970 2072 2.160496e+01 0.0005 0.1281 K03453 1583 2.131683e+01 0.0003 0.1387 K08321 2678 2.118411e+01 0.0001 0.1428 K01811 866 2.092958e+01 0.0003 0.1512 K09687 2783 2.025709e+01 0.0001 0.1788 K07407 2494 1.909975e+01 0.0002 0.2376 K11741 3080 1.908456e+01 0.0001 0.2381 #################### SNIP ################################

Sequencing Microarray GUI Category Sequencing Microarray GUI Category • 1.0k views

ADD COMMENT • link 12.9 years ago Les Dethlefsen ▴ 20