Question

p.adjust BH generates duplicate values

0

Entering edit mode

David Young ▴ 10

@david-young-4857

Last seen 10.2 years ago

Hi all, I was doing an RMA->limma (ebayes) analysis of an affymetrix mouse 430a experiment and noticed that while the p-values listed in toptable were all different, the adjusted p-values (adjust="BH") contained duplicate values. I don't think this is incorrect necessarily, but I was wondering why a different alpha wasn't generated for each gene. From what I understand, the BH method gets the adjusted p-value (alpha) from [P_k*n*c(n) ] / k < alpha, where n = total number of genes (tests), P_k = p-value at kth gene (genes ordered from low to high p-value), and k = number of genes with p-value less than or equal to P_k. I'm not entirely sure how the c(n) (dependence correction) part works, but it seems like a unique adjusted p-value (alpha) could be generated for each gene. Instead I get: >top<-topTable(efit, adjust="BH", n=nrow(exprs(rmadata))) >write.table(top, "output.xls", sep="\t") from output.xls... ID adj.P.Val P.Value Mm.277921 0.039259664 3.17E-06 Mm.272646 0.050424143 9.93E-06 Mm.148886 0.050424143 1.64E-05 Mm.235998 0.050424143 2.02E-05 Mm.4598 0.050424143 2.04E-05 Mm.10728 0.101013086 4.89E-05 Mm.162744 0.106930684 6.34E-05 Mm.247564 0.106930684 6.91E-05 Mm.269384 0.115716969 8.62E-05 Mm.212428 0.115716969 9.34E-05 Mm.457989 0.118548889 0.000126578 Mm.154662 0.118548889 0.000128005 Mm.21005 0.118548889 0.000133975 Mm.5109 0.149489879 0.000196053 Mm.207432 0.149489879 0.00020444 Does anyone know why several probesets have the same adjusted p value even though the regular p value is different for each gene? I'm 90% sure this is just my ignorance about the BH method, but I'll be very thankful to anyone who can point me in the right direction. Thanks in advance, Dave Young > sessionInfo() R version 2.13.1 (2011-07-08) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] limma_3.8.3 mouse430a2mmugcdf_14.1.0 simpleaffy_2.28.0 gcrma_2.24.1 [5] genefilter_1.34.0 affy_1.30.0 Biobase_2.12.2 loaded via a namespace (and not attached): [1] affyio_1.20.0 annotate_1.30.1 AnnotationDbi_1.14.1 Biostrings_2.20.3 [5] DBI_0.2-5 IRanges_1.10.6 preprocessCore_1.14.0 RSQLite_0.9-4 [9] splines_2.13.1 survival_2.36-9 tools_2.13.1 xtable_1.5-6 [[alternative HTML version deleted]]

• 2.7k views

ADD COMMENT • link updated 13.2 years ago by Gordon Smyth 51k • written 13.2 years ago by David Young ▴ 10

score 1 · Answer 1 · 2011-09-16

Dear David,

You have described the first step of the BH algorithm. However there is a second step which ensures that the adjusted p-values are monotonic in the original p-values. It is this second step that sometimes causes a series of genes to get the same adjusted p-value. This occurs whenever the first-step adjusted p-value for a less significant gene is lower than that for a more significant gene.

Best wishes
Gordon