Question

Right skewed histogram of p-values

Giulio Di Giovanni ▴ 540

@giulio-di-giovanni-950

Last seen 11.4 years ago

Hi. During a DE analysis done with limma (eBayes and topTable), the histogram of the p-values shows neither a high number of low p-values nor a uniform distribution, which I would expect under the null hypothesis of no difference (right?). Instead it is totally skewed to the right: low frequencies for the small p-values, and increasing frequencies as the p-value on the x-axis increases. Here it is: http://img522.imageshack.us/img522/2169/testkpv.jpg

(I'm pretty sure the test itself is correct. I ran it on other comparisons and it always gave nice and straightforward results.)

I work on peptide arrays and I measure the immune response. The comparison I'm doing is between sick and healthy individuals for one pathology, BUT all the individuals were also diagnosed as sick for ANOTHER pathology. So I explained the phenomenon as a confounding effect of the second pathology, which alters the immune response and which the data are not controlled for.

But, from the data point of view, how can I comment on the graph? Does anybody have an idea what we can say about a histogram of p-values with few low p-values and many more high p-values?

Thanks in advance

G

limma • 4.2k views

updated 15.3 years ago by Simon Anders ★ 3.8k • written 15.3 years ago by Giulio Di Giovanni ▴ 540

score 3 · Answer 1 · 2010-09-27

Hi,

you've already got a completely satisfying explanation. A right skew of the p-value histogram is in fact a typical sign of a covariate that you are not controlling for.

A quick example to demonstrate. Let's simulate 1000 times a sample of four draws from normal distributions:

    y <- cbind(
       rnorm( 1000, 20, 4 ),
       rnorm( 1000, 20, 4 ),
       rnorm( 1000, 20, 4 ),
       rnorm( 1000, 20, 4 ) )

The first two are supposed to be control, the third and fourth treatment, and they all have the same mean, i.e., the treatment has no effect. Doing a t test on each realization gives us nicely uniform p-values:

    library(genefilter)
    hist( rowttests( y, factor( c( "C", "C", "T", "T" ) ) )$p.value )

Now, assume that one of the two control samples and one of the two treatment samples have an elevated mean:

    y <- cbind(
       rnorm( 1000, 20, 4 ),
       rnorm( 1000, 30, 4 ),
       rnorm( 1000, 20, 4 ),
       rnorm( 1000, 30, 4 ) )

In this case, you get right-skewed p-values, because the t test is not informed of the extra effect present in one sample of each of the two groups:

    hist( rowttests( y, factor( c( "C", "C", "T", "T" ) ) )$p.value )

Simon
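As a follow-up to the simulation in the answer, here is a minimal sketch of what happens once the model is told about the extra effect. It is not part of the original reply. It uses base R's `lm` instead of `genefilter`, and the covariate labels `extra`, the helper `p_for`, and the fixed seed are assumptions made purely for illustration. With only four samples the adjusted model has a single residual degree of freedom, so it has very little power, but under the null its p-values should come out roughly uniform again.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
n <- 1000

# Same layout as in the answer: samples 2 and 4 have mean 30 instead of 20
y <- cbind(rnorm(n, 20, 4), rnorm(n, 30, 4),
           rnorm(n, 20, 4), rnorm(n, 30, 4))

group <- factor(c("C", "C", "T", "T"))  # factor of interest
extra <- factor(c("a", "b", "a", "b"))  # hypothetical label for the uncontrolled covariate

# Fit one linear model per row and return the p-value of the group coefficient
p_for <- function(formula) {
  apply(y, 1, function(v) {
    fit <- lm(formula, data = data.frame(v = v, group = group, extra = extra))
    coef(summary(fit))["groupT", "Pr(>|t|)"]
  })
}

p_ignored  <- p_for(v ~ group)          # covariate left out of the model
p_adjusted <- p_for(v ~ group + extra)  # covariate included in the model

# Side-by-side histograms of the two sets of p-values
par(mfrow = c(1, 2))
hist(p_ignored,  main = "covariate ignored")
hist(p_adjusted, main = "covariate in model")
```

In a limma analysis the analogous step would be to add the known covariate as an extra column of the design matrix, which is only possible when the covariate is recorded and not identical across all samples.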