Question: Right skewed histogram of p-values
Asked 8.9 years ago by Giulio Di Giovanni
Hi.

During a DE analysis done with limma ebayes and toptable, the histogram of the p-values does not show a large number of low p-values, nor even a uniform distribution, which is what I would expect under the null hypothesis of no difference (right?). Instead it is strongly skewed to the right: low frequencies for the small p-values, and increasing frequencies as the p-value on the x-axis increases. Here it is: http://img522.imageshack.us/img522/2169/testkpv.jpg

(I'm pretty sure the test itself is correct. I ran it on other comparisons and it always gave nice, straightforward results.)

I work on peptide arrays and measure the immune response. The comparison is between sick and healthy individuals for one pathology, BUT all the individuals were also diagnosed as sick for ANOTHER pathology. I therefore explained the phenomenon as a confounding effect of the second pathology, which alters the immune response and for which the data are not controlled.

But from the data point of view, how can I comment on the graph? Does anybody have an idea of what can be said about a histogram of p-values with few low p-values and many more high p-values?

Thanks in advance,
G
Tags: limma
Answer: Right skewed histogram of p-values
Answered 8.9 years ago by Simon Anders (Zentrum für Molekularbiologie, Universität Heidelberg)
Hi,

you already have a completely satisfying explanation. A right skew of the p value histogram is in fact a typical sign of a covariate that you do not control for.

A quick example to demonstrate. Let's simulate, 1000 times, a sample of four draws from normal distributions:

    y <- cbind(
       rnorm( 1000, 20, 4 ),
       rnorm( 1000, 20, 4 ),
       rnorm( 1000, 20, 4 ),
       rnorm( 1000, 20, 4 ) )

The first two columns are supposed to be control, the third and fourth treatment. They all have the same mean, i.e., the treatment has no effect. Doing a t test on each realization gives us nicely uniform p values:

    library(genefilter)
    hist( rowttests( y, factor( c( "C", "C", "T", "T" ) ) )$p.value )

Now, assume that one of the two control samples and one of the two treatment samples have an elevated mean:

    y <- cbind(
       rnorm( 1000, 20, 4 ),
       rnorm( 1000, 30, 4 ),
       rnorm( 1000, 20, 4 ),
       rnorm( 1000, 30, 4 ) )

In this case, you get right-skewed p values, because the t test is not informed of the extra effect present in one sample of each of the two groups. The extra effect inflates the within-group variance estimate, so the t statistics come out too small and the p values too large:

    hist( rowttests( y, factor( c( "C", "C", "T", "T" ) ) )$p.value )

Simon
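As a possible follow-up to the simulation above, here is a minimal sketch of what happens once the test is told about the extra effect. It is only an illustration under stated assumptions: it uses base R `lm` on each row rather than genefilter or limma, and the factor `extra` (marking which samples carry the elevated mean) is a hypothetical covariate that would have to be known for each sample.

```r
set.seed(1)
n <- 1000

# Same layout as the second simulation: two controls, two treatments,
# with the second sample of each group shifted upwards.
group <- factor(c("C", "C", "T", "T"))
extra <- factor(c("no", "yes", "no", "yes"))  # hypothetical known covariate

y <- cbind(rnorm(n, 20, 4), rnorm(n, 30, 4),
           rnorm(n, 20, 4), rnorm(n, 30, 4))

# Fit a linear model to one row and return the p value of the group effect.
group_pvalue <- function(values, formula) {
  d <- data.frame(values = values, group = group, extra = extra)
  fit <- lm(formula, data = d)
  summary(fit)$coefficients["groupT", "Pr(>|t|)"]
}

# Model that ignores the covariate (equivalent to the equal-variance t test).
p_unadjusted <- apply(y, 1, group_pvalue, formula = values ~ group)

# Model that includes the covariate as an additional term.
p_adjusted <- apply(y, 1, group_pvalue, formula = values ~ group + extra)

hist(p_unadjusted)  # skewed towards 1
hist(p_adjusted)    # roughly uniform again
```

With only four samples the adjusted model has a single residual degree of freedom, so it has very little power, but under the null its p values should again look uniform. In limma the analogous step would be to add the covariate as a column of the design matrix, which is only possible if the second condition actually varies between samples and is recorded.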