Right skewed histogram of p-values
@giulio-di-giovanni-950
Hi.

During a DE analysis done with limma (eBayes and topTable), the histogram of the p-values shows neither a high number of low p-values nor a uniform distribution, which is what I would expect under the null hypothesis of no difference between groups (right?). Instead it is totally skewed to the right: low frequencies for the small p-values, and increasing frequencies as the p-value on the x-axis increases. Here it is: http://img522.imageshack.us/img522/2169/testkpv.jpg

(I'm pretty sure the test itself is correct. I ran it on other comparisons and it always gave nice and straightforward results.)

I work on peptide arrays and I measure the immune response. The comparison I'm doing is between sick and healthy individuals for one pathology, BUT all the individuals were also diagnosed as sick for ANOTHER pathology. I therefore explained the phenomenon as a confounding effect of the second pathology, which alters the immune response and for which the data are not controlled.

But, from the data point of view, how can I comment on the graph? Does anybody have an idea what we can say about a histogram of p-values where we have few low p-values and many more high p-values?

Thanks in advance,
G
Simon Anders ★ 3.8k
@simon-anders-3855
Zentrum für Molekularbiologie, Universi…
Hi,

you've already got a completely satisfying explanation. A right skew of the p value histogram is in fact a typical sign of a covariate that you do not control for.

A quick example to demonstrate. Let's simulate 1000 times a sample of four draws from normal distributions:

    y <- cbind(
       rnorm( 1000, 20, 4 ), rnorm( 1000, 20, 4 ),
       rnorm( 1000, 20, 4 ), rnorm( 1000, 20, 4 ) )

The first two are supposed to be control, the third and fourth treatment, and they all have the same mean, i.e., the treatment has no effect. Doing a t test on each realization gives us nicely uniform p values:

    library(genefilter)
    hist( rowttests( y, factor( c( "C", "C", "T", "T" ) ) )$p.value )

Now, assume that one of the two control samples and one of the two treatment samples have an elevated mean:

    y <- cbind(
       rnorm( 1000, 20, 4 ), rnorm( 1000, 30, 4 ),
       rnorm( 1000, 20, 4 ), rnorm( 1000, 30, 4 ) )

In this case, you get right-skewed p values, because the t test is not informed of the extra effect present in one sample of each of the two groups:

    hist( rowttests( y, factor( c( "C", "C", "T", "T" ) ) )$p.value )

Simon
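As a possible follow-up to the second simulation above, here is a minimal sketch of what happens when the test is told about the extra effect. It uses only base R (`lm` on each row) rather than genefilter or limma. The `batch` factor and the helper `group_p` are hypothetical names introduced for illustration, and the sketch assumes the nuisance covariate is known and recorded for every sample, which may not be the case for the real data discussed here.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible

n <- 1000
# Same layout as the second simulation: samples 2 and 4 have an elevated mean
y <- cbind( rnorm( n, 20, 4 ), rnorm( n, 30, 4 ),
            rnorm( n, 20, 4 ), rnorm( n, 30, 4 ) )

group <- factor( c( "C", "C", "T", "T" ) )
# Hypothetical label for the otherwise uncontrolled covariate
batch <- factor( c( "a", "b", "a", "b" ) )

# p value of the group effect for one row of y,
# with or without the covariate in the linear model
group_p <- function( values, adjust ) {
   fit <- if ( adjust ) lm( values ~ batch + group ) else lm( values ~ group )
   coef( summary( fit ) )[ "groupT", "Pr(>|t|)" ]
}

p_unadjusted <- apply( y, 1, group_p, adjust = FALSE )
p_adjusted   <- apply( y, 1, group_p, adjust = TRUE )

hist( p_unadjusted )  # same test as the two-group t test: piles up near 1
hist( p_adjusted )    # covariate in the model: should look roughly uniform
```

With only four samples, the adjusted model leaves a single residual degree of freedom, so this only illustrates the shape of the null distribution and says nothing about power. In limma, the analogous step would be to add the covariate as a column of the design matrix passed to `lmFit`.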