Question

How to explain the p-value histogram obtained from edgeR?

yuan.qing ▴ 10

I plotted a histogram of the unadjusted p-values and encountered a huge peak at p-value = 1. Does anybody know the reason? Is it because of zero counts? How can I verify that?

hist(top.table.edgeR$PValue, breaks=100, main="",xlab="Unadjusted P values", col="lightblue")

Thanks for your help!

edgeR unadjusted p-value • 805 views

ADD COMMENT • link updated 8.8 years ago by Gordon Smyth 50k • written 8.8 years ago by yuan.qing ▴ 10

Without some context, it's impossible to tell what's going on. What is your experimental design? What steps did you take to get to top.table.edgeR? What does your data look like, and did you do any filtering?

ADD REPLY • link 8.8 years ago Aaron Lun ★ 28k

score 5 · Answer 1 · 2015-06-24

Yes, it is almost certainly because of genes for which all the counts are zero or very small.

A gene with no counts at all will naturally yield a p-value of 1. Other very low counts will also give p-values of 1 when the counts are as balanced as possible between the two experimental conditions. If you have a huge peak at p-value = 1, it is probably because you haven't filtered out genes with consistently small counts.
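
To illustrate why zero or balanced low counts give a p-value of 1, here is a minimal base-R sketch. It is not edgeR's implementation: it assumes two conditions with equal library sizes and Poisson counts, so that, conditional on a gene's total count, the count in the first condition is Binomial(total, 0.5). The helper names `exact_p` and `sim_frac_one` are hypothetical.

```r
# Simplified conditional exact test for one gene in two conditions.
# Assumption: equal library sizes and Poisson counts (NOT edgeR's exactTest()).
exact_p <- function(count_a, count_b) {
  total <- count_a + count_b
  if (total == 0) return(1)  # no reads at all: no evidence against the null
  binom.test(count_a, total, p = 0.5)$p.value
}

exact_p(0, 0)  # gene with no counts
exact_p(1, 1)  # low counts, perfectly balanced
exact_p(2, 2)  # low counts, perfectly balanced
exact_p(0, 4)  # low counts, unbalanced

# Fraction of null genes (same mean in both conditions) with p-value of 1,
# for a given expression level mu.
set.seed(42)
sim_frac_one <- function(mu, n_genes = 2000) {
  a <- rpois(n_genes, mu)
  b <- rpois(n_genes, mu)
  p <- mapply(exact_p, a, b)
  mean(p >= 1 - 1e-12)
}

frac_low  <- sim_frac_one(mu = 0.5)  # very low counts
frac_high <- sim_frac_one(mu = 50)   # well-expressed genes
c(low = frac_low, high = frac_high)
```

Under these assumptions, the low-count genes should pile up at 1 far more often than the well-expressed ones, because with only a handful of reads there are very few possible outcomes and the most balanced one always has a p-value of 1.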

Note that exactTest() in edgeR will always give some over-representation of p-values equal to 1, because this is a property of exact tests on discrete data. So you will usually get a peak at 1 even when the analysis is done perfectly. If the peak is very large, though, it is probably because you haven't filtered. We always recommend that users filter out genes with very small counts -- there is no point in keeping them in the analysis.
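
One way to check this on your own data is to compare the fraction of p-values equal to 1 before and after filtering. The sketch below uses simulated counts because the original data and pipeline were not posted; the simulation settings, the object names, and the helper `run_exact` are all hypothetical. The edgeR calls themselves (`DGEList`, `filterByExpr`, `calcNormFactors`, `estimateDisp`, `exactTest`) are standard, but treat this as an illustration rather than the recommended pipeline for any particular experiment.

```r
library(edgeR)

set.seed(1)
# Hypothetical simulated data: 2000 genes, 3 vs 3 samples.
# The first half of the genes is barely expressed, the second half well expressed.
n_genes <- 2000
group <- factor(rep(c("A", "B"), each = 3))
mu <- rep(c(0.2, 50), each = n_genes / 2)
counts <- matrix(rnbinom(n_genes * 6, mu = mu, size = 10), nrow = n_genes)
rownames(counts) <- paste0("gene", seq_len(n_genes))

y <- DGEList(counts = counts, group = group)

# Normalize, estimate dispersions, and return the exact-test p-values.
run_exact <- function(y) {
  y <- calcNormFactors(y)
  y <- estimateDisp(y)
  exactTest(y)$table$PValue
}

p_unfiltered <- run_exact(y)

# Drop genes with consistently small counts, then recompute library sizes.
keep <- filterByExpr(y)
y_filt <- y[keep, , keep.lib.sizes = FALSE]
p_filtered <- run_exact(y_filt)

# Fraction of p-values at (numerically) 1, before and after filtering.
frac_unfiltered <- mean(p_unfiltered >= 1 - 1e-8)
frac_filtered   <- mean(p_filtered >= 1 - 1e-8)
c(unfiltered = frac_unfiltered, filtered = frac_filtered)

# Are the p = 1 genes the low-count ones? Cross-tabulate against total counts.
table(p_is_one = p_unfiltered >= 1 - 1e-8, low_total = rowSums(counts) < 10)
```

On real data, the same comparison applies: if the peak at 1 shrinks substantially after filtering and the cross-tabulation shows the p = 1 genes are mostly low-count genes, that supports the explanation above. The threshold of 10 total counts in the last line is arbitrary and only for illustration.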