Question

Big p-value/adj p-value with quite good read count

bharata1803 ▴ 60

Hello,

I am working with RNA-seq data and the voom/limma workflow. After finishing the workflow and getting the list of DE genes, I filtered by p-value, using <= 0.1 as my cutoff. I then looked for one gene of interest and found that it is not in my list. I suspected the p-value was the reason, and it is: the p-value is large, almost 0.2. To find out why, I checked the read counts, and the raw counts are actually quite high. Below are the read counts:

1	Cat_1_1	2097.070
2	Cat_1_2	1866.160
3	Cat_1_3	2539.440
4	Cat_1_4	2048.650
5	Cat_1_5	1628.770
6	Cat_1_7	3241.710
7	Cat_2_1	807.168
8	Cat_2_2	7171.430
9	Cat_2_3	8759.580
10	Cat_3_1	1213.360
11	Cat_3_2	339.301
12	Cat_3_3	2096.140
13	Cat_3_4	888.941
14	Cat_3_5	1381.800
15	Cat_3_6	3281.890
16	Cat_3_7	2498.580

Below are the values after running voom:

	row.names	x
1	Cat_1_1	5.383155
2	Cat_1_2	5.202185
3	Cat_1_3	5.568586
4	Cat_1_4	5.500774
5	Cat_1_5	5.625384
6	Cat_1_7	5.878762
7	Cat_2_1	4.356846
8	Cat_2_2	7.397590
9	Cat_2_3	7.628476
10	Cat_3_1	5.568482
11	Cat_3_2	4.578878
12	Cat_3_3	6.982345
13	Cat_3_4	5.806681
14	Cat_3_5	6.466003
15	Cat_3_6	7.618232
16	Cat_3_7	7.357043

I tested Cat_1 vs Cat_3. Why is the p-value so large? With read counts this high, I expected the p-value to be significant, and this gene is one of the important genes I want to check.

rnaseq voom limma • 1.1k views

ADD COMMENT • link updated 8.4 years ago by Aaron Lun ★ 28k • written 8.4 years ago by bharata1803 ▴ 60

What is your experimental design? Based on what you say you're comparing, I assume that this is a one-way layout with three groups - Cat_1, Cat_2 and Cat_3 - is that correct? Also, why are your read counts not integer values?

ADD REPLY • link 8.4 years ago Aaron Lun ★ 28k

score 2 · Answer 1 · 2015-12-08

Aaron Lun ★ 28k

Well, for starters, the variance is estimated using all samples. For this gene, there are several aberrant samples; Cat_2_1 in particular, in which the count is 10-fold lower than its replicates, but also Cat_3_2 (also 10-fold lower) and Cat_3_4 to a lesser extent. This results in a large variance for this gene, which leads to a larger p-value. Moreover, the counts don't suggest that there's a strong difference between Cat_1 and Cat_3. Both groups have counts from 1000 - 3000, so that's not very strong evidence for DE.
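To make the variance argument concrete, here is a rough sketch that recomputes an ordinary, unweighted t-statistic for Cat_3 vs Cat_1 from the voom values posted in the question. It assumes a one-way layout with three groups and pools the residual variance over all 16 samples. It is not what limma does internally: voom's precision weights and the empirical Bayes moderation in `eBayes` are ignored, so the number will not match the limma output exactly. The variable names are made up for illustration.

```python
from math import sqrt
from statistics import mean

# voom values (log2-CPM) for this gene, copied from the question.
logcpm = {
    "Cat_1": [5.383155, 5.202185, 5.568586, 5.500774, 5.625384, 5.878762],
    "Cat_2": [4.356846, 7.397590, 7.628476],
    "Cat_3": [5.568482, 4.578878, 6.982345, 5.806681, 6.466003, 7.618232, 7.357043],
}

group_means = {g: mean(v) for g, v in logcpm.items()}

# Residual sum of squares pooled over ALL groups, including Cat_2,
# even though Cat_2 is not part of the comparison being tested.
rss = sum((x - group_means[g]) ** 2 for g, v in logcpm.items() for x in v)
df_resid = sum(len(v) for v in logcpm.values()) - len(logcpm)
pooled_sd = sqrt(rss / df_resid)

# Ordinary two-group contrast using the pooled standard deviation.
diff = group_means["Cat_3"] - group_means["Cat_1"]
se = pooled_sd * sqrt(1 / len(logcpm["Cat_1"]) + 1 / len(logcpm["Cat_3"]))
t_stat = diff / se

# Two-sided 10% critical value of Student's t with 13 df (standard tables).
T_CRIT_13DF_P10 = 1.771

print(f"group means: {group_means}")
print(f"pooled SD = {pooled_sd:.3f} on {df_resid} df")
print(f"log2 fold change = {diff:.3f}, t = {t_stat:.3f}")
print("passes the 0.1 cutoff:", abs(t_stat) > T_CRIT_13DF_P10)
```

Because the spread within Cat_2 and Cat_3 feeds into the pooled SD, the standard error of the contrast is large relative to the difference in group means.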

ADD COMMENT • link 8.4 years ago Aaron Lun ★ 28k

Thank you. The setup is a 3-way comparison of the three categories. The counts are not integers because I used Salmon to generate the gene-level read counts and did not round them. I understand your explanation: basically, several "bad" samples cause the p-value to be non-significant.

ADD REPLY • link 8.4 years ago bharata1803 ▴ 60

Well, no, I don't think it is because of one or two "bad" observations. The Categories just don't look at all different.

Category 2 has log2-cpm values that range from 4.36 to 7.63, completely covering the whole range of values of Category 1.

Category 3 has log2-cpm values that range from 4.58 to 7.62, again completely covering the range of Category 1.

Category 2 and 3 look almost identical in terms of range of values.

The Categories are internally variable and not systematically different. Hence the big p-value.
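The range comparison can be checked directly against the voom values posted in the question. This is only an illustrative sketch: `covers` is a hypothetical helper written for this example, not a limma function.

```python
# voom values (log2-CPM) for this gene, copied from the question.
logcpm = {
    "Cat_1": [5.383155, 5.202185, 5.568586, 5.500774, 5.625384, 5.878762],
    "Cat_2": [4.356846, 7.397590, 7.628476],
    "Cat_3": [5.568482, 4.578878, 6.982345, 5.806681, 6.466003, 7.618232, 7.357043],
}

# (min, max) of the log2-CPM values within each category.
ranges = {g: (min(v), max(v)) for g, v in logcpm.items()}


def covers(outer, inner):
    """Return True if the interval `outer` fully contains the interval `inner`."""
    return outer[0] <= inner[0] and outer[1] >= inner[1]


for g, (lo, hi) in ranges.items():
    print(f"{g}: {lo:.2f} to {hi:.2f}")

for g in ("Cat_2", "Cat_3"):
    print(f"{g} covers Cat_1:", covers(ranges[g], ranges["Cat_1"]))
```

When one group's values sit entirely inside the spread of another, the data give little evidence of a systematic shift between them, whatever the absolute read counts are.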

ADD REPLY • link 8.4 years ago Gordon Smyth 50k