Big p-value/adj p-value with quite good read count
bharata1803 ▴ 60
@bharata1803-7698
Last seen 5.7 years ago
Japan

Hello,

I am working with RNA-seq data using the voom/limma workflow. After finishing the workflow and getting the list of DE genes, I filtered by p-value, using <= 0.1 as my cutoff. I then looked for one particular gene and it isn't in my list. I suspected the p-value was the reason, and it is: the p-value is large, almost 0.2. I tried to find out why, so I checked the read counts. The raw read counts are actually quite high. Below are the read counts:

1 Cat_1_1 2097.070
2 Cat_1_2 1866.160
3 Cat_1_3 2539.440
4 Cat_1_4 2048.650
5 Cat_1_5 1628.770
6 Cat_1_7 3241.710
7 Cat_2_1 807.168
8 Cat_2_2 7171.430
9 Cat_2_3 8759.580
10 Cat_3_1 1213.360
11 Cat_3_2 339.301
12 Cat_3_3 2096.140
13 Cat_3_4 888.941
14 Cat_3_5 1381.800
15 Cat_3_6 3281.890
16 Cat_3_7 2498.580

 

Below are the values after I used voom:

  row.names x
1 Cat_1_1 5.383155
2 Cat_1_2 5.202185
3 Cat_1_3 5.568586
4 Cat_1_4 5.500774
5 Cat_1_5 5.625384
6 Cat_1_7 5.878762
7 Cat_2_1 4.356846
8 Cat_2_2 7.397590
9 Cat_2_3 7.628476
10 Cat_3_1 5.568482
11 Cat_3_2 4.578878
12 Cat_3_3 6.982345
13 Cat_3_4 5.806681
14 Cat_3_5 6.466003
15 Cat_3_6 7.618232
16 Cat_3_7 7.357043

I compared Cat_1 vs Cat_3. Why is the p-value so large? With read counts like these, I was hoping for a significant p-value, because this gene is one of the important genes to check.

rnaseq voom limma • 1.3k views

What is your experimental design? Based on what you say you're comparing, I assume that this is a one-way layout with three groups - Cat_1, Cat_2 and Cat_3 - is that correct? Also, why are your read counts not integer values?

Aaron Lun ★ 28k
@alun
Last seen 30 minutes ago
The city by the bay

Well, for starters, the variance is estimated using all samples. For this gene, there are several aberrant samples; Cat_2_1 in particular, in which the count is 10-fold lower than its replicates, but also Cat_3_2 (also 10-fold lower) and Cat_3_4 to a lesser extent. This results in a large variance for this gene, which leads to a larger p-value. Moreover, the counts don't suggest that there's a strong difference between Cat_1 and Cat_3. Both groups have counts from 1000 - 3000, so that's not very strong evidence for DE.
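To put rough numbers on this, here is a minimal Python sketch (standard library only) that takes the voom values posted in the question and computes the per-group variances, the residual variance pooled over all 16 samples, and an ordinary t-statistic for Cat_3 vs Cat_1. It assumes a one-way layout with three groups and ignores both the voom precision weights and limma's empirical Bayes moderation, so it only approximates what limma would report. The variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

# voom log2-CPM values copied from the question
logcpm = {
    "Cat_1": [5.383155, 5.202185, 5.568586, 5.500774, 5.625384, 5.878762],
    "Cat_2": [4.356846, 7.397590, 7.628476],
    "Cat_3": [5.568482, 4.578878, 6.982345, 5.806681,
              6.466003, 7.618232, 7.357043],
}

group_mean = {g: mean(v) for g, v in logcpm.items()}
group_var = {g: variance(v) for g, v in logcpm.items()}  # sample variance (n - 1)

# Residual variance of a one-way layout: pooled over ALL groups,
# including Cat_2, even though Cat_2 is not part of the contrast.
n_total = sum(len(v) for v in logcpm.values())
resid_df = n_total - len(logcpm)
pooled_var = sum((len(v) - 1) * group_var[g] for g, v in logcpm.items()) / resid_df

# Ordinary (unweighted, unmoderated) t-statistic for Cat_3 - Cat_1
diff = group_mean["Cat_3"] - group_mean["Cat_1"]
se = sqrt(pooled_var * (1 / len(logcpm["Cat_1"]) + 1 / len(logcpm["Cat_3"])))
t_stat = diff / se

for g in logcpm:
    print(f"{g}: mean={group_mean[g]:.2f} var={group_var[g]:.2f}")
print(f"pooled var={pooled_var:.2f} on {resid_df} df")
print(f"Cat_3 - Cat_1 = {diff:.2f}, t = {t_stat:.2f}")
```

On these numbers the log2 difference is about 0.8 while the pooled standard deviation is about 1, so the t-statistic is roughly 1.4 on 13 degrees of freedom. That is below the 1.77 or so needed for a two-sided p-value of 0.1.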


Thank you. The setup is that I want a three-way comparison of the 3 categories. The counts are not integers because I used Salmon to generate the gene-level read counts and I didn't round them. I understand your explanation. So, basically, several "bad" samples cause the p-value to be non-significant.


Well, no, I don't think it is because of one or two "bad" observations. The Categories just don't look at all different.

Category 2 has log-cpm values that range from 4.36 to 7.63, completely covering the whole range of values of Category 1.

Category 3 has log-cpm values that range from 4.58 to 7.62, again completely covering the range of Category 1.

Category 2 and 3 look almost identical in terms of range of values.

The Categories are internally variable and not systematically different. Hence the big p-value.
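The range comparison can be checked directly from the voom values posted in the question. This is a small Python sketch using only built-ins; the `covers` helper is a hypothetical name introduced here for illustration, not part of any package.

```python
# voom log2-CPM values copied from the question
logcpm = {
    "Cat_1": [5.383155, 5.202185, 5.568586, 5.500774, 5.625384, 5.878762],
    "Cat_2": [4.356846, 7.397590, 7.628476],
    "Cat_3": [5.568482, 4.578878, 6.982345, 5.806681,
              6.466003, 7.618232, 7.357043],
}

# (min, max) of the log-CPM values in each group
ranges = {g: (min(v), max(v)) for g, v in logcpm.items()}


def covers(outer, inner):
    """True if the interval `outer` fully contains the interval `inner`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


for g, (lo, hi) in ranges.items():
    print(f"{g}: {lo:.2f} to {hi:.2f}")

for g in ("Cat_2", "Cat_3"):
    print(f"{g} covers Cat_1: {covers(ranges[g], ranges['Cat_1'])}")
```

Both checks come out true: every Cat_1 value lies inside the spread of Cat_2 and inside the spread of Cat_3.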
