Should zero counts in some replicates lead to inflated p-values?
@1c9e52fd
United States

I am doing a DE analysis comparing two strains, one wild type and one in which a repressor is overexpressed with 3. We expect that its downstream effectors will have very little expression and should come up as top hits in the DE analysis. We did the experiment in the presence and absence of drug and with 3 replicates per strain per condition. Without the drug, we see that all the counts for one of these effectors (gene X) in the WT strain are >10,000 and all the counts in the overexpression strain are 0. The log(FC) is very high and the p-value is close to 0, exactly what we expect.

In the presence of drug, all the counts in the WT strain for the same gene are >10,000, and the counts in the overexpression strain are 0 for two replicates and 140 for the third replicate. In this analysis the log(FC) for gene X is very high but the p-value is not significant. This doesn't match my intuition, as 0, 0, 140 is clearly different from 10000, 10000, 10000. If I edit the raw counts matrix to have 1s in place of 0s, the gene is a significant hit. This also doesn't match my intuition, as 0, 0, 140 shouldn't be very different from 1, 1, 140, especially when compared to a condition where the counts are very high. Generally, my volcano plots look pretty "flat", with many points having large FC but borderline or nonsignificant p-values. Maybe overall this is just indicating that we need more replicates? But for this particular gene, the result doesn't match what seems like the obvious outcome.

Does this have something to do with how DESeq2 handles 0 counts? Adding a pseudocount to all the 0s doesn't seem like the most kosher option, and I'm wondering if there are other parameters that might help in cases where a gene is expected to be repressed and have very low counts in one condition.

DESeq2 • 238 views
@mikelove
United States

Zeros are tricky, and significance in the global context is often counterintuitive compared with looking at genes one by one.

But 1s _are_ different from 0s, and to me they would imply lower variance, which argues more for the gene being "significant".
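As a rough illustration of the variance point, here is a minimal Python sketch that compares a crude method-of-moments dispersion for the two sets of counts from the question. It assumes a negative binomial variance of the form mean + dispersion × mean², and the helper `moment_dispersion` is a hypothetical name for this example. This is not how DESeq2 estimates dispersion: DESeq2 fits one dispersion per gene across all samples with a GLM and shrinks it toward a trend, so the numbers here only indicate the direction of the effect within the overexpression group.

```python
from statistics import mean, variance


def moment_dispersion(counts):
    """Crude method-of-moments dispersion, assuming var = mean + disp * mean**2.

    Hypothetical helper for illustration only; not DESeq2's estimator.
    """
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    return max((v - m) / m**2, 0.0)  # floor at 0 (Poisson-like case)


# Overexpression-strain counts from the question, with and without the edit.
with_zeros = [0, 0, 140]
with_ones = [1, 1, 140]

disp_zeros = moment_dispersion(with_zeros)
disp_ones = moment_dispersion(with_ones)

print(f"0, 0, 140 -> dispersion ~ {disp_zeros:.3f}")
print(f"1, 1, 140 -> dispersion ~ {disp_ones:.3f}")
```

Under this simple estimator the version with 1s gives a somewhat smaller dispersion than the version with 0s. The gap is modest here, and a model fit on the log scale may react more strongly to exact zeros, so treat this only as a sanity check on the direction.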
