Question

Should zero counts in some replicates lead to inflated p-values?

0


nicolettacommins • 0

@1c9e52fd

Last seen 8 days ago

United States

I am doing a DE analysis comparing two strains, one wild type and one in which a repressor is overexpressed with 3. We expect that its downstream effectors will have very little expression and should come up as top hits in the DE analysis. We did the experiment in the presence and absence of drug and with 3 replicates per strain per condition. Without the drug, we see that all the counts for one of these effectors (gene X) in the WT strain are >10,000 and all the counts in the overexpression strain are 0. The log(FC) is very high and the p-value is close to 0, exactly what we expect.

In the presence of drug, all the counts in the WT strain for the same gene are >10,000, and the counts in the overexpression strain are 0 for two replicates and 140 for the third replicate. In this analysis the log(FC) for gene X is very high but the p-value is not significant. This doesn't match my intuition, as 0, 0, 140 is clearly different from 10000, 10000, 10000. If I edit the raw counts matrix to have 1s in place of 0s, the gene is a significant hit. This also doesn't match my intuition, as 0, 0, 140 shouldn't be very different from 1, 1, 140, especially when compared to a condition where the counts are very high. Generally, my volcano plots look pretty "flat", with many points having large FC but borderline or nonsignificant p-values. Maybe overall this just indicates that we need more replicates? But for this particular gene, the result doesn't match what seems like the obvious outcome.

Does this have something to do with how DESeq2 handles 0 counts? Adding a pseudocount to all the 0s doesn't seem like the most kosher option, and I'm wondering if there are other parameters that might help in cases where a gene is expected to be repressed and have very low counts in one condition.

DESeq2 • 238 views

ADD COMMENT • link updated 17 days ago by Michael Love 42k • written 17 days ago by nicolettacommins • 0

score 0 · Answer 1 · 2024-10-21

0


Michael Love 42k

@mikelove

Last seen 3 hours ago

United States

Zeros are tricky, and significance in the global context is often counterintuitive compared with looking at genes one by one.

But 1s _are_ different from 0s: to me they imply lower variance, which argues more strongly for the gene being "significant".
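As a rough illustration of the variance point, here is a minimal Python sketch that fits a negative binomial dispersion by plain maximum likelihood to the overexpression-strain counts from the question, once as 0, 0, 140 and once as 1, 1, 140. This is not DESeq2's estimator: DESeq2 estimates dispersion across all samples using the design and then shrinks it toward a fitted trend, so the numbers here will not match its output. The function names `nb_loglik` and `mle_dispersion`, the single-group setup, and the grid bounds are all assumptions made for illustration.

```python
import math


def nb_loglik(counts, alpha):
    """Negative binomial log-likelihood with variance mu + alpha * mu**2.

    The mean is fixed at the sample mean, which is the maximum likelihood
    estimate of the mean for any fixed dispersion in this one-group setting.
    """
    mu = sum(counts) / len(counts)
    r = 1.0 / alpha  # NB "size" parameter
    total = 0.0
    for k in counts:
        total += (
            math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mu))
            + k * math.log(mu / (r + mu))
        )
    return total


def mle_dispersion(counts, lo=1e-3, hi=1e3, steps=2000):
    """Crude grid search for the dispersion on a log-spaced grid (illustrative only)."""
    best_alpha, best_ll = lo, -math.inf
    for i in range(steps + 1):
        alpha = lo * (hi / lo) ** (i / steps)
        ll = nb_loglik(counts, alpha)
        if ll > best_ll:
            best_alpha, best_ll = alpha, ll
    return best_alpha


# Overexpression-strain counts with drug, as reported and as edited in the question.
with_zeros = [0, 0, 140]
with_ones = [1, 1, 140]

alpha_zeros = mle_dispersion(with_zeros)
alpha_ones = mle_dispersion(with_ones)

print(f"dispersion estimate for {with_zeros}: {alpha_zeros:.2f}")
print(f"dispersion estimate for {with_ones}: {alpha_ones:.2f}")
```

Under this simplified model the two count sets have almost the same mean and sample variance, yet the likelihood treats them differently. A very overdispersed negative binomial puts a lot of mass exactly at zero, so 0, 0, 140 is compatible with a larger dispersion than 1, 1, 140. A larger dispersion means wider uncertainty on the fold change, which is in line with the 1s making the gene look more "significant".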
