I am doing a DE analysis comparing two strains: one wild type (WT) and one in which a repressor is overexpressed. We expect its downstream effectors to have very little expression in the overexpression strain, so they should come up as top hits in the DE analysis. We did the experiment in the presence and absence of drug, with 3 replicates per strain per condition. Without the drug, all the counts for one of these effectors (gene X) are >10,000 in the WT strain and 0 in the overexpression strain. The log(FC) is very high and the p-value is close to 0, exactly what we expect.
In the presence of drug, all the counts in the WT strain for the same gene are >10,000, and the counts in the overexpression strain are 0 for two replicates and 140 for the third replicate. In this analysis the log(FC) for gene X is very high but the p-value is not significant. This doesn't match my intuition, as 0, 0, 140 is clearly different from 10000, 10000, 10000. If I edit the raw counts matrix to have 1s in place of the 0s, the gene becomes a significant hit. This also doesn't match my intuition, as 0, 0, 140 shouldn't be very different from 1, 1, 140, especially when compared to a condition where the counts are very high. More generally, my volcano plots look pretty "flat", with many points having a large FC but borderline or nonsignificant p-values. Maybe overall this just indicates that we need more replicates? But for this particular gene, the result doesn't match what seems like the obvious outcome.
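To make the comparison concrete, here is a minimal Python sketch of the arithmetic behind my intuition. It is not DESeq2 and does not reproduce its estimates. The WT value of 10000 is a placeholder for counts that are all >10,000, and `summarize` and `naive_log2fc` are hypothetical helpers written only for this illustration. The dispersion is a crude method-of-moments value with no sharing of information across genes, and the fold change ignores size factors and shrinkage.

```python
from math import log2
from statistics import mean, variance

# Illustrative counts only: the real WT counts are ">10,000",
# so 10000 is a stand-in, not the measured value.
wt = [10000, 10000, 10000]
oe_raw = [0, 0, 140]      # overexpression strain with drug, as observed
oe_edited = [1, 1, 140]   # same replicates after replacing 0s with 1s


def summarize(counts):
    """Return (mean, sample variance, method-of-moments dispersion).

    Under a negative binomial model var = mu + alpha * mu^2, so a crude
    estimate is alpha = (var - mu) / mu^2, floored at 0.
    This is NOT the estimator DESeq2 uses.
    """
    mu = mean(counts)
    var = variance(counts)
    alpha = max((var - mu) / mu ** 2, 0.0)
    return mu, var, alpha


def naive_log2fc(treated, control):
    """log2 ratio of group means, with no normalization or shrinkage."""
    return log2(mean(treated) / mean(control))


summary_raw = summarize(oe_raw)
summary_edited = summarize(oe_edited)
lfc_raw = naive_log2fc(oe_raw, wt)
lfc_edited = naive_log2fc(oe_edited, wt)

for label, s, lfc in [
    ("0,0,140", summary_raw, lfc_raw),
    ("1,1,140", summary_edited, lfc_edited),
]:
    print(f"{label}: mean={s[0]:.1f} var={s[1]:.1f} "
          f"alpha={s[2]:.2f} log2FC={lfc:.2f}")
```

By these crude summaries the two versions of the overexpression group differ only slightly in mean, dispersion, and fold change, which is why the jump from nonsignificant to significant surprises me.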
Does this have something to do with how DESeq2 handles 0 counts? Adding a pseudocount to all the 0s doesn't seem like the most kosher option, and I'm wondering if there are other parameters that might help in cases where a gene is expected to be repressed and have very low counts in one condition.