A gene in my dataset has average counts of 0 (control) vs. 13.3 (treated). My DESeq2 analysis determines the gene to have a padj = 5.8E-6 and an average LFC of 6.6.
1) How can a LFC value be calculated in this scenario?
2) While I can see how a statistical test could potentially yield padj ≤ 0.05, this isn't a hit I would follow up on. Anyone have first-hand experience or know of literature that determines whether such small shifts in expression (e.g. 0 to 15 counts) correlate with meaningful biology? Am I naive to discredit "hits" arising from such low counts?
Thanks for any advice! Code with data below.
First, I took the 25 genes with highest LFC and padj ≤ 0.05. Plotted the rlog values in a heat map and noticed YDL022C-A was an outlier.
sigGenes <- as.data.frame(results_shrunk) %>% tibble::rownames_to_column("GeneID") %>% filter(padj <= 0.05) sigTop25 <- sigGenes %>% slice_max(log2FoldChange, n=25) %>% pull("GeneID") rld <- rlog(dds_filtered, blind=FALSE) plotDat <- assay((rld)[sigTop25,]) Heatmap(plotDat)
Checked values in rlog transformation:
> assay((rld)["YDL022C-A",]) sample1 sample2 sample3 sample4 sample5 sample6 YDL022C-A 1.723071 3.014438 1.724505 3.134384 1.730045 2.827426
Seemed fairly low, so I checked results values, where "sigTop25" were pulled from, above.
>results_shrunk["YDL022C-A",] log2 fold change (MAP): Treatment 0.4M NaCl 15min vs 0.4M NaCl 0min Wald test p-value: Treatment 0.4M NaCl 15min vs 0.4M NaCl 0min DataFrame with 1 row and 5 columns baseMean log2FoldChange lfcSE pvalue padj <numeric> <numeric> <numeric> <numeric> <numeric> YDL022C-A 7.14097 6.6092 2.59133 2.84041e-06 5.81039e-06
The lfcSE was high, so took a look at the raw count data.
> counts(dds_filtered)["YDL022C-A",] sample1 sample2 sample3 sample4 sample5 sample6 0 13 0 16 0 11 > sum(counts(dds_filtered))  229073425 `
The odd-numbered samples belong to the control group - all have 0 counts.