If you want to filter, you can do so before running DESeq:
dds <- estimateSizeFactors(dds)   # size factors are needed before normalized counts can be used
idx <- rowSums(counts(dds, normalized = TRUE) >= 5) >= 3
For example, this filters out genes for which fewer than 3 samples have a normalized count greater than or equal to 5.
dds <- dds[idx, ]   # keep only the genes that pass the filter
dds <- DESeq(dds)
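To make the rule concrete, here is a base-R sketch that applies the same row-wise test to a small toy matrix standing in for counts(dds, normalized=TRUE). The gene names, sample names, and values are hypothetical and chosen only for illustration.

```r
# Hypothetical normalized counts: 3 genes x 6 samples (toy values, not real data).
norm_counts <- matrix(
  c(10, 12,  8,  0,  1,  2,    # geneA: 3 samples >= 5
     0,  0,  3,  6,  7,  0,    # geneB: only 2 samples >= 5
    50, 40, 45, 60, 55, 52),   # geneC: all 6 samples >= 5
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("s", 1:6))
)

# Same rule as the filter above: keep a gene if at least 3 samples
# have a normalized count of 5 or more.
idx <- rowSums(norm_counts >= 5) >= 3
idx                 # geneA TRUE, geneB FALSE, geneC TRUE

norm_counts[idx, ]  # the rows that would be kept
```

Note that the rule counts samples, not groups: geneB has two samples above the threshold, which is not enough here regardless of which conditions those samples belong to.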
However, you typically don't need to pre-filter, because the independent filtering performed within results() excludes low-count genes, which have little or no power to be called significant, from the multiple-testing correction (see ?results and the vignette section on independent filtering, or the paper). The main reason to pre-filter is speed: designs with many samples and many interaction terms can be slow to fit on genes that have very few reads.
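The benefit of independent filtering can be illustrated with a base-R sketch of the idea: filter genes on their mean normalized count, then apply the Benjamini-Hochberg adjustment only to the genes that pass. The p-values, mean counts, and the fixed cutoff of 10 below are hypothetical; DESeq2 chooses its own cutoff automatically, so this is not its exact procedure.

```r
# Toy p-values and mean normalized counts for 10 hypothetical genes
# (illustrative values, not DESeq2 output).
pvals      <- c(0.001, 0.004, 0.01, 0.03, 0.6, 0.7, 0.8, 0.9, 0.95, 1)
mean_count <- c(200, 150, 300, 120, 0.5, 1, 0.2, 2, 0.1, 0.8)
alpha <- 0.05

# 1) No filtering: Benjamini-Hochberg over all 10 tests.
padj_all <- p.adjust(pvals, method = "BH")

# 2) Filter on mean count first (fixed, hypothetical cutoff of 10),
#    then adjust only the genes that pass. Filtered genes get NA,
#    as they do in the padj column of results().
keep <- mean_count >= 10
padj_filt <- rep(NA_real_, length(pvals))
padj_filt[keep] <- p.adjust(pvals[keep], method = "BH")

sum(padj_all < alpha)                  # 3 genes significant without filtering
sum(padj_filt < alpha, na.rm = TRUE)   # 4 genes significant with filtering
```

The low-count genes had no realistic chance of being significant, but without filtering they still count toward the number of tests and make the adjustment stricter for every other gene.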
I have a similar question. In an experiment with 5 strains in triplicate, I have a gene with the following normalized counts:
Strain-4: 0, 0, 2.6
After running DESeq2, this gene is flagged and given NA for both pvalue and the adjusted p-value (padj), which makes sense. However, when I rerun the analysis with only the first two replicates per strain and compare strain 5 with strain 4, this gene comes up as differentially expressed, with baseMean = 9.5 and log2FoldChange = 3.3. Why is this gene not flagged in the second analysis? More importantly, how is DESeq2 able to compute a fold change when the normalized counts for this gene in strain 4 are zero?
Appreciate your help.