Filtering RNA-seq counts before performing differential expression analysis is generally recommended. Why is it recommended? What makes the analysis better than if no filtering were performed?
"Genes that have very low counts across all the libraries should be removed prior to downstream analysis. This is justified on both biological and statistical grounds. From biological point of view, a gene must be expressed at some minimal level before it is likely to be translated into a protein or to be considered biologically important. From a statistical point of view, genes with consistently low counts are very unlikely be assessed as significantly DE because low counts do not provide enough statistical evidence for a reliable judgement to be made. Such genes can therefore be removed from the analysis without any loss of information."
Filtering improves dispersion estimation (because one doesn't try to estimate dispersions for genes with no information), improves statistical power (because fewer tests mean a less severe multiple-testing correction), and reduces computation time. Most important of all, filtering allows good empirical Bayes estimation across genes because it makes the remaining genes more homogeneous.
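To make the idea concrete, here is a minimal sketch of one common style of filter: keep a gene only if its counts-per-million (CPM) reaches some cutoff in at least a minimum number of samples. The function name `filter_low_counts`, the thresholds `min_cpm=1.0` and `min_samples=2`, and the toy count matrix are all assumptions made for illustration; this is not the exact rule used by edgeR's `filterByExpr` or by any other package.

```python
import numpy as np

def filter_low_counts(counts, min_cpm=1.0, min_samples=2):
    """Return a boolean mask of genes to keep (hypothetical helper).

    counts: genes x samples matrix of raw read counts.
    A gene is kept if its CPM is >= min_cpm in at least min_samples samples.
    Assumes every sample has a non-zero library size.
    """
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=0)        # total reads per sample
    cpm = counts / lib_sizes * 1e6        # scale each column to counts-per-million
    return (cpm >= min_cpm).sum(axis=1) >= min_samples

# Toy data: 5 genes x 4 samples. The last row stands in for the rest of
# the transcriptome so that library sizes are roughly one million reads.
counts = np.array([
    [0,       1,       0,       0],        # essentially unexpressed
    [500,     620,     480,     555],      # clearly expressed
    [1,       0,       1,       2],        # consistently very low
    [0,       0,       150,     170],      # expressed in one group only
    [1000000, 1000000, 1000000, 1000000],  # bulk of the library
])

keep = filter_low_counts(counts)
filtered = counts[keep]
print(keep.tolist())
print(filtered.shape)
```

Note that the gene expressed in only two of the four samples is kept: the filter is based on how many samples show expression, not on which group they belong to, so it does not bias the later test. In practice `min_samples` is usually tied to the size of the smallest experimental group.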