This is a fairly well-discussed subject; nevertheless, every time I analyze a new type of RNA-seq data I run into the question of how to filter lowly expressed genes in a differential expression analysis.
My data are read counts of microRNAs, which have a somewhat lower expression range than mRNAs.
I have 4 experimental conditions (4 genotypes) with 3 samples each, and I'm using limma for the differential expression analysis.
If I follow the limma guide and keep features that have more than 1 cpm in at least 3 samples, I lose quite a lot of microRNAs. Some of them are real signal, since these 3 samples may all be from the same genotype that is down-regulated.
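To make the rule concrete, here is a minimal sketch of the "more than 1 cpm in at least 3 samples" filter. It is written in plain Python rather than limma/edgeR code, and the function names, library sizes, and counts are all hypothetical values chosen for illustration.

```python
def cpm(counts, lib_sizes):
    """Counts per million for one feature, one value per sample."""
    return [c * 1e6 / n for c, n in zip(counts, lib_sizes)]


def keep_feature(counts, lib_sizes, min_cpm=1.0, min_samples=3):
    """True if the feature exceeds min_cpm in at least min_samples samples."""
    n_above = sum(v > min_cpm for v in cpm(counts, lib_sizes))
    return n_above >= min_samples


# Hypothetical design: 4 genotypes x 3 replicates, 2 million reads per library.
lib_sizes = [2_000_000] * 12

# Hypothetical miRNA that clears the cutoff in all 3 replicates of one genotype only.
on_in_one_genotype = [30, 25, 40] + [0] * 9

# Hypothetical miRNA that clears the cutoff in only 2 replicates of one genotype.
on_in_two_samples = [30, 25, 1] + [0] * 9

print(keep_feature(on_in_one_genotype, lib_sizes))
print(keep_feature(on_in_two_samples, lib_sizes))
```

In this toy example the first microRNA is kept, because the 3 samples above the cutoff may all belong to one genotype, while the second is dropped because only 2 samples clear the cutoff.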
Perhaps a more sensible filtering approach is to set to zero all samples of a given experimental condition in which 3 or more samples (i.e. all of them, in my design) have cpm <= 1. The problem here is that the cutoff is arbitrary, so genes that were slightly below the cutoff in one condition (and hence set to 0) but slightly above it in another (and hence left as they are) will show up as false positives.
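A small sketch of that per-condition zeroing shows how the artifact arises. Again this is plain Python with hypothetical names and numbers, and it assumes "3 or more samples" means every sample in the condition, since each condition has 3 samples.

```python
def cpm(counts, lib_sizes):
    """Counts per million for one feature, one value per sample."""
    return [c * 1e6 / n for c, n in zip(counts, lib_sizes)]


def zero_low_conditions(counts, lib_sizes, groups, max_cpm=1.0):
    """Set counts to 0 in every condition whose samples all have CPM <= max_cpm."""
    values = cpm(counts, lib_sizes)
    low = {
        g
        for g in set(groups)
        if all(v <= max_cpm for v, gg in zip(values, groups) if gg == g)
    }
    return [0 if g in low else c for c, g in zip(counts, groups)]


# Hypothetical design: 4 genotypes x 3 replicates, 1 million reads per library,
# so CPM equals the raw count in this toy example.
groups = ["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3
lib_sizes = [1_000_000] * 12

# Genotype A sits just at or below the cutoff, genotype B just above it.
counts = [1, 1, 0, 2, 1, 2, 5, 6, 4, 7, 5, 6]

filtered = zero_low_conditions(counts, lib_sizes, groups)
print(filtered)
```

Genotype A is zeroed while genotype B is left untouched, so a small difference around the cutoff becomes an all-or-nothing difference in the filtered counts.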
So my question is: is there a happy medium?