I have a question about the edgeR normalization step (calcNormFactors) for CRISPR screens. The edgeR user guide recommends TMM normalization for RNA-Seq data because "the highly expressed genes can consume a substantial proportion of the total library size, causing the remaining genes to be under-sampled in that sample. Unless this RNA composition effect is adjusted for, the remaining genes may falsely appear to be down-regulated in that sample." The same reasoning seems to apply to CRISPR enrichment screens, where individual guides often take over a large fraction of the library relative to the starting population. However, in most of the CRISPR screen case studies I have seen, no normalization is applied. What is the reason for this?
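To make the concern concrete, here is a toy calculation in plain Python (not edgeR). All numbers are made up for illustration: a hypothetical library of 1000 guides, 5 of which enrich 200-fold while the rest do not change, sequenced to the same depth before and after selection. It only shows what scaling by total library size does in that situation. It is not a model of what calcNormFactors would compute.

```python
import math

def expected_counts(abundance, depth):
    """Expected read counts when reads are sampled in proportion to abundance."""
    total = sum(abundance)
    return [depth * a / total for a in abundance]

def cpm(counts):
    """Counts per million, i.e. scaling by total library size only."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]

# Hypothetical library: every value below is an assumption for illustration.
n_guides = 1000
n_enriched = 5
fold = 200.0
depth = 1_000_000

# True relative abundances: equal at baseline, 5 guides enriched after selection.
baseline_abundance = [1.0] * n_guides
selected_abundance = [fold if i < n_enriched else 1.0 for i in range(n_guides)]

baseline = expected_counts(baseline_abundance, depth)
selected = expected_counts(selected_abundance, depth)

# Fraction of the selected library consumed by the enriched guides.
enriched_fraction = sum(selected[:n_enriched]) / depth

# Apparent log2 fold change of a guide whose true abundance did not change.
unchanged = n_guides - 1
apparent_lfc = math.log2(cpm(selected)[unchanged] / cpm(baseline)[unchanged])

print(f"enriched guides take {enriched_fraction:.1%} of the selected library")
print(f"apparent log2FC of an unchanged guide: {apparent_lfc:.2f}")
```

In this toy setup the five enriched guides take up about half of the selected library, so every unchanged guide appears roughly two-fold depleted when only total library size is used for scaling.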
A related question concerns enrichment screens with a substantial bottleneck. For example, what is the best way to handle FACS screens in which only a small number of cells are sorted from a complex library? Most guides will be well represented in the baseline sample but will have zero counts in the sorted sample, where only a few guides enrich massively. What kind of normalization (if any) should or could be applied, and what is the best way to set up the edgeR analysis in such situations?
Any thoughts would be much appreciated!