I am involved in a project that aims to create a transcriptomics analysis framework for regulatory purposes (FDA, OECD, …). We are currently investigating the identification of differentially expressed genes (DEGs). For regulatory purposes, we need to end up with a list of DEGs that is highly reliable and robust (more so than is typical in ordinary research). To facilitate this, we would like to apply more stringent thresholds.
Within the project, a discussion has emerged on WHEN to filter out low-expressed genes. Should this be done before or after normalization? In edgeR, the filterByExpr function is applied before normalization. In DESeq2, however, independent filtering occurs afterwards.
Some questions have therefore arisen, and I was hoping that you would be able to answer them:
- Does the proper way to filter low-expressed genes differ between these methods?
- Removing low-expressed genes before normalization does have an impact (the total read count decreases somewhat, the number of detected genes decreases, the variance in the dataset changes, ...). Is this beneficial or detrimental to the reliability/robustness of the obtained DEGs?
- What would be your advice for making the filtering of low-expressed genes more stringent?
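To make the second question concrete, here is a minimal toy sketch in Python of what I mean by the impact on total read count. The counts are made up, the thresholds (MIN_CPM, MIN_SAMPLES) are hypothetical, and the simple counts-per-million rule is only a stand-in: it is not the actual filterByExpr algorithm and not DESeq2's independent filtering.

```python
# Toy count matrix: rows are genes, columns are samples (made-up numbers).
counts = {
    "geneA": [500, 480, 950, 1010],
    "geneB": [1200, 1150, 1190, 1230],
    "geneC": [3, 0, 1, 2],
    "geneD": [0, 1, 0, 0],
    "geneE": [300, 310, 290, 305],
}


def library_sizes(counts):
    """Total read count per sample (column sums)."""
    n_samples = len(next(iter(counts.values())))
    return [sum(row[j] for row in counts.values()) for j in range(n_samples)]


def cpm(counts):
    """Counts per million, using the library sizes of the given matrix."""
    libs = library_sizes(counts)
    return {
        gene: [1e6 * c / lib for c, lib in zip(row, libs)]
        for gene, row in counts.items()
    }


def filter_low_expressed(counts, min_cpm, min_samples):
    """Keep genes whose CPM reaches min_cpm in at least min_samples samples.

    Simplified illustration only; not the rule used by any real package.
    """
    values = cpm(counts)
    return {
        gene: row
        for gene, row in counts.items()
        if sum(v >= min_cpm for v in values[gene]) >= min_samples
    }


# Hypothetical thresholds; the CPM cutoff is large only because the toy
# library sizes are tiny compared with a real experiment.
MIN_CPM = 5000.0
MIN_SAMPLES = 2

filtered = filter_low_expressed(counts, MIN_CPM, MIN_SAMPLES)
libs_before = library_sizes(counts)
libs_after = library_sizes(filtered)

print("genes kept:", sorted(filtered))
print("library sizes before filtering:", libs_before)
print("library sizes after filtering: ", libs_after)
```

In this toy case the two low-count genes are dropped and every library size shrinks slightly, so any normalization computed afterwards sees a somewhat different dataset than one computed on the unfiltered counts. Raising MIN_CPM or MIN_SAMPLES is the kind of "more stringent" filtering I have in mind.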
I really hope you can give some information regarding this issue.
Best, Marcha