Question

Where in the procedure do I remove low-expressed genes?

marcha.verheijen

I am involved in a project aimed at creating a transcriptomics analysis framework for regulatory purposes (FDA, OECD, …). We are currently investigating DEG identification. For regulatory purposes, we need to end up with a list of DEGs that is highly reliable and robust (more so than is usual in research). To achieve this, we would like to apply more stringent thresholds.

Within the project, a discussion has emerged on WHEN to filter out low-expressed genes: before or after normalization? In edgeR, the filterByExpr function is applied before normalization, whereas in DESeq2 independent filtering occurs afterwards.
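For reference, DESeq2's independent filtering runs inside results(), after the tests have been performed: genes are filtered on the mean of normalized counts, and the cutoff is chosen to maximize the number of adjusted p-values below alpha. The following is a simplified pure-Python illustration of that idea, not DESeq2's implementation. The helper names and the 20 evenly spaced quantile cutoffs are assumptions made here. Benjamini-Hochberg adjustment and alpha = 0.1 match the defaults of results(). DESeq2 additionally smooths the rejection counts before picking a cutoff.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity (capped at 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def independent_filter(base_mean, pvals, alpha=0.1, n_cutoffs=20):
    """Simplified independent filtering (hypothetical helper, not DESeq2 code).

    Tries cutoffs at evenly spaced quantiles of base_mean, drops genes below
    each cutoff, BH-adjusts the remaining p-values, and returns
    (number_of_rejections, cutoff) for the cutoff with the most rejections.
    """
    sorted_means = sorted(base_mean)
    best_rejections, best_cutoff = -1, None
    for k in range(n_cutoffs):
        cutoff = sorted_means[int(k / n_cutoffs * len(sorted_means))]
        kept = [p for mean, p in zip(base_mean, pvals) if mean >= cutoff]
        n_rej = sum(a < alpha for a in bh_adjust(kept))
        # Strict '>' keeps the lowest cutoff among ties (least filtering).
        if n_rej > best_rejections:
            best_rejections, best_cutoff = n_rej, cutoff
    return best_rejections, best_cutoff


# Toy data: five low-count genes with large p-values, five well-expressed
# genes with small p-values.
base_mean = [1, 1, 1, 1, 1, 100, 100, 100, 100, 100]
pvals = [0.9, 0.8, 0.7, 0.6, 0.5, 0.012, 0.024, 0.036, 0.048, 0.06]
print(independent_filter(base_mean, pvals))
```

With all ten genes, the Benjamini-Hochberg correction pushes every adjusted p-value to 0.12 or above, so nothing is rejected. Dropping the five low-mean genes halves the multiple-testing burden, and all five remaining genes pass. This is why filtering after testing can increase power without using the p-values themselves to decide which genes to drop.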

Therefore, some questions have arisen, and I was hoping you could answer them:

- Does the proper way to filter low-expressed genes differ between these methods?
- Removing low-expressed genes before normalization has an impact (the total read count decreases somewhat, the number of detected genes decreases, the variance in the dataset changes, etc.). Is this beneficial or detrimental to the reliability/robustness of the resulting DEGs?
- What would be your advice for making the filtering of low-expressed genes more stringent?

I really hope you can give some information regarding this issue.

Best, Marcha

deseq2 edger

link updated 5.0 years ago by Michael Love • written 5.0 years ago by marcha.verheijen

Answer · 2020-03-05

It is perfectly fine to pre-filter when using DESeq2. If you know you want to remove lowly expressed genes for robustness across experiments (e.g. the next experiment may have lower sequencing depth and not detect these genes at all), then you can filter them out before running DESeq2. I'd recommend a filter similar to filterByExpr, such as requiring a count of 10 or more in at least x samples, where you might choose x to be the sample size of the smallest group. You can then set independentFiltering=FALSE in results(), so that the only filtering that happens is the pre-filter.