Is it always preferable to recompute the library sizes of a DGEList object in edgeR after filtering?
Hi,

I was wondering whether it is always preferable to recalculate the library sizes of your samples after any filtering. I have seen from the manual that this is recommended after filtering out lowly expressed genes. Would you also recommend it after filtering to retain only protein-coding genes? Could this not end up skewing the results if some of your samples had high expression of non-protein-coding genes?

The section of code that I am referring to is:

y <- y[keep, , keep.lib.sizes=FALSE]


Lucy

No, it doesn't skew the results. There is no assumption that the filtered genes are equally expressed in the different libraries.

We recommend keep.lib.sizes = FALSE after gene filtering and before calcNormFactors, regardless of whether the filtering is by expression level or by annotation type. So yes, we would still recommend it even if you keep protein-coding genes only.
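
For illustration, here is a minimal sketch of that order of operations on simulated counts. The `biotype` column in `y$genes`, the gene and sample names, and the toy data are all assumptions made up for this example; substitute your own annotation.

```r
library(edgeR)

# Toy data: 200 genes x 4 samples (simulated, for illustration only).
set.seed(1)
counts <- matrix(rnbinom(800, mu = 200, size = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:4)))

# Hypothetical annotation: a 'biotype' column marking protein-coding genes.
genes <- data.frame(
  biotype = rep(c("protein_coding", "lncRNA", "protein_coding", "rRNA"), 50),
  row.names = rownames(counts)
)
group <- factor(c("A", "A", "B", "B"))
y <- DGEList(counts = counts, genes = genes, group = group)

# Combine the expression filter with the annotation filter.
keep <- filterByExpr(y) & y$genes$biotype == "protein_coding"

# Subset the genes and recompute library sizes from the retained counts.
y <- y[keep, , keep.lib.sizes = FALSE]

# The library sizes should now match the column sums of the filtered counts.
print(cbind(lib.size = y$samples$lib.size, colsum = colSums(y$counts)))

# Compute normalization factors only after filtering.
y <- calcNormFactors(y)
```

The only point of the sketch is the ordering: filter first (by expression, annotation, or both), reset the library sizes in the same subsetting call, and run calcNormFactors afterwards.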

Thank you - so you would also recommend keep.lib.sizes = FALSE when filtering out non-protein-coding genes?