Is it always preferable to recompute the library sizes of a DGEList object in edgeR after filtering?
Lucy ▴ 60
United Kingdom


I was wondering whether it is always preferable to recalculate the library sizes of your samples after any filtering. I have seen in the manual that this is recommended after filtering out lowly expressed genes, but would you also recommend it after filtering to retain only protein-coding genes? Could this not end up skewing the results if some of your samples had high expression of non-protein-coding genes?

The section of code that I am referring to is:

y <- y[keep, , keep.lib.sizes=FALSE]

Many thanks for the advice,


edgeR RnaSeqSampleSizeData RNAseq • 1.7k views
WEHI, Melbourne, Australia

No, it doesn't skew the results. There is no assumption that the filtered genes are equally expressed in the different libraries.

We recommend keep.lib.sizes = FALSE after gene filtering and before calcNormFactors (renamed normLibSizes in recent edgeR releases), regardless of whether the filtering is by expression level or by annotation type. So yes, we would still recommend it even if you keep protein-coding genes only.

Having said that, setting keep.lib.sizes to TRUE or FALSE is not a crucial issue. The library size normalization done by normLibSizes will re-adjust the library sizes so you will end up with much the same effective library sizes either way. So leaving keep.lib.sizes = TRUE will give nearly the same DE results in the end.
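For anyone who wants to check this on their own data, here is a minimal sketch of the comparison. The count matrix, the group factor and the `is_coding` annotation vector are all made up for illustration, so substitute your own. It assumes a recent edgeR release where normLibSizes is available; on older versions calcNormFactors should do the same job. Because a common rescaling of all library sizes does not change the between-sample comparison, the sketch looks at effective library sizes relative to their mean across samples.

```r
library(edgeR)

set.seed(1)

# Hypothetical toy data: 2000 genes x 6 samples in two groups
counts <- matrix(rnbinom(2000 * 6, mu = 50, size = 5), nrow = 2000)
rownames(counts) <- paste0("gene", seq_len(nrow(counts)))
group <- factor(rep(c("A", "B"), each = 3))
y <- DGEList(counts = counts, group = group)

# Hypothetical annotation filter: pretend every 5th gene is non-coding
is_coding <- seq_len(nrow(y)) %% 5 != 0

# Combine the expression filter with the annotation filter
keep <- filterByExpr(y) & is_coding

# Subset both ways: recomputed vs. original library sizes
y_recalc <- y[keep, , keep.lib.sizes = FALSE]
y_kept   <- y[keep, , keep.lib.sizes = TRUE]

# Normalize after filtering in both cases
y_recalc <- normLibSizes(y_recalc)
y_kept   <- normLibSizes(y_kept)

# Effective library size = library size * normalization factor
eff_recalc <- y_recalc$samples$lib.size * y_recalc$samples$norm.factors
eff_kept   <- y_kept$samples$lib.size   * y_kept$samples$norm.factors

# Compare the sample-to-sample pattern, ignoring any common scale factor
print(cbind(recalc = eff_recalc / mean(eff_recalc),
            kept   = eff_kept   / mean(eff_kept)))
```

With keep.lib.sizes = FALSE the lib.size column is simply the column sums of the retained counts, whereas keep.lib.sizes = TRUE carries over the totals from before filtering. The two printed columns let you judge how close the relative effective library sizes are for your data.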

