Is it always preferable to recompute the library sizes of a DGEList object in edgeR after filtering?
Lucy ▴ 20
@lucy-17014
United Kingdom

Hi,

I was wondering whether it is always preferable to recalculate the library sizes of your samples after any filtering. I have seen in the manual that this is recommended after filtering out lowly expressed genes, but would you also recommend it after filtering to retain only protein-coding genes? Could this not skew the results if some of the samples had high expression of non-protein-coding genes?

The section of code that I am referring to is:

y <- y[keep, , keep.lib.sizes=FALSE]


Lucy

edgeR RnaSeqSampleSizeData RNAseq
@gordon-smyth
WEHI, Melbourne, Australia

No, it doesn't skew the results. There is no assumption that the filtered genes are equally expressed in the different libraries.

We recommend keep.lib.sizes=FALSE after gene filtering and before calcNormFactors, regardless of whether the filtering is by expression level or by annotation type. So yes, we would still recommend it even if you keep protein-coding genes only.
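
To make the effect concrete, here is a minimal base-R sketch of what recomputing the library sizes after an annotation filter amounts to. The toy counts, gene names and gene types are made up for illustration, and the sketch assumes that keep.lib.sizes=FALSE resets each library size to the column sum of the counts that remain, which is how the edgeR documentation describes DGEList subsetting. It does not call edgeR itself.

```r
# Hypothetical toy data: 4 genes x 2 samples (values invented for illustration).
counts <- matrix(
  c(10, 30, 60,  0,   # sample s1
    20, 40,  0, 40),  # sample s2
  nrow = 4,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"), c("s1", "s2"))
)

# Hypothetical annotation: geneC is the only non-protein-coding gene,
# and it is highly expressed in s1 only.
gene_type <- c("protein_coding", "protein_coding", "lncRNA", "protein_coding")

# Library sizes before filtering: column sums over all genes.
lib_size_before <- colSums(counts)

# Annotation filter: retain protein-coding genes only.
keep <- gene_type == "protein_coding"
filtered <- counts[keep, , drop = FALSE]

# Assumed equivalent of y[keep, , keep.lib.sizes=FALSE]:
# library sizes become the column sums of the retained counts.
lib_size_after <- colSums(filtered)

print(lib_size_before)  # s1 = 100, s2 = 100
print(lib_size_after)   # s1 = 40,  s2 = 100
```

In this toy case the two samples start with equal library sizes, but s1 loses most of its reads to the filter. The recomputed sizes describe the reads that actually enter the analysis, and in a real workflow calcNormFactors would then be run on the filtered object.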


Thank you - so you would also recommend keep.lib.sizes = FALSE when filtering out non-protein-coding genes?