Is it always preferable to recalculate the library sizes of a DGEList object in edgeR after filtering?
Lucy ▴ 60
@lucy-17014
United Kingdom

Hi,

I was wondering whether it is always preferable to recalculate the library sizes of your samples after any filtering. I have seen from the manual that this is recommended after filtering out lowly expressed genes. Would you also recommend it after filtering to retain only protein-coding genes? Could this not end up skewing the results if some of your samples had high expression of non-protein-coding genes?

The section of code that I am referring to is:

y <- y[keep, , keep.lib.sizes=FALSE]

Many thanks for the advice,

Lucy

edgeR RnaSeqSampleSizeData RNAseq • 1.7k views
@gordon-smyth
WEHI, Melbourne, Australia

No, it doesn't skew the results. There is no assumption that the filtered genes are equally expressed in the different libraries.

We recommend keep.lib.sizes = FALSE after gene filtering and before calcNormFactors, regardless of whether the filtering is by expression level or by annotation type. So yes, we would still recommend it even if you keep protein-coding genes only.
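As a minimal sketch of that order of operations, the example below filters by annotation type, recomputes the library sizes, and then normalizes. The count matrix is simulated, and the `biotype` column with its `"protein_coding"` and `"lncRNA"` labels is a hypothetical stand-in for whatever annotation you actually have; only `DGEList`, the `keep.lib.sizes` subsetting argument, and `calcNormFactors` come from edgeR itself.

```r
library(edgeR)

# Simulated counts: 1000 genes x 4 samples (illustrative data only)
set.seed(1)
counts <- matrix(rnbinom(4000, mu = 50, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:4)))

# Hypothetical annotation; in practice this comes from your gene annotation
biotype <- sample(c("protein_coding", "lncRNA"), 1000,
                  replace = TRUE, prob = c(0.8, 0.2))

y <- DGEList(counts = counts, genes = data.frame(biotype = biotype))

# Filter by annotation type and recompute library sizes from the retained genes
keep <- y$genes$biotype == "protein_coding"
y <- y[keep, , keep.lib.sizes = FALSE]

# lib.size should now match the column sums of the remaining counts
all.equal(unname(y$samples$lib.size), unname(colSums(y$counts)))

# Normalize after filtering
y <- calcNormFactors(y)
y$samples
```

The same pattern applies when `keep` comes from an expression-based filter instead of an annotation-based one.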

Having said that, setting keep.lib.sizes to TRUE or FALSE is not a crucial issue. The library size normalization done by normLibSizes (the newer name for calcNormFactors) will re-adjust the library sizes, so you will end up with much the same effective library sizes either way. Leaving keep.lib.sizes = TRUE will therefore give nearly the same DE results in the end.
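One way to check this on your own data is to normalize both versions of the filtered object and compare the effective library sizes (lib.size multiplied by norm.factors). The sketch below does this on simulated counts in which a block of hypothetical non-coding genes is more highly expressed in two of the four samples, which is the scenario raised in the question. The gene split, the fold difference, and all object names are assumptions made for illustration.

```r
library(edgeR)

# Simulated counts: 1600 "coding" genes plus 400 "non-coding" genes,
# with the non-coding block 3x higher in samples 3 and 4 (illustrative only)
set.seed(2)
mu <- matrix(100, nrow = 2000, ncol = 4)
mu[1601:2000, 3:4] <- 300
counts <- matrix(rnbinom(8000, mu = mu, size = 5), nrow = 2000,
                 dimnames = list(paste0("gene", 1:2000), paste0("s", 1:4)))

y <- DGEList(counts = counts)
keep <- seq_len(nrow(y)) <= 1600   # retain only the hypothetical coding genes

# Normalize with the original and with the recomputed library sizes
y_keep  <- calcNormFactors(y[keep, , keep.lib.sizes = TRUE])
y_recal <- calcNormFactors(y[keep, , keep.lib.sizes = FALSE])

# Effective library size = library size * normalization factor
eff_keep  <- y_keep$samples$lib.size  * y_keep$samples$norm.factors
eff_recal <- y_recal$samples$lib.size * y_recal$samples$norm.factors

# A ratio that is roughly constant across samples means the two approaches
# agree on the relative effective library sizes
data.frame(sample = colnames(y), eff_keep, eff_recal,
           ratio = eff_recal / eff_keep)
```

In recent edgeR releases `normLibSizes` can be used in place of `calcNormFactors` here.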
