Hi list,
Just a question regarding edgeR and dataset processing/filtering prior to
calling differential expression.
Case Study 12 (RNA-seq of Hormone-Treated LNCaP Cells) from the edgeR
manual mentions that:
"We filter out lowly expressed tags and those which are only expressed in
a small number of samples. We keep only those tags that have at least one
count per million in at least three samples."
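To make the quoted rule concrete, here is a minimal sketch in plain Python (not edgeR code) of a "at least one count per million in at least three samples" filter. The function names, the toy count matrix, and the choice to compute library sizes as column sums of the unfiltered counts are all assumptions for illustration; they are not taken from the manual.

```python
# Illustrative sketch only: hypothetical helper names, toy data.
# Rows are tags, columns are samples; values are raw integer counts.

def counts_per_million(counts, lib_sizes):
    """Scale each raw count by its sample's library size (assumed: column sum)."""
    return [[c * 1e6 / n for c, n in zip(row, lib_sizes)] for row in counts]

def keep_mask(counts, min_cpm=1.0, min_samples=3):
    """Return one boolean per tag: True if CPM >= min_cpm in >= min_samples samples."""
    lib_sizes = [sum(col) for col in zip(*counts)]  # per-sample totals before filtering
    cpm = counts_per_million(counts, lib_sizes)
    return [sum(v >= min_cpm for v in row) >= min_samples for row in cpm]

# Toy matrix: every column sums to 1,000,000, so CPM equals the raw count here.
counts = [
    [500000, 400000, 600000, 550000],
    [0, 1, 0, 2],        # reaches 1 CPM in only two samples
    [3, 2, 4, 0],        # reaches 1 CPM in three samples
    [499997, 599997, 399996, 449998],
]

keep = keep_mask(counts)
# The CPM values are used only to decide which rows to keep;
# the retained rows are still the original, untransformed counts.
filtered = [row for row, k in zip(counts, keep) if k]
```

In this sketch the second tag is dropped and the other three are kept, and the filtered matrix still holds raw counts rather than CPM values.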
Then in section 6 of the manual it mentions that:
"The edgeR methodology needs to work with the original digital expression
counts, so these should not be transformed in any way by users prior to
analysis. edgeR automatically takes into account the total size (total
read number) of each library in all calculations of fold-changes,
concentration and statistical significance."
My question is whether filtering counts as "transforming" the data. Since
removing tags would change the total size of each library, and thus affect
all downstream calculations, is it OK to use such filters? And what should
one be cautious about when applying such filters (e.g. at least n tags in
n samples) prior to performing the edgeR analysis?
Many thanks,
--
Dave