Hello,
I have a question about filtering lowly expressed probes (or genes) when using limma.
For the example in section 17.4 of the limma user's guide, probes are called expressed if they exceed a cut-off in at least a given number of samples (equal to the size of the smallest treatment group):
isexpr <- rowSums(y$E > cutoff) >= 4
I was wondering whether it would make sense to use the treatment group information instead, and keep probes that are expressed in every sample of at least one treatment group (or perhaps in a given proportion of the samples of at least one group). Something like this perhaps:
proportion <- 1.0
isexpr2 <- apply(y$E > cutoff, 1, function(z) {
  any(sapply(levels(Treatment), function(treat) {
    sum(z[Treatment == treat]) >= sum(Treatment == treat) * proportion
  }))
})
For this example, the normal approach yields 32754 expressed probes, whereas this yields 30840 probes.
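To make the comparison concrete, here is a minimal self-contained sketch of both rules on simulated data. The matrix E, the factor Treatment, the group sizes and the cutoff are all made up for illustration (they stand in for y$E and the design from the user's guide, which are not reproduced here). The group-wise rule is written with rowSums per group rather than nested apply/sapply, and is checked against the nested version.

```r
set.seed(1)

# Hypothetical stand-ins for y$E, Treatment and cutoff (not the guide's data)
Treatment <- factor(rep(c("A", "B", "C"), times = c(4, 4, 5)))
E <- matrix(rnorm(1000 * length(Treatment), mean = 7, sd = 1.5),
            nrow = 1000,
            dimnames = list(paste0("probe", 1:1000), NULL))
cutoff <- 7
proportion <- 1.0

above <- E > cutoff

# Global rule: above the cut-off in at least as many samples as the
# smallest treatment group contains
isexpr <- rowSums(above) >= min(table(Treatment))

# Group-wise rule: above the cut-off in at least `proportion` of the
# samples of at least one treatment group
isexpr2 <- Reduce(`|`, lapply(levels(Treatment), function(treat) {
  idx <- Treatment == treat
  rowSums(above[, idx, drop = FALSE]) >= sum(idx) * proportion
}))

# Nested apply/sapply version of the same rule, for comparison
ref <- apply(above, 1, function(z) {
  any(sapply(levels(Treatment), function(treat) {
    sum(z[Treatment == treat]) >= sum(Treatment == treat) * proportion
  }))
})
stopifnot(identical(unname(isexpr2), unname(ref)))

# Number of probes kept by each rule
c(global = sum(isexpr), groupwise = sum(isexpr2))
```

With proportion set to 1.0, any probe that passes the group-wise rule is above the cut-off in a whole group, and therefore in at least as many samples as the smallest group, so it also passes the global rule. The group-wise filter can then only keep the same probes or fewer, which matches the direction of the counts above.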
I have seen other answers on this subject (e.g. https://support.bioconductor.org/p/52762/) warning against filtering based on variance because it would affect the limma algorithms, but I am not sure whether this is quite the same thing.
Would this be technically valid? Even if it is, it may just be unnecessarily complicated.
Very good. Thanks Steve.