I know that DESeq2 does some filtering, but I wanted to filter before to decrease the number of genes and eliminate irrelevant data.
In my study, I have 22 samples and 2 comparison groups.
I used the following code:
smallestGroupSize <- 3
keep <- rowSums(counts(dds) >= 10) >= smallestGroupSize
dds <- dds[keep,]
How do you decide on the number 3 (3 samples)? Is choosing 3 as the minimum number of samples methodologically correct?
I got it. I have 22 samples in total: one group has 7 and the other has 15. So I should use 7 instead of 3. Am I correct?
Yes, that's correct.
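To avoid hard-coding the number, the smallest group size can be read off the grouping factor. The following is a minimal sketch on simulated counts in base R; the matrix, the group labels "A"/"B", and the column name `condition` in the commented DESeq2 lines are assumptions for illustration, not taken from the original study.

```r
# Simulated data (hypothetical): 1000 genes x 22 samples, groups of 7 and 15.
set.seed(1)
group <- factor(rep(c("A", "B"), times = c(7, 15)))
counts <- matrix(
  rnbinom(1000 * 22, mu = 20, size = 0.5),
  nrow = 1000,
  dimnames = list(paste0("gene", 1:1000), paste0("s", 1:22))
)

# Size of the smallest comparison group, derived from the factor itself.
smallestGroupSize <- min(table(group))

# Keep genes with a count of at least 10 in at least that many samples.
keep <- rowSums(counts >= 10) >= smallestGroupSize
filtered <- counts[keep, , drop = FALSE]

# With a DESeqDataSet the same idea would read as follows,
# assuming the grouping column in colData is called `condition`:
# smallestGroupSize <- min(table(dds$condition))
# keep <- rowSums(counts(dds) >= 10) >= smallestGroupSize
# dds <- dds[keep, ]
```

Deriving the threshold from the factor keeps the filter consistent if samples are later added or dropped.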
Either that, or use filterByExpr from edgeR. It is not necessarily better, but it is heavily used and has stood the test of time. The choice of parameters, as with any cutoff, is to some extent arbitrary, but as said, the defaults have stood the test of time.
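For reference, a call to filterByExpr with its default parameters could look roughly like this. It is only a sketch on simulated counts and assumes edgeR is installed; the data, the group labels, and the column name `condition` in the commented DESeq2 line are hypothetical.

```r
library(edgeR)

# Simulated data (hypothetical): 1000 genes x 22 samples, groups of 7 and 15.
set.seed(1)
group <- factor(rep(c("A", "B"), times = c(7, 15)))
counts <- matrix(
  rnbinom(1000 * 22, mu = 20, size = 0.5),
  nrow = 1000,
  dimnames = list(paste0("gene", 1:1000), paste0("s", 1:22))
)

# Build a DGEList; filterByExpr picks up the grouping from y$samples$group.
y <- DGEList(counts = counts, group = group)
keep <- filterByExpr(y)

# Subset the genes and recompute library sizes after filtering.
y <- y[keep, , keep.lib.sizes = FALSE]

# On a DESeqDataSet, the matrix method could be used instead,
# assuming the grouping column in colData is called `condition`:
# keep <- filterByExpr(counts(dds), group = dds$condition)
# dds <- dds[keep, ]
```

Unlike the fixed count cutoff, filterByExpr applies its minimum count on the CPM scale, so it adapts to differences in library size between samples.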