I’m following the DESeq2 tutorial to perform DGE analysis. I noticed that before running
dds <- DESeq(dds)
it is recommended to remove genes whose counts are 0 in all samples. I have two questions about this step:
- Why only genes with 0 counts in all samples? What about genes whose counts add up to just 5 across all samples? Or 10? What would be a reasonable threshold? I’m sure that many people experienced in DGE analysis have a number they consider a correct and safe threshold. I would appreciate some advice on this.
- I understand that DESeq2 performs independent filtering: it chooses a threshold based on the counts and removes genes whose counts are too low to give a trustworthy result. My question is: why bother performing the step above if these genes are going to be filtered out anyway?
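
To make the first question concrete, here is a minimal sketch of the kind of pre-filtering I mean. It runs on a made-up toy count matrix in base R, and `min_total` is a hypothetical threshold name and value (exactly the number I am asking about), not something taken from the tutorial. The commented lines at the end show how I assume the same idea would apply to a `DESeqDataSet` via `counts(dds)`.

```r
# Toy count matrix with made-up numbers: 4 genes x 4 samples
cts <- matrix(
  c( 0,  0,  0,  0,
     1,  0,  2,  2,
     3,  4,  1,  2,
    50, 60, 55, 70),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:4))
)

# Hypothetical threshold on the total count per gene (assumption, not a recommendation)
min_total <- 10

# Keep genes whose counts summed over all samples reach the threshold
keep <- rowSums(cts) >= min_total
filtered <- cts[keep, , drop = FALSE]

print(rownames(filtered))

# With a DESeqDataSet the same idea would presumably be:
#   keep <- rowSums(counts(dds)) >= min_total
#   dds  <- dds[keep, ]
```

Setting `min_total <- 1` would reproduce the "remove genes with 0 counts in all samples" step, so the question is really which value of this threshold is sensible.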