Dear Michael,

I am new to the field of RNA-seq analysis and I find DESeq2 a great and intuitive tool. I read your RNA-seq workflow on Bioconductor (https://www.bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html) and I have a few naive questions. They concern the strategy for filtering out lowly expressed genes before proceeding with the analysis (pre-filtering).

In the workflow I see

    keep <- rowSums(counts(dds)) > 1
    keep <- rowSums(counts(dds) >= 10) >= 3

(with 3 being, e.g., the smallest group size), while on the DESeq2 analysis page (http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html) the "suggestion" is

    keep <- rowSums(counts(dds)) >= 10
So my beginner's questions are whether there is a general recommendation on how to pre-filter (I guess not), and whether fixing the threshold at 10 counts (e.g. for a gene in a sample) rests on statistical grounds or is just a sensible rule of thumb.

Finally, if I understand correctly, filtering on raw counts does not take into account potential differences in library size. Would it be reasonable to filter on normalized counts and/or CPM instead? Any advice on this point?

For completeness, I am working on RNA-seq data from human tumor FFPE tissues and have a limited number of samples (about 30 overall, with the smallest group having 10).

Sorry for bothering you with these naive questions, and thank you for your time and patience.
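To make the comparison concrete, here is a minimal base-R sketch of the three filters I am weighing, run on a made-up count matrix rather than a real DESeqDataSet. The toy data, the variable names and in particular the CPM cutoff (the CPM equivalent of 10 reads in the median-sized library) are my own assumptions for illustration, not something taken from the vignette. With real data I assume the matrix would come from counts(dds).

```r
# Toy count matrix (hypothetical data): 4 genes x 6 samples, two groups of 3.
counts <- matrix(
  c( 0,  1,  0,  0,  0,  0,   # gene1: almost no reads
    12, 15, 11,  0,  1,  2,   # gene2: expressed in one group only
     3,  2,  4,  1,  0,  0,   # gene3: low in every sample
    50, 60, 55, 70, 65, 80),  # gene4: well expressed
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("s", 1:6))
)
smallest_group <- 3  # assumed size of the smallest group

# Filter A: total count across all samples is at least 10.
keep_total <- rowSums(counts) >= 10

# Filter B: at least 10 reads in at least `smallest_group` samples.
keep_group <- rowSums(counts >= 10) >= smallest_group

# Filter C: same idea as B, but on counts per million (CPM),
# so that each sample is scaled by its own library size.
lib_size <- colSums(counts)
cpm <- sweep(counts, 2, lib_size, "/") * 1e6
# Assumed cutoff: the CPM that corresponds to 10 reads in the median library.
cpm_cutoff <- 10 / median(lib_size) * 1e6
keep_cpm <- rowSums(cpm >= cpm_cutoff) >= smallest_group

# Side-by-side view of which genes each filter keeps.
print(data.frame(keep_total, keep_group, keep_cpm))
```

In this toy example, filter A keeps a gene whose reads are spread thinly over all samples, whereas filters B and C drop it. This is exactly the kind of difference I am unsure how to judge.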
Thank you for the fast reply.