Prefiltering before PCA and DESeq2 DE analysis
luca.s ▴ 50
@lucas-24386
Italy

Dear Michael, I am new to the field of RNA-seq analysis and I find DESeq2 a great and intuitive tool. I read your RNA-seq workflow on Bioconductor (https://www.bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html) and I have a few naive questions that I would like to ask you. They concern the strategy of filtering out lowly expressed genes before proceeding with the analyses (pre-filtering). In the workflow I see keep <- rowSums(counts(dds)) > 1 and, as an alternative, keep <- rowSums(counts(dds) >= 10) >= 3 (where 3 is, e.g., the smallest group size),

while on the DESeq2 analysis page (http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html) the "suggestion" is keep <- rowSums(counts(dds)) >= 10

So, my beginner's questions are whether there is a general recommendation on how to pre-filter (I guess not), and whether fixing the threshold at 10 counts (e.g. for a gene in a sample) rests on statistical grounds or is just a matter of common sense. Finally, if I understand correctly, filtering by raw counts does not take into account potential differences in library size. Would it be reasonable to use normalized counts and/or CPM instead? Any advice on this point? For completeness, I am working on RNA-seq data from human tumor FFPE tissues, and I have a limited number of samples (about 30 overall, with a smallest group of 10). Sorry for bothering you with these naive questions, and thank you for your time and patience.

deseq2 • 1.8k views
@mikelove
United States

DESeq2 does its own filtering for power (to reduce the multiple testing burden by eliminating genes that don't have enough counts for detecting differences). So pre-filtering is really mostly for reducing dataset size and running time, as it takes time to fit the model over these genes.

If you want to filter on scaled counts here, you can use counts(dds, normalized=TRUE), so that the lowest and highest sequenced samples are brought in line with the typical samples in the dataset with respect to sequencing depth. You would then need to run estimateSizeFactors first, before the pre-filtering. Often it doesn't make a big difference, because in a typical experiment the range of sequencing depth is not so wide.
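As a rough illustration of the order of operations described above, here is a minimal sketch in R. It assumes simulated data from makeExampleDESeqDataSet in place of a real dataset, and the thresholds min_count and min_samples are hypothetical names filled in with the numbers from the question (10 counts, smallest group of 10); they are not a recommendation.

```r
library(DESeq2)

# Assumption: simulated data stands in for a real DESeqDataSet
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 30)

# Hypothetical thresholds taken from the question, not a recommendation
min_count <- 10    # minimum count per sample
min_samples <- 10  # e.g. the smallest group size

# Pre-filter rule applied to raw counts
keep_raw <- rowSums(counts(dds) >= min_count) >= min_samples

# Size factors must be estimated before normalized counts can be extracted
dds <- estimateSizeFactors(dds)

# Same rule applied to counts scaled by the size factors
keep_norm <- rowSums(counts(dds, normalized = TRUE) >= min_count) >= min_samples

# Cross-tabulate the two rules to see how often they disagree
print(table(raw = keep_raw, normalized = keep_norm))

# Subset to the genes that pass the normalized-count rule
dds_filtered <- dds[keep_norm, ]
```

When sequencing depth is similar across samples, the cross-table would be expected to show few disagreements between the two rules, in line with the remark that the choice often makes little difference.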


Thank you for the fast reply.
