We've recently completed an RNASeq experiment with 35 samples and 10 different conditions. Almost all of our biological questions are about pairwise comparisons between two of these conditions.
As of now, my workflow has been to import all of the samples as follows:
dds <- DESeqDataSetFromTximport(txi.salmon, sampleTable, ~condition)
dds <- dds[rowSums(counts(dds)) > 10, ]
nrow(dds)  # 33,270
After that, I ran DESeq on the entire dds dataset:
dds <- DESeq(dds)
I've then proceeded to run results on the pairwise comparison of interest:
YCNT <- results(dds, contrast = c("condition", "YB6CNT", "YBJCNT"), alpha = 0.05, lfcThreshold = 1)
In this instance, I find very few significant differences using the log2 fold change and padj thresholds (which is not that surprising, since this is a comparison between control conditions):
> table(YCNT$padj < 0.05)
FALSE  TRUE
33072    24
My question is, if I'm interested in just this comparison, is there any difference in how the model runs if I were to only import the samples corresponding to these two conditions to begin with, as opposed to importing all 35 samples and then running the pairwise results contrast?
For example, I could imagine that the initial dds filtering would retain fewer rows with rowSums > 10, which might make the multiple-testing (FDR) correction less strict. I'm not sure whether that is accounted for in the model, and I'm curious whether one approach is more appropriate than the other.
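To make the alternative concrete, here is a rough sketch of the two-condition version I have in mind. It reuses txi.salmon and sampleTable from above; dds2 and YCNT2 are just placeholder names, and for simplicity I subset the columns after import rather than subsetting the tximport list itself, which I assume is equivalent for this purpose.

```r
library(DESeq2)

# Build the full object, then keep only the samples from the two conditions
dds2 <- DESeqDataSetFromTximport(txi.salmon, sampleTable, ~condition)
dds2 <- dds2[, dds2$condition %in% c("YB6CNT", "YBJCNT")]

# Drop the now-empty factor levels so the design only has two groups
dds2$condition <- droplevels(dds2$condition)

# Same pre-filter as before, now computed over the retained samples only
dds2 <- dds2[rowSums(counts(dds2)) > 10, ]
nrow(dds2)

# Dispersions are now estimated from these two groups alone
dds2 <- DESeq(dds2)
YCNT2 <- results(dds2, contrast = c("condition", "YB6CNT", "YBJCNT"),
                 alpha = 0.05, lfcThreshold = 1)
table(YCNT2$padj < 0.05)
```

Comparing nrow(dds2) and the final table against the numbers above would show how much the pre-filter and the number of significant genes actually change between the two approaches.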
Thanks a lot.