Hi all,
We are using DESeq2 for differential gene expression analysis of RNA-seq data. We have three cell lines, in each we have performed shRNA-mediated knock-down of a transcription factor so we have control and knock-down for each cell line in triplicate. We would like to find the gene expression changes due to knock down that are common across the three cell lines.
> dds <- DESeqDataSetFromMatrix(countData = counts, colData = sampleData, design = ~ cell.line + shRNA)
> dds <- DESeq(dds)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
> colData(dds)
DataFrame with 18 rows and 3 columns
cell.line shRNA sizeFactor
<factor> <factor> <numeric>
1 A scr 0.8885007
2 A scr 2.0095038
3 A scr 1.9165557
4 A kd 0.9561916
5 A kd 1.4908854
... ... ... ...
14 C scr 0.9472558
15 C scr 0.9923416
16 C kd 0.4343480
17 C kd 0.9292461
18 C kd 0.3899622
> design(dds)
~cell.line + shRNA
> resultsNames(dds)
[1] "Intercept" "cell.line_B_vs_A" "cell.line_C_vs_A" "shRNA_kd_vs_scr"
> results(dds)
log2 fold change (MLE): shRNA kd vs scr
Wald test p-value: shRNA kd vs scr
DataFrame with 27557 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
ENSG00000000003 1.217422 0.00484854 1.1959788 0.004054035 0.99676536 NA
> res <- results(dds)
> res <- subset(res, padj < 0.05)
> res <- res[order(res$log2FoldChange),]
> res.dn <- head(rownames(res))
# plotting normalized counts for the genes from res.dn we get:

My understanding was that the design (~ cell.line + shRNA) would give me the overall effect of the shRNA taking into account differences between the cell lines. But when the counts are plotted it is clear that for many of the genes with padj < 0.05 the difference in expression is dominated by one cell line with low expression and/or little/no difference in expression for the others. How can we find the genes that respond in all three of the cell lines?
We can of course run a separate comparison for each cell line and then look for the common changes manually but I'd much prefer to incorporate it into the design from the beginning. Any advice on how to change the design or call results greatly appreciated.

Thanks so much for the speedy response! I'll work through the method you suggest and see where it gets me. I'm not a statistician - is there a logical way to determine the padj cut-off to apply when using this approach? Can one use a less stringent cut off given that it's applied for three separate comparisons in this case?
I wouldn't use a less stringent cutoff just because it's a three-way intersection. And I should say there isn't an unified FDR interpretation for the final set. It's just a post-hoc set, based of off three FDR bounded sets.