Search
Question: Differential gene expression: common changes in response to treatment across cell types
0
11 months ago by
jason.wray0 wrote:

Hi all,

We are using DESeq2 for differential gene expression analysis of RNA-seq data. We have three cell lines, in each we have performed shRNA-mediated knock-down of a transcription factor so we have control and knock-down for each cell line in triplicate. We would like to find the gene expression changes due to knock down that are common across the three cell lines.

> dds <- DESeqDataSetFromMatrix(countData = counts, colData = sampleData, design = ~ cell.line + shRNA)
> dds <- DESeq(dds)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing

> colData(dds)
DataFrame with 18 rows and 3 columns
cell.line    shRNA sizeFactor
<factor> <factor>  <numeric>
1           A      scr  0.8885007
2           A      scr  2.0095038
3           A      scr  1.9165557
4           A       kd  0.9561916
5           A       kd  1.4908854
...       ...      ...        ...
14          C      scr  0.9472558
15          C      scr  0.9923416
16          C       kd  0.4343480
17          C       kd  0.9292461
18          C       kd  0.3899622
> design(dds)
~cell.line + shRNA
> resultsNames(dds)
[1] "Intercept"        "cell.line_B_vs_A" "cell.line_C_vs_A" "shRNA_kd_vs_scr"

> results(dds)
log2 fold change (MLE): shRNA kd vs scr
Wald test p-value: shRNA kd vs scr
DataFrame with 27557 rows and 6 columns
baseMean log2FoldChange     lfcSE         stat      pvalue       padj
<numeric>      <numeric> <numeric>    <numeric>   <numeric>  <numeric>
ENSG00000000003   1.217422     0.00484854 1.1959788  0.004054035  0.99676536         NA

> res <- results(dds)
> res <- subset(res, padj < 0.05)
> res <- res[order(res$log2FoldChange),] > res.dn <- head(rownames(res)) # plotting normalized counts for the genes from res.dn we get: My understanding was that the design (~ cell.line + shRNA) would give me the overall effect of the shRNA taking into account differences between the cell lines. But when the counts are plotted it is clear that for many of the genes with padj < 0.05 the difference in expression is dominated by one cell line with low expression and/or little/no difference in expression for the others. How can we find the genes that respond in all three of the cell lines? We can of course run a separate comparison for each cell line and then look for the common changes manually but I'd much prefer to incorporate it into the design from the beginning. Any advice on how to change the design or call results greatly appreciated. ADD COMMENTlink modified 11 months ago by Gavin Kelly560 • written 11 months ago by jason.wray0 1 11 months ago by Michael Love19k United States Michael Love19k wrote: To require a DE signal in an cell lines, there's not a single design formula, but you can look at the intersection of the three FDR bounded sets. You can do: ~cell.line + cell.line:shRNA Then test, resA <- results(dds, name="cell.lineA.shRNAkd") and then look at set <- resA$padj < x & ... & ...

Thanks so much for the speedy response! I'll work through the method you suggest and see where it gets me. I'm not a statistician - is there a logical way to determine the padj cut-off to apply when using this approach? Can one use a less stringent cut off given that it's applied for three separate comparisons in this case?

1

I wouldn't use a less stringent cutoff just because it's a three-way intersection. And I should say there isn't an unified FDR interpretation for the final set. It's just a post-hoc set, based of off three FDR bounded sets.

1
11 months ago by
Gavin Kelly560
United Kingdom / London / Francis Crick Institute
Gavin Kelly560 wrote:

There are a couple of other approaches than Michael's that might also be worth a try, depending on the exact interpretation you want; if the replicates are technical replicates, then I think the estimates of variance using the interaction model would mainly reflect technical variation.  You could do an LRT between full=~cell.line + shRNA and reduced = ~cell.line to look (colloquially) for global shRNA effects after normalising by cell.line.  You'd still get potential domination of a single cell-line's effects, but it would be augmented by cell-line-independent changes as well.  You could even remove things that were having a cell-line specific response from this list, by filtering out things that were significant in a cell.line*shRNA vs cell.line+shRNA LRT.

You could take it one stage further and do a likelihood ratio test between full=~shRNA and reduced=~1 which would be treating the cell lines as replicates (so you get the variance being a composite of technical and biological variation), but that's probably a step too far.

Or you could go down the limma/voom route, and model the between-cell-line variability separately from the within-line variability using the blocking that's available there, but perhaps there aren't enough cell-lines here to approximate well the diversity of cell-line populations.

Michael's approach will be the safest, in that you'll get transcripts that are changing in all three, whereas the LRT approach may pick transcripts that just fall outside the stringency in one of the lines.

Thank you! I have in fact tried LRT for full = ~cell.line + shRNA and reduced = ~cell.line but the results were very similar. I had also thought of filtering as you suggest but hadn't hit on a clever way of doing it. Problem is that I'm fine with there being a difference in the magnitude of response between cell lines but such differences would be picked up by the cell.line*shRNA vs cell.line+shRNA (I think?).

Will look into limma/voom but I have no experience with those.