Question

DESeq2 and shrinkage of log2 fold changes

4

ribioinfo ▴ 100

@ribioinfo-9434

Hi, I have some questions about the shrinkage of log2 fold changes:

Is it always useful, or is it advisable to disable it in some cases?

Are there plots that help show when to use it or not?

Is the shrinkage also useful for small RNA-seq, or only for RNA-seq?

Thank you.

deseq2 • 32k views

ADD COMMENT • link updated 3.3 years ago by Michael Love 41k • written 8.3 years ago by ribioinfo ▴ 100

0

Hi! I'm having some difficulty understanding the shrinkage. I am testing a multifactor design (two factors with two conditions each) and the interaction between the two factors. I have run the DESeq function and was going to get the shrunken estimates with lfcShrink, but I cannot use type "normal" since I have an interaction. My question is: do I have to put the shrunken data into the original DESeqDataSet and rerun the DESeq function in order to test it, or has the DESeq function already shrunk it? If the DESeq function has already shrunk it, which type did it use? Thanks for your help!

ADD REPLY • link 6.1 years ago ceboral • 0

0

I'd recommend using type="apeglm", following something like the paradigm in the quick start section of the vignette, where you specify the coefficient of interest using its name or number from resultsNames(dds). You would not re-run DESeq(). The `res` object then gives you p-values and FDR for the maximum likelihood LFC, plus a posterior mode and posterior SD for the LFC.

dds <- DESeq(dds)
resultsNames(dds)  # lists the available coefficient names
res <- lfcShrink(dds, coef=..., type="apeglm")

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#quick-start
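
For the interaction design asked about above, a minimal sketch of picking the coefficient from resultsNames(dds) might look like the following. The simulated data from makeExampleDESeqDataSet(), the `genotype` factor, and its levels are made up for illustration and are not from this thread; type="apeglm" assumes the apeglm package is installed.

```r
library(DESeq2)

# Simulated counts for illustration only (not data from this thread)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 8)

# Hypothetical second factor, crossed with the built-in 'condition' factor
dds$genotype <- factor(rep(c("wt", "mut"), times = 4), levels = c("wt", "mut"))
design(dds) <- ~ genotype + condition + genotype:condition

dds <- DESeq(dds)           # run once; it is not re-run for shrinkage
coefs <- resultsNames(dds)  # with this design the interaction term is listed last
print(coefs)

# Shrink the interaction coefficient, selected by name
res <- lfcShrink(dds, coef = coefs[length(coefs)], type = "apeglm")
head(res)
```

Selecting the coefficient by name rather than by position keeps the call readable if the design formula changes later.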

The difference between results() and lfcShrink() is that the former does not provide fold change shrinkage. The latter function calls results() internally to create the p-value and adjusted p-value columns, which provide inference on the maximum likelihood LFC. The shrunken fold changes are useful for ranking genes by effect size and for visualization.

In addition, we have new functionality providing aggregate posterior probabilities on the shrunken LFC but this hasn't been fully released yet (in the current release there isn't support for arbitrary thresholds on LFC). Full functionality with lfcThreshold will be released in April.

ADD REPLY • link 6.1 years ago Michael Love 41k

score 11 · Answer 1 · 2016-01-26

11

Michael Love 41k

@mikelove

United States

The shrinkage is generally useful, which is why it is enabled by default. Full methods are described in the DESeq2 paper (see DESeq2 citation), but in short, it looks at the largest fold changes that are not due to low counts and uses these to inform a prior distribution. So the large fold changes from genes with lots of statistical information are not shrunk, while the imprecise fold changes are shrunk. This allows you to compare all estimated LFC across experiments, for example, which is not really feasible without the use of a prior.

One case where I would not use it is when nearly all genes are expected to have no change, there is little to no variation across replicates (so near technical replication), and fewer than about 10 genes have very large fold changes. This scenario could occur in non-biological samples, for example technical replicates plus DE spike-ins. The problem is that the prior is formed according to a high percentile of the large fold changes, so it can miss a handful of isolated DE genes and end up not wide enough to accommodate very large fold changes. It is trivial to turn off the prior in this case (betaPrior=FALSE).

I don't have a comment on small RNA-seq, as I haven't personally analyzed this, but I know the moderated LFC have been used in some small RNA-seq analyses.

You can plot fold changes with and without shrinkage like so:

res <- results(dds, addMLE=TRUE)
plotMA(res)
plotMA(res, MLE=TRUE)
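
The `addMLE` argument relies on the fit having been run with betaPrior=TRUE, which was the default when this answer was written. From DESeq2 1.16 on, the default is betaPrior=FALSE and shrinkage is applied afterwards with lfcShrink(), so a comparable pair of plots could be sketched roughly as follows. The simulated data and the coefficient name are assumptions for illustration, and type="apeglm" needs the apeglm package.

```r
library(DESeq2)

# Simulated two-group data for illustration only
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6, betaSD = 1)
dds <- DESeq(dds)

# Unshrunken (maximum likelihood) and shrunken LFC for the same coefficient
res_mle <- results(dds, name = "condition_B_vs_A")
res_shrunk <- lfcShrink(dds, coef = "condition_B_vs_A", type = "apeglm")

# Same y-axis limits so the two MA plots are directly comparable
plotMA(res_mle, ylim = c(-4, 4), main = "MLE")
plotMA(res_shrunk, ylim = c(-4, 4), main = "Shrunken (apeglm)")
```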

ADD COMMENT • link 8.3 years ago Michael Love 41k

0

Thank you. For small RNA-seq, could you give me some advice on how to assess when to use it or not?

ADD REPLY • link 8.3 years ago ribioinfo ▴ 100

0

It should be fine to use it. To give an example, I wouldn't use it if all LFCs were nearly equal to 0 (say between -0.1 and 0.1) except one or two LFCs which were > 4 in the MA plot of MLE fold changes. This could occur in a technical dataset but is unlikely with real biological samples. These numbers are totally contrived though.

ADD REPLY • link 8.3 years ago • updated 7.0 years ago Michael Love 41k

0

One related question. When using the shrinkage (in my miRNA-seq dataset), the number of SDE miRNAs (by FDR) increases compared with the results without it. However, although significant, the LFC of most of them is pushed to nearly 0. So, should I consider as differentially expressed those miRNAs that are significant with or without the shrinkage? And if I take those significant with the shrinkage, is it correct to use their LFC before shrinkage (for plotting, reporting results, etc.)? Thank you.

ADD REPLY • link 3.3 years ago Asier • 0

0

What do you mean when you say that the set passing the FDR bound increases? Shrinkage methods in current versions of DESeq2 do not change the FDR (padj column).

How are you performing shrinkage?
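
One way to check the point about padj is to compare the columns directly. This is only a sketch on simulated data: makeExampleDESeqDataSet() and the coefficient name are illustrative assumptions, and the apeglm package is assumed to be installed.

```r
library(DESeq2)

# Simulated two-group data for illustration only
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6, betaSD = 1)
dds <- DESeq(dds)

res <- results(dds, name = "condition_B_vs_A")
res_shrunk <- lfcShrink(dds, coef = "condition_B_vs_A", res = res, type = "apeglm")

# Shrinkage replaces log2FoldChange and lfcSE; pvalue and padj are taken from 'res'
same_padj <- isTRUE(all.equal(res$padj, res_shrunk$padj))
print(same_padj)

# Compare MLE and shrunken LFC for the genes passing a 10% FDR cutoff in 'res'
sig <- which(res$padj < 0.1)
head(cbind(mle = res$log2FoldChange[sig], shrunk = res_shrunk$log2FoldChange[sig]))
```

If the two padj columns differ in your own analysis, the shrinkage is probably being done another way, for example with betaPrior=TRUE in DESeq().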

ADD REPLY • link 3.3 years ago Michael Love 41k

0

Hi Mike, would I be right in thinking that another situation where you might not want to use shrinkage is if you wanted to compare the change for two sets of genes, and one of those sets was more lowly expressed than the other?

ADD REPLY • link 7.0 years ago i.sudbery ▴ 40

2

Hi, in this case I'd say the shrinkage is actually the most useful. It will tamp down any non-informative differences in the small-count genes. Only when the LFCs between two groups, both with small counts, are above what is expected simply due to sampling variability will a large value come through. And I should note that on nearly all the bulk RNA-seq datasets we try, we do not observe too much shrinkage. It's only observable for extremely large LFCs in datasets where all the other genes have nearly no change. And for this we have a new estimator in development which works quite well across the board.

ADD REPLY • link 7.0 years ago Michael Love 41k