Question: DESeq2 and shrinkage of log2 fold changes
2.4 years ago by
riccardo40 wrote:

Hi, I have some questions about the shrinkage of log2 fold changes:

Is it always useful, or are there cases where it is advisable to disable it?

Are there plots that help show when to use it and when not to?

Is the shrinkage also useful for small RNA-seq, or only for RNA-seq?

Thank you.

modified 2.4 years ago by Michael Love18k • written 2.4 years ago by riccardo40

Hi! I'm having some difficulty understanding the shrinkage. I am testing a multifactor design (two factors with two conditions each) and the interaction between the two factors. I have run the DESeq function and was going to get the shrunken estimates with lfcShrink, but I cannot use type "normal" since I have an interaction. My question is: do I have to include the shrunken data in the original DESeqDataSet and rerun the DESeq function in order to test the shrunken data, or has the DESeq function already shrunk it? If the DESeq function has already shrunk it, which type did it use? Thanks for your help!

written 11 weeks ago by ceboral0

I'd recommend using type="apeglm", following something like the paradigm in the quick start section of the vignette, where you specify the coefficient of interest using the name or number from resultsNames(dds). You would not re-run DESeq(). The `res` object then gives you p-values and FDR for the maximum likelihood LFC, plus a posterior mode and posterior SD for the LFC.

dds <- DESeq(dds)
# coef: a name or number from resultsNames(dds)
res <- lfcShrink(dds, coef=..., type="apeglm")

The difference between results() and lfcShrink() is that the former does not provide fold change shrinkage. The latter function calls results() internally to create the p-value and adjusted p-value columns, which provide inference on the maximum likelihood LFC. The shrunken fold changes are useful for ranking genes by effect size and for visualization.
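As a rough sketch of that workflow for a design with an interaction, the following uses simulated counts from makeExampleDESeqDataSet(). The `genotype` factor, the sample layout, and the choice of the last coefficient are assumptions made only for illustration, and type="apeglm" requires the apeglm package to be installed.

```r
library(DESeq2)  # lfcShrink(type = "apeglm") also needs the apeglm package

set.seed(1)
# Simulated counts: 12 samples, 'condition' A/B is created by the helper
dds <- makeExampleDESeqDataSet(n = 500, m = 12)

# Hypothetical second factor, balanced within each condition
dds$genotype <- factor(rep(rep(c("I", "II"), each = 3), times = 2))
design(dds) <- ~ genotype + condition + genotype:condition

# Fit once; there is no need to re-run DESeq() before shrinking
dds <- DESeq(dds)

# The interaction term is the last coefficient for this design
coef_name <- tail(resultsNames(dds), 1)

# Maximum likelihood LFC with p-values and adjusted p-values
res_mle <- results(dds, name = coef_name)

# Shrunken LFC for the same coefficient; p-value columns come from results()
res_shrunk <- lfcShrink(dds, coef = coef_name, type = "apeglm")

# Side-by-side view of the two sets of estimates
print(head(cbind(MLE = res_mle$log2FoldChange,
                 shrunken = res_shrunk$log2FoldChange)))
```

Selecting the coefficient from resultsNames(dds) rather than typing its name avoids guessing how the interaction term is labeled.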

In addition, we have new functionality providing aggregate posterior probabilities on the shrunken LFC but this hasn't been fully released yet (in the current release there isn't support for arbitrary thresholds on LFC). Full functionality with lfcThreshold will be released in April.

written 11 weeks ago by Michael Love18k
2.4 years ago by
Michael Love18k (United States) wrote:

The shrinkage is generally useful, which is why it is enabled by default. Full methods are described in the DESeq2 paper (see DESeq2 citation), but in short, it looks at the largest fold changes that are not due to low counts and uses these to inform a prior distribution. So the large fold changes from genes with lots of statistical information are not shrunk, while the imprecise fold changes are shrunk. This allows you to compare all estimated LFC across experiments, for example, which is not really feasible without the use of a prior.
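To make the intuition concrete, here is a minimal base R sketch of shrinkage under a zero-centered normal prior. It is a simplified closed-form approximation, not DESeq2's actual estimator; the function name `shrink_lfc`, the prior SD, and the two genes are hypothetical.

```r
# Toy normal-normal shrinkage (an illustration, not DESeq2's estimator).
# Prior: LFC ~ N(0, prior_sd^2); data: MLE ~ N(true LFC, se^2).
shrink_lfc <- function(mle, se, prior_sd) {
  weight <- prior_sd^2 / (prior_sd^2 + se^2)  # share of the MLE that is kept
  mle * weight                                # posterior mean of the LFC
}

# Two hypothetical genes with the same MLE but different precision
mle <- c(precise = 4, imprecise = 4)
se  <- c(precise = 0.2, imprecise = 2)

shrunk <- shrink_lfc(mle, se, prior_sd = 1)
print(round(shrunk, 2))
```

With these made-up numbers the precisely estimated gene keeps about 96% of its fold change, while the imprecise one keeps only 20%.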

One case where I would not use it is if nearly all genes are expected to have no change and there is little to no variation across replicates (so near technical replication), with only a handful of genes (say < 10) having very large fold changes. This scenario could occur in non-biological samples, for example technical replicates plus DE spike-ins. The reason this would cause a problem is that the prior is formed according to a high percentile of the large fold changes, so it could miss a few singular DE genes and end up not wide enough to accommodate very large fold changes. It is trivial to turn off the prior in this case (betaPrior=FALSE).
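The failure mode described above can be sketched in base R. The quantile rule below (matching the top 5% of absolute LFCs to a zero-centered normal) is an assumption loosely modeled on the description, not a copy of DESeq2's internals, and all numbers, including the standard error, are contrived.

```r
set.seed(1)
# Hypothetical MLE log2 fold changes: 9998 near-null genes plus two spike-ins
mle_lfc <- c(runif(9998, min = -0.1, max = 0.1), 5, -6)

# Assumed rule: match the top 5% of |LFC| to a zero-centered normal
prior_sd_from_quantile <- function(lfc, upper = 0.05) {
  unname(quantile(abs(lfc), 1 - upper) / qnorm(1 - upper / 2))
}
prior_sd <- prior_sd_from_quantile(mle_lfc)

# Normal-normal posterior mean for a spike-in with MLE 5 and assumed SE 0.3
se_spike <- 0.3
shrunk_spike <- 5 * prior_sd^2 / (prior_sd^2 + se_spike^2)

print(c(prior_sd = prior_sd, shrunk_spike = shrunk_spike))
```

Because only two of the 10,000 values are large, the 95th percentile never sees them, the estimated prior is very narrow, and the spike-in is pulled almost all the way to zero.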

I don't have a comment on small RNA-seq, as I haven't personally analyzed this, but I know the moderated LFC have been used in some small RNA-seq analyses.

You can plot fold changes with and without shrinkage like so:

res <- results(dds, addMLE=TRUE)
plotMA(res, MLE=TRUE)
modified 2.4 years ago • written 2.4 years ago by Michael Love18k

Thank you. In the case of small RNA-seq, could you give me some advice on how to assess when to use it or not?

written 2.4 years ago by riccardo40
It should be fine to use it. To give an example, I wouldn't use it if all LFCs were nearly equal to 0 (say between -.1 and .1) except one or two LFCs which were > 4 in the MA plot of MLE fold changes. This could occur in a technical dataset but unlikely with real biological samples. These numbers are totally contrived though.
modified 13 months ago • written 2.4 years ago by Michael Love18k

Hi Mike, would I be right in thinking that another situation where you might not want to use shrinkage is if you wanted to compare the change for two sets of genes, and one of those sets was more lowly expressed than the other?

written 13 months ago by i.sudbery10

Hi, in this case I'd say the shrinkage is actually the most useful. It will tamp down any non-informative differences in the small-count genes. Only when the LFCs between two groups, both with small counts, are above what is expected simply due to sampling variability will a large value come through. And I should note, on nearly all the bulk RNA-seq datasets we try, we do not observe too much shrinkage. It's only observable for extremely large LFCs in datasets where all the other genes have nearly no change. And for this we have a new estimator in development which works quite well across the board.

written 13 months ago by Michael Love18k