Question: DESeq2 log2FC shrink before GSEA or not?
8 months ago by
United States
bmreilly0 wrote:

I have RNA-seq data from several conditions where I have determined differentially expressed genes using DESeq2, and I'd now like to perform gene set enrichment analysis on the log2FC values from these comparisons. My question is whether I should use the shrunken log fold changes from the "lfcShrink()" function or the raw log2FC values.

modified 8 months ago by Michael Love26k • written 8 months ago by bmreilly0

I think you should use the shrunken LFCs, as they take into account the variance of the changes.

For more information, refer to this vignette to find details on moderated LFCs:

written 8 months ago by mikhael.manurung200
Answer: DESeq2 log2FC shrink before GSEA or not?
8 months ago by
Michael Love26k
United States
Michael Love26k wrote:

I'd argue for the shrunken LFCs, for the reasons given in the two papers, e.g. the apeglm paper from last year or the DESeq2 paper.

written 8 months ago by Michael Love26k

Thanks, I'll read the apeglm paper more closely.

I'm actually doing a time series where I look at the acute effects of a drug treatment and then, at a later time point, check for longer-lasting or sub-acute changes. Using the shrunken LFCs at the acute time point seems to work well, but at the "recovery" time point the changes are much more modest, and after shrinkage there is, for the most part, only a single gene per set with ANY non-zero LFC. This leads to poor results with GSEA (completely unrelated pathways driven by a single gene with a large LFC while the rest are close to zero). When I use non-shrunken values I recover many of the same pathways as at the acute time point, but at a lower level of significance, which seems more realistic.

Perhaps this could be an effect of using "ashr" instead of "apeglm" ?

I ask because I'm using a design formula of "~ batch + condition" due to a complicated experimental design, and I'm unable to use "apeglm" because it requires a "coef" argument. The comparisons I need to make are not found when I use "coef(dds.result)", only when I pass contrasts. Is there a way to use the "apeglm" method with contrasts? Or a way to get a coefficient for a given comparison?

Perhaps if I find a way to use apeglm the "Recovery" time point results would make more sense.

Thanks -Brian

modified 8 months ago • written 8 months ago by bmreilly0

ashr may be a little less aggressive than apeglm. For apeglm, if the data is compatible with LFC=0, it tends to move the point estimate to 0.

Almost always, there is a way to express a contrast as a coefficient, and we have some examples in the vignette, but it's easy enough to just use 'ashr' here, and then you can give lfcShrink() the contrast.

written 8 months ago by Michael Love26k
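
The two routes mentioned in the reply above (passing a contrast to lfcShrink() with "ashr", or re-expressing the contrast as a coefficient so that "apeglm" can be used) might look roughly like the R sketch below. This is only an illustration under assumptions: the simulated `dds` from `makeExampleDESeqDataSet()` and the level names `control`, `acute`, `recovery`, `b1` and `b2` are placeholders rather than the poster's data, and the `ashr` and `apeglm` packages are assumed to be installed alongside DESeq2. The relevel-then-`nbinomWaldTest()` pattern is the one described in the DESeq2 vignette for turning a contrast into a coefficient.

```r
library(DESeq2)

# Simulated stand-in for the real experiment (hypothetical level names).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 12)
dds$batch <- factor(rep(c("b1", "b2"), times = 6))
dds$condition <- factor(rep(c("control", "acute", "recovery"), each = 4),
                        levels = c("control", "acute", "recovery"))
design(dds) <- ~ batch + condition
dds <- DESeq(dds)

# With "control" as the reference level, recovery vs acute is not
# among the fitted coefficients.
resultsNames(dds)

# Route 1: "ashr" accepts a contrast directly.
res_ashr <- lfcShrink(dds,
                      contrast = c("condition", "recovery", "acute"),
                      type = "ashr")

# Route 2: make "acute" the reference level so that recovery vs acute
# becomes a coefficient, refit the Wald test (the dispersion estimates
# are reused), then shrink with "apeglm".
dds$condition <- relevel(dds$condition, ref = "acute")
dds <- nbinomWaldTest(dds)
resultsNames(dds)
res_apeglm <- lfcShrink(dds,
                        coef = "condition_recovery_vs_acute",
                        type = "apeglm")
```

Both result tables describe the same comparison (recovery vs acute); they differ only in the shrinkage estimator applied.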

Okay thanks for the help. I guess the fact that I get unintelligible results with shrunken LFC for the later time point makes me feel that shrunken LFC with GSEA may not be appropriate when looking at more subtle changes in gene expression.

I have heard of some people on BioStars who use the Wald statistic generated by DESeq2 as the ranking metric for GSEA. In my experiment it looks like this gives somewhat intelligible results for both time points. Do you think this would be a reasonable compromise?

written 8 months ago by bmreilly0

Sure, the Wald statistic is fine as well. It promotes genes which may have a smaller absolute FC but which, because that FC is large relative to the dispersion of the gene, end up with a higher statistic.

Gene set testing for me is quite dataset specific. There seems to be a magic zone of amount of signal which gives "best" gene set results. But certainly using Wald statistic sounds reasonable.

written 8 months ago by Michael Love26k
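
For readers who want to try the Wald-statistic ranking discussed above, a minimal R sketch follows. It assumes a simulated dataset from `makeExampleDESeqDataSet()` in place of real data, and it takes the statistic from the `stat` column of an ordinary (unshrunken) `results()` table, which for the default Wald test is `log2FoldChange / lfcSE`. The variable name `ranks` is just illustrative.

```r
library(DESeq2)

# Simulated two-group dataset standing in for a real experiment.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 8, betaSD = 1)
dds <- DESeq(dds)

# Unshrunken results; `stat` holds the Wald statistic.
res <- results(dds)

# Drop genes without a statistic (e.g. all-zero counts), then build a
# named vector sorted from most up- to most down-regulated.
res <- res[!is.na(res$stat), ]
ranks <- sort(setNames(res$stat, rownames(res)), decreasing = TRUE)

head(ranks)
```

A named, sorted vector like this is the usual input for preranked GSEA tools. Because the statistic divides by the standard error, a gene with a modest but precisely estimated fold change can rank above one with a large but noisy fold change.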

Yes, I liked the idea of the Wald stat for that reason as well. Thanks for your input, very helpful!

written 8 months ago by bmreilly0