DESeq2 log2FC shrink before GSEA or not?
bmreilly • 0
@bmreilly-7279

I have RNA-seq data from several conditions in which I have determined differentially expressed genes using DESeq2, and I'd now like to perform gene set enrichment analysis on the log2FC values from these comparisons. My question is whether I should use the shrunken log fold changes from the "lfcShrink()" function or the raw log2FC values.

deseq2 gsea rnaseq gene expression

I think you should use the shrunken LFCs, because they take into account the variance of the LFC estimates.

For more information, refer to this vignette to find details on moderated LFCs: https://www.bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html

@mikelove

I'd argue for the shrunken LFCs, for the reasons given in the apeglm paper from last year and in the DESeq2 paper.


Thanks, I'll read the apeglm paper more closely.

I'm actually analyzing a time series where I look at the acute effects of a drug treatment and then, at a later time point, check for longer-lasting or sub-acute changes. Using the shrunken LFCs at the acute time point seems to work well, but at the "recovery" time point the changes are much more modest, and after shrinkage there is, for the most part, only a single gene per set with any nonzero LFC. This leads to poor GSEA results (completely unrelated pathways driven by a single gene with a large LFC, with the rest close to zero). When I use non-shrunken values, I recover many of the same pathways as at the acute time point, but at a lower level of significance, which seems more realistic.

Perhaps this could be an effect of using "ashr" instead of "apeglm"?

I ask because I'm using a design formula of "~ batch + condition" due to a complicated experimental design, and I'm unable to use "apeglm" because it requires a "coef" argument, and the comparisons I need to make are not found when I use "coef(dds.result)", only when I pass contrasts. Is there a way to use the "apeglm" method with contrasts? Or a way to get a coefficient for a given comparison?

Perhaps if I find a way to use apeglm the "Recovery" time point results would make more sense.

Thanks -Brian


ashr may be a little less aggressive than apeglm. For apeglm, if the data is compatible with LFC=0, it tends to move the point estimate to 0.

Almost always, there is a way to express a contrast as a coefficient, and we have some examples in the vignette, but it's easy enough to just use 'ashr' here, and then you can give lfcShrink() the contrast.
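To make the "contrast as a coefficient" point concrete, here is a toy sketch in plain Python rather than DESeq2 itself. It assumes a single factor with made-up log2 expression values for one gene and ignores the batch term; the level names and the `condition_X_vs_Y` naming are only meant to mimic the style of DESeq2 coefficient names. The idea is that a comparison between two non-reference levels is a difference of two coefficients, and becomes a single coefficient once the reference level is changed. In DESeq2 the corresponding steps would be, as far as the vignette describes them, releveling the factor, re-running nbinomWaldTest(), checking resultsNames(), and passing the new name to lfcShrink() via "coef".

```python
from statistics import mean

# Hypothetical log2 expression values for one gene in three conditions.
log2_expr = {
    "control":  [5.0, 5.2, 4.8],
    "acute":    [7.1, 6.9, 7.0],
    "recovery": [5.6, 5.4, 5.5],
}

def coefficients(data, reference):
    """Treatment-coded effects: each level's mean minus the reference mean."""
    ref_mean = mean(data[reference])
    return {
        f"condition_{level}_vs_{reference}": mean(values) - ref_mean
        for level, values in data.items()
        if level != reference
    }

# With "control" as the reference, recovery vs acute is not a coefficient;
# it has to be built as a contrast (a difference of two coefficients).
coefs_control = coefficients(log2_expr, "control")
contrast = (coefs_control["condition_recovery_vs_control"]
            - coefs_control["condition_acute_vs_control"])

# After changing the reference level to "acute", the same comparison
# appears directly as a single coefficient.
coefs_acute = coefficients(log2_expr, "acute")
as_coef = coefs_acute["condition_recovery_vs_acute"]

print(sorted(coefs_control))
print(round(contrast, 6), round(as_coef, 6))
```

The two printed numbers agree, which is why a method that only accepts a coefficient can usually still be applied after releveling.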


Okay, thanks for the help. The fact that I get unintelligible results with shrunken LFCs for the later time point makes me feel that shrunken LFCs may not be appropriate for GSEA when looking at more subtle changes in gene expression.

I have heard of some people on BioStars who use the Wald statistic generated by DESeq2 as the ranking metric for GSEA. In my experiment it looks like this gives somewhat intelligible results for both time points. Do you think this would be a reasonable compromise?


Sure, the Wald statistic is fine as well. It promotes genes that may have a smaller absolute fold change but whose change is large relative to the gene's dispersion, so they end up with a higher statistic.

In my experience, gene set testing is quite dataset-specific. There seems to be a "magic zone" in the amount of signal that gives the "best" gene set results. But using the Wald statistic certainly sounds reasonable.
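As a small illustration of how ranking by the Wald statistic differs from ranking by LFC, here is a sketch in plain Python. The gene names, LFCs and standard errors are invented. It assumes the usual definition of the Wald statistic as the estimate divided by its standard error, which in a DESeq2 results table corresponds to log2FoldChange divided by lfcSE (the "stat" column).

```python
# Hypothetical per-gene results: (log2 fold change, standard error).
results = {
    "geneA": (3.0, 2.0),   # large change, noisy estimate
    "geneB": (0.8, 0.1),   # modest change, precise estimate
    "geneC": (-1.2, 0.3),
    "geneD": (0.1, 0.5),
}

def wald_stat(lfc, se):
    """Wald statistic: the estimate divided by its standard error."""
    return lfc / se

# Rank genes from most up-regulated to most down-regulated under each metric.
rank_by_lfc = sorted(results, key=lambda g: results[g][0], reverse=True)
rank_by_stat = sorted(results, key=lambda g: wald_stat(*results[g]), reverse=True)

print(rank_by_lfc)
print(rank_by_stat)
```

In this toy input, geneB has a much smaller LFC than geneA but moves to the top of the Wald-based ranking because its estimate is far more precise.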


Yes, I liked the idea of the Wald statistic for that reason too. Thanks for your input, very helpful!
