Question

Variation in number of DEGs on LFC shrinkage

Deevanshu • 0

India

Hi, I am using DESeq2 to identify DEGs in a dataset of 6 samples, with 3 samples in each of 2 conditions (patient versus control). The cutoffs for calling a gene a DEG are: |log2FC| > 2, p-value < 0.05

I ran DESeq2 on the dataset in two ways: (i) without LFC shrinkage, and (ii) with LFC shrinkage.

I didn't expect a large difference in the number of DEGs detected, but the results showed otherwise: 1,136 DEGs without shrinkage versus 37 with it. This has left me unsure about when to use LFC shrinkage in DE analysis. The code and output are below.

I want to understand whether this much variation is expected with versus without LFC shrinkage and, if so, when shrinkage should be used.

> contrast_oe <- c("SampleType", "LC", "Control")
> res_tableOE_unshrunken <- results(dds, contrast=contrast_oe)
> unshrunk_tb <- res_tableOE_unshrunken %>% data.frame() %>% rownames_to_column(var="gene") %>% as_tibble()
> unshrunk_tb$diffexpressed <- "NO"
> unshrunk_tb$diffexpressed[unshrunk_tb$log2FoldChange > 2 & unshrunk_tb$pvalue < 0.05] <- "UP"
> unshrunk_tb$diffexpressed[unshrunk_tb$log2FoldChange < -2 & unshrunk_tb$pvalue < 0.05] <- "DOWN"
> aggregate(gene~diffexpressed, unshrunk_tb, function(x) c(count = length(x)))
  diffexpressed  gene
1          DOWN   293
2            NO 25349
3            UP   843
> res_tableOE_shrunk <- lfcShrink(dds, contrast = contrast_oe, type = "ashr")
using 'ashr' for LFC shrinkage. If used in published research, please cite:
    Stephens, M. (2016) False discovery rates: a new deal. Biostatistics, 18:2.
    https://doi.org/10.1093/biostatistics/kxw041
> shrunk_tb <- res_tableOE_shrunk %>% data.frame() %>% rownames_to_column(var="gene") %>% as_tibble()
> shrunk_tb$diffexpressed <- "NO"
> shrunk_tb$diffexpressed[shrunk_tb$log2FoldChange > 2 & shrunk_tb$pvalue < 0.05] <- "UP"
> shrunk_tb$diffexpressed[shrunk_tb$log2FoldChange < -2 & shrunk_tb$pvalue < 0.05] <- "DOWN"
> aggregate(gene~diffexpressed, shrunk_tb, function(x) c(count = length(x)))
  diffexpressed  gene
1          DOWN     2
2            NO 26448
3            UP    35

DESeq2 lfcShrink • 1.1k views

updated 4.1 years ago by Michael Love • written 4.1 years ago by Deevanshu

score 0 · Answer 1 · 2021-11-12

Filtering the output of the first results() call on log2FoldChange after the fact is not how we recommend thresholding against an LFC value. Use the lfcThreshold argument instead (see the vignette or paper).
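As a rough illustration of that recommendation, here is a minimal sketch that tests against the LFC threshold inside results(). It makes several assumptions: the data are simulated with makeExampleDESeqDataSet() as a stand-in for the poster's 3 vs 3 design, so the factor is `condition` with levels A and B rather than `SampleType` with LC and Control, and the cutoff is applied to the adjusted p-value rather than the raw p-value used in the question.

```r
library(DESeq2)

# Simulated stand-in for the poster's data (assumption): 2000 genes, 3 vs 3 samples.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 6, betaSD = 2)
dds <- DESeq(dds)

# Test the null hypothesis |LFC| <= 2 directly, instead of testing LFC = 0
# and then filtering on log2FoldChange afterwards.
res_thr <- results(dds,
                   contrast = c("condition", "B", "A"),
                   lfcThreshold = 2,
                   altHypothesis = "greaterAbs")

# Number of genes passing the threshold test (adjusted p-value cutoff is an assumption).
n_sig <- sum(res_thr$padj < 0.05, na.rm = TRUE)
n_sig
```

With the real data, the contrast would be the `contrast_oe` vector from the question.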

Even so, the results will not be identical, because the methods are not identical. If you use lfcThreshold with lfcShrink, it will output aggregate posterior tail probabilities (s-values). To compare the two, you can plot, on the -log10 scale, the p-value from results() with lfcThreshold against the s-value from lfcShrink() with lfcThreshold.
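The comparison could be sketched roughly as follows. This is an illustration under assumptions, not the exact analysis from the question: the data are simulated with makeExampleDESeqDataSet(), and the shrinkage uses type = "apeglm" with a coefficient name, whereas the question used type = "ashr" with a contrast. The apeglm package needs to be installed for this to run.

```r
library(DESeq2)

# Simulated stand-in for the poster's data (assumption): 2000 genes, 3 vs 3 samples.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 6, betaSD = 2)
dds <- DESeq(dds)

# p-values from the threshold test |LFC| > 2 on the unshrunken estimates.
res_thr <- results(dds, name = "condition_B_vs_A", lfcThreshold = 2)

# Shrunken LFCs with the same threshold; with lfcThreshold > 0 the output
# carries an svalue column in place of pvalue/padj.
res_shr <- lfcShrink(dds, coef = "condition_B_vs_A",
                     type = "apeglm", lfcThreshold = 2)

# Compare the two significance measures gene by gene on the -log10 scale.
plot(-log10(res_thr$pvalue), -log10(res_shr$svalue),
     xlab = "-log10 p-value (results, lfcThreshold = 2)",
     ylab = "-log10 s-value (lfcShrink, lfcThreshold = 2)")
abline(0, 1, lty = 2)  # reference line where the two would agree
```

Points far from the dashed line are genes where the two approaches disagree most about the evidence for a large effect.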