Hi, I am using DESeq2 to identify DEGs in a dataset of 6 samples: 3 patient samples and 3 control samples. My DEG cut-offs are |log2FC| > 2 and p-value < 0.05.
I ran DESeq2 on the dataset in two ways: (i) without LFC shrinkage and (ii) with LFC shrinkage (lfcShrink with type = "ashr").
I didn't expect a large difference in the number of DEGs detected, but the results showed otherwise: 1,136 DEGs without shrinkage versus 37 with it. This has left me even more confused about how LFC shrinkage should be used in DE analysis. The code and output are below.
Is this the expected difference between using and not using LFC shrinkage, and if so, when should shrinkage be used?
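As rough intuition only (a simplification, not exactly what ashr computes): suppose a gene's unshrunken estimate is $\hat\beta$ with standard error $s$, and the prior on true log2 fold changes is taken to be a single zero-centred normal $\mathcal{N}(0, \tau^2)$. The posterior mean is then

$$
\tilde\beta = \frac{\tau^2}{\tau^2 + s^2}\,\hat\beta
$$

so genes with large standard errors (typically low counts or high dispersion) are pulled much more strongly towards zero than well-measured genes. For example, with $\tau = 1$, an estimate $\hat\beta = 3$ with $s = 0.5$ shrinks to $2.4$ and still passes a |log2FC| > 2 cut-off, while the same estimate with $s = 2$ shrinks to $0.6$ and does not. ashr instead fits a more flexible unimodal prior across all genes, but the qualitative effect is similar.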
```
> library(DESeq2)  # results(), lfcShrink(); dds was created earlier with DESeq()
> library(dplyr)   # %>%
> library(tibble)  # rownames_to_column(), as_tibble()
> contrast_oe <- c("SampleType", "LC", "Control")
> res_tableOE_unshrunken <- results(dds, contrast = contrast_oe)
> unshrunk_tb <- res_tableOE_unshrunken %>% data.frame() %>% rownames_to_column(var = "gene") %>% as_tibble()
> unshrunk_tb$diffexpressed <- "NO"
> unshrunk_tb$diffexpressed[unshrunk_tb$log2FoldChange > 2 & unshrunk_tb$pvalue < 0.05] <- "UP"
> unshrunk_tb$diffexpressed[unshrunk_tb$log2FoldChange < -2 & unshrunk_tb$pvalue < 0.05] <- "DOWN"
> aggregate(gene ~ diffexpressed, unshrunk_tb, function(x) c(count = length(x)))
  diffexpressed  gene
1          DOWN   293
2            NO 25349
3            UP   843
> res_tableOE_shrunk <- lfcShrink(dds, contrast = contrast_oe, type = "ashr")
using 'ashr' for LFC shrinkage. If used in published research, please cite:
Stephens, M. (2016) False discovery rates: a new deal. Biostatistics, 18:2.
https://doi.org/10.1093/biostatistics/kxw041
> shrunk_tb <- res_tableOE_shrunk %>% data.frame() %>% rownames_to_column(var = "gene") %>% as_tibble()
> shrunk_tb$diffexpressed <- "NO"
> shrunk_tb$diffexpressed[shrunk_tb$log2FoldChange > 2 & shrunk_tb$pvalue < 0.05] <- "UP"
> shrunk_tb$diffexpressed[shrunk_tb$log2FoldChange < -2 & shrunk_tb$pvalue < 0.05] <- "DOWN"
> aggregate(gene ~ diffexpressed, shrunk_tb, function(x) c(count = length(x)))
  diffexpressed  gene
1          DOWN     2
2            NO 26448
3            UP    35
```
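To see which genes drop out after shrinkage, one option is to classify both tables with a single helper and cross-tabulate the two sets of labels. This is an illustrative sketch only: classify_degs is a hypothetical helper (not a DESeq2 function), the toy vectors stand in for the log2FoldChange and pvalue columns of unshrunk_tb and shrunk_tb, and their values are invented. The same pvalue vector is used for both calls on the assumption that lfcShrink(type = "ashr") replaces only log2FoldChange and lfcSE. That is worth checking on the real data with all.equal(unshrunk_tb$pvalue, shrunk_tb$pvalue).

```r
# Hypothetical helper, not part of DESeq2: label genes UP / DOWN / NO using
# the cut-offs from this post (|log2FC| > 2 and p < 0.05).
classify_degs <- function(lfc, p, lfc_cut = 2, p_cut = 0.05) {
  # Genes with an NA fold change or p-value are treated as not significant
  ok <- !is.na(lfc) & !is.na(p) & p < p_cut
  out <- rep("NO", length(lfc))
  out[ok & lfc > lfc_cut] <- "UP"
  out[ok & lfc < -lfc_cut] <- "DOWN"
  factor(out, levels = c("DOWN", "NO", "UP"))
}

# Invented toy values for five genes (assumption: identical p-values in both
# tables, so only the fold-change estimates differ)
lfc_unshrunk <- c(3.5, -2.8, 2.4, 0.3, 6.0)
lfc_shrunk   <- c(3.1, -0.4, 0.9, 0.2, 2.6)
pvalue       <- c(0.001, 0.02, 0.04, 0.5, 0.0001)

before <- classify_degs(lfc_unshrunk, pvalue)
after  <- classify_degs(lfc_shrunk, pvalue)

# Rows: call without shrinkage; columns: call with shrinkage
table(unshrunk = before, shrunk = after)
```

On the real tables this would be table(unshrunk = classify_degs(unshrunk_tb$log2FoldChange, unshrunk_tb$pvalue), shrunk = classify_degs(shrunk_tb$log2FoldChange, shrunk_tb$pvalue)). If the p-values really are identical, every gene in an off-diagonal cell changed category only because its fold-change estimate moved across the ±2 cut-off.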