I started using log fold change shrinkage from the DESeq2 package in single-cell RNA-seq datasets, and I was comparing the "normal" and "apeglm" methods.
I noticed that for many genes, the normal LFC shrinkage switched the sign of the estimated LFC (4524/19243 genes), which doesn't occur for apeglm (4/19243). This was surprising, as I didn't expect a shrinkage method to turn any positive LFC into (small) negative values. Is this expected behavior from the normal shrinkage?
Upon inspection of some of these genes, very few single cells had non-zero counts, so it makes sense these should be shrunk. However, apeglm maintained a high LFC for genes with almost no counts. I included a couple of examples:
Now I'm wondering if these genes should be used at all if only a few cells have one or two counts and the vast majority (>90%) has none. I thought shrinkage would always handle these cases. The normal method apparently does, shrinking most genes to 0 (or slightly negative values?), but apeglm method seems too conservative. Should it only be used after a filter on the minimum number of counts?
Sorry if I'm missing something. Any insights?
Thanks in advance,
scRNA_cn <- readRDS(file = "output_sc/Bakhoum_scRNA.rds") sc_counts <- Seurat::GetAssayData(scRNA_cn, assay = "RNA", slot = "counts") df.diff.design <- scRNA_cn@meta.data %>% select(CIN_status) %>% rownames_to_column("Sample") %>% mutate(Sample = as.factor(Sample), CIN_status = as.factor(CIN_status)) deseq.dataset <- DESeqDataSetFromMatrix( countData=as.matrix(sc_counts), colData=df.diff.design, design=~CIN_status ) %>% estimateSizeFactors() ddsWald <- DESeq(deseq.dataset, test = "Wald", useT = TRUE, minmu = 1e-6, minReplicatesForReplace=Inf) ddsWald_shrunk <- lfcShrink(ddsWald, coef = "CIN_status_CIN.high_vs_CIN.low", type = "normal") ddsWald_apeglm <- lfcShrink(ddsWald, coef = "CIN_status_CIN.high_vs_CIN.low", type = "apeglm") sessionInfo( ) R version 4.0.3 (2020-10-10) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 20.04.2 LTS Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3 locale:  LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8  LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8  LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C  LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages:  parallel stats4 stats graphics grDevices utils datasets methods  base other attached packages:  biomaRt_2.46.3 AneuFinder_1.18.0 AneuFinderData_1.18.0  cowplot_1.1.1 infercnv_1.6.0 DESeq2_1.30.1  SummarizedExperiment_1.20.0 GenomicRanges_1.42.0 GenomeInfoDb_1.26.7  glmGamPoi_1.2.0 gplots_3.1.1 pheatmap_1.0.12  plotly_4.9.3 fgsea_1.16.0 proxy_0.4-26  forcats_0.5.1 stringr_1.4.0 dplyr_1.0.6  purrr_0.3.4 readr_1.4.0 tidyr_1.1.3  tibble_3.1.1 tidyverse_1.3.1 org.Hs.eg.db_3.12.0  AnnotationDbi_1.52.0 IRanges_2.24.1 S4Vectors_0.28.1  Biobase_2.50.0 BiocGenerics_0.36.1 msigdbr_7.4.1  sparseMatrixStats_1.2.1 MatrixGenerics_1.2.1 matrixStats_0.58.0  ggrepel_0.9.1 ggplot2_3.3.3 patchwork_1.1.1  SeuratObject_4.0.1 Seurat_4.0.1