lfcShrink problem with many zero-count genes in RNA-seq data
@shangguandong1996-21805
China

Hi Dr Love,

I posted a question about the weird MA plot and volcano plot of my DESeq2 differential expression results, and also on Biostars, where ATpoint gave a useful answer about having too many zero-count genes and about pre-filtering.

It seems that having too many zero-count genes causes a problem for LFC shrinkage. I also find that the apeglm and ashr results are very different. Could you give me some advice on which method I should choose in this situation?

I have pre-filtered zero-count genes.

[Two figures from the original post are not reproduced here.]

Here is the code and the raw data:

https://github.com/shangguandong1996/picture_link/blob/main/WFX_count_Rmatrix.txt

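```r
# Prepare -----------------------------------------------------------------

# load up the packages
library(DESeq2)
library(dplyr)
library(ggplot2)
# lfcShrink(type = "ashr") and lfcShrink(type = "apeglm") additionally
# require the ashr and apeglm packages to be installed

# Set options
options(stringsAsFactors = FALSE)

# load up the data (genes in rows, samples in columns)
data <- read.table("rawdata/count/WFX_count_Rmatrix.txt",
                   header = TRUE,
                   row.names = 1)

# Two conditions with two replicates each; Fx593 is the reference level
coldata <- data.frame(row.names = colnames(data),
                      type = factor(rep(c("Fx593", "Fx600"), each = 2),
                                    levels = c("Fx593", "Fx600")))


# DESeq2 ------------------------------------------------------------------

dds <- DESeqDataSetFromMatrix(countData = data,
                              colData = coldata,
                              design = ~ type)

# Minimal pre-filter: keep genes with at least 10 reads in total
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep, ]

# PCA on variance-stabilized data
vsd <- vst(dds)
plotPCA(vsd, intgroup = "type")

dds <- DESeq(dds)


# Volcano plot of the shrunken LFCs for a given shrinkage method
lfc_plotVolcano <- function(dds, type) {
    res_lfc <- lfcShrink(dds = dds,
                         type = type,
                         coef = "type_Fx600_vs_Fx593")

    as.data.frame(res_lfc) %>%
        as_tibble() %>%
        # genes with NA padj are drawn at -log10(1) = 0
        mutate(padj = case_when(
            is.na(padj) ~ 1,
            TRUE ~ padj
        )) %>%
        ggplot(aes(x = log2FoldChange, y = -log10(padj))) +
        geom_point() +
        ggtitle(type)
}

lfc_plotVolcano(dds, "ashr")
lfc_plotVolcano(dds, "apeglm")
```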

Best wishes,
Guandong Shang

Tags: ashr, DESeq2, apeglm
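In lfc_plotVolcano, every gene whose padj is NA is drawn at -log10(1) = 0, so a data set with many zero or near-zero count genes piles points onto the x-axis. The DESeq2 vignette describes three distinct reasons for NA values in a results table: all counts are zero (baseMean is 0; LFC, p-value and padj are NA), a sample has an extreme count outlier flagged by Cook's distance (p-value and padj are NA), or the gene was removed by independent filtering (only padj is NA). The minimal base-R sketch below labels rows by these rules; classify_na_reason is a hypothetical helper, not a DESeq2 function, and the toy table is made up purely to exercise it.

```r
# Hypothetical helper (not part of DESeq2): label why a row of a
# DESeq2-style results table has missing values, following the NA rules
# described in the DESeq2 vignette. A simplification, not exhaustive.
classify_na_reason <- function(res) {
  res <- as.data.frame(res)
  reason <- rep("tested", nrow(res))
  # Independent filtering: p-value present, only padj is NA
  reason[!is.na(res$pvalue) & is.na(res$padj)] <- "independent filtering"
  # Cook's distance outlier: p-value and padj NA, but counts are not all zero
  reason[is.na(res$pvalue) & res$baseMean > 0] <- "count outlier"
  # All counts zero: baseMean is 0 and LFC, p-value and padj are NA
  reason[res$baseMean == 0] <- "all-zero counts"
  reason
}

# Toy table with the same column names as results(dds); values are made up
toy <- data.frame(
  baseMean       = c(250, 3, 40, 0),
  log2FoldChange = c(1.5, 2.1, 0.8, NA),
  pvalue         = c(1e-4, 0.02, NA, NA),
  padj           = c(1e-3, NA, NA, NA)
)
table(classify_na_reason(toy))
```

On real output this would be called as table(classify_na_reason(results(dds))), which shows how many of the points sitting at y = 0 come from each cause.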
@mikelove
United States

apeglm looks fine here, what’s the issue?

As we show in our paper, these methods are not identical; hence we developed apeglm.

Genes with a small p-value from null hypothesis testing sometimes have their LFC shrunk toward zero when those genes are affected by count outliers. I tend to trust the shrunken coefficients, and you can use them to filter your gene set so that it keeps genes with both a small p-value and a large enough LFC after shrinkage. You can examine these genes with plotCounts.
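The filtering strategy described above (small adjusted p-value from the unshrunken test plus a large enough shrunken LFC) can be sketched in base R as follows. select_robust_genes is a hypothetical helper, and the cut-offs alpha = 0.05 and min_abs_lfc = 1 are arbitrary assumptions, not values recommended in this thread. It only assumes a table with padj and log2FoldChange columns and gene names as row names, as in the lfcShrink output used in the question.

```r
# Hypothetical helper: keep genes that are significant in the unshrunken
# test AND still have a sizeable effect after shrinkage.
# alpha and min_abs_lfc defaults are arbitrary illustrative choices.
select_robust_genes <- function(res_shrunk, alpha = 0.05, min_abs_lfc = 1) {
  res_shrunk <- as.data.frame(res_shrunk)
  keep <- !is.na(res_shrunk$padj) &
    res_shrunk$padj < alpha &
    !is.na(res_shrunk$log2FoldChange) &
    abs(res_shrunk$log2FoldChange) >= min_abs_lfc
  rownames(res_shrunk)[keep]
}

# Toy table: geneB is "significant" but its shrunken LFC is tiny
# (e.g. an outlier-driven call); geneD was filtered (padj is NA).
toy_res <- data.frame(
  padj           = c(0.001, 0.001, 0.2, NA),
  log2FoldChange = c(2.0, 0.1, 3.0, 4.0),
  row.names      = c("geneA", "geneB", "geneC", "geneD")
)
select_robust_genes(toy_res)   # "geneA"

# With real data (requires DESeq2 and apeglm; shown as comments only):
# res_shr <- lfcShrink(dds, coef = "type_Fx600_vs_Fx593", type = "apeglm")
# genes <- select_robust_genes(res_shr)
# plotCounts(dds, gene = genes[1], intgroup = "type")
```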

ADD COMMENT
0
Entering edit mode

Thanks for your reply, Dr Love.

I have another question about pre-filtering zero-count genes. According to your manual, pre-filtering low-count genes is not necessary, because stricter filtering is applied in the results function:

Pre-filtering

While it is not necessary to pre-filter low count genes before running the DESeq2 functions, there are two reasons which make pre-filtering useful: by removing rows in which there are very few reads, we reduce the memory size of the dds data object, and we increase the speed of the transformation and testing functions within DESeq2. Here we perform a minimal pre-filtering to keep only rows that have at least 10 reads total. Note that more strict filtering to increase power is automatically applied via independent filtering on the mean of normalized counts within the results function.

But as you can see, in this case whether or not I pre-filter has a big influence on the shape of the volcano plot. I am curious why.

Best wishes

Guandong Shang
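The vignette passage quoted above mentions two different steps: a minimal pre-filter on raw read totals, and independent filtering on the mean of normalized counts (baseMean) inside results. The base-R sketch below illustrates both under simplifying assumptions: median-of-ratios size factors, baseMean as the row mean of normalized counts, and a fixed-quantile version of independent filtering. DESeq2 itself searches over a grid of cut-offs and picks one that roughly maximises the number of genes with padj below alpha, so filtered_bh and the other helper names are hypothetical and not DESeq2's implementation.

```r
# Median-of-ratios size factors (the idea behind DESeq2's default
# normalization). Rows with a zero in any sample have log geometric
# mean -Inf and are left out of the median.
median_ratio_size_factors <- function(counts) {
  log_geo_means <- rowMeans(log(counts))
  usable <- is.finite(log_geo_means)
  apply(counts[usable, , drop = FALSE], 2, function(col) {
    exp(median(log(col) - log_geo_means[usable]))
  })
}

# Mean of normalized counts per gene (what DESeq2 reports as baseMean)
mean_normalized_counts <- function(counts, size_factors) {
  rowMeans(sweep(counts, 2, size_factors, "/"))
}

# Simplified independent filtering with a FIXED quantile cut-off on the
# filter statistic: genes below it get padj = NA and are left out of the
# Benjamini-Hochberg correction, so fewer tests are corrected for.
filtered_bh <- function(pvalue, filter_stat, quantile_cut) {
  cutoff <- quantile(filter_stat, quantile_cut, names = FALSE)
  use <- filter_stat >= cutoff
  padj <- rep(NA_real_, length(pvalue))
  padj[use] <- p.adjust(pvalue[use], method = "BH")
  padj
}

# Toy counts: sample s2 was sequenced twice as deeply as sample s1
counts <- cbind(s1 = c(10, 20, 30, 0), s2 = c(20, 40, 60, 0))
sf <- median_ratio_size_factors(counts)
base_mean <- mean_normalized_counts(counts, sf)

# Minimal pre-filter from the vignette: at least 10 reads in total
keep <- rowSums(counts) >= 10
```

In this sketch, raising quantile_cut only changes padj: the raw p-values of the retained genes are untouched, and filtered genes simply receive NA.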


I’m recommending it for you here, separately from the point above that it’s not required for generating null-hypothesis p-values or for multiple testing.


Hi Dr Love,

I am wondering whether the lfcShrink function does not apply independent filtering on the mean of normalized counts, while the results function does. If that is true, should I always pre-filter low-count genes before running DESeq if I want to use the log2FoldChange values from lfcShrink?

Best wishes

Guandong Shang


The lfcShrink function produces posterior effect sizes. Those don’t involve independent filtering or multiple testing.
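To make "posterior effect size" concrete, here is a sketch using the simplest conjugate case: a Normal(0, tau^2) prior on the LFC combined with a normal likelihood centered at the MLE with standard error se. This is a swapped-in textbook model, not apeglm (which uses a heavy-tailed Cauchy prior) or ashr (a mixture prior), and tau = 1 is an arbitrary choice, whereas DESeq2's shrinkage methods estimate the prior from the data. It shows that the posterior is computed gene by gene from that gene's estimate and uncertainty, with no p-value threshold or multiple-testing step involved, and that an imprecisely measured gene, such as one with mostly zero counts, is pulled much further toward zero.

```r
# Posterior mean and SD of an LFC under a Normal(0, tau^2) prior and a
# Normal(lfc_mle, lfc_se^2) likelihood (conjugate normal-normal model).
# NOT apeglm or ashr; only illustrates the general shape of shrinkage.
normal_posterior_lfc <- function(lfc_mle, lfc_se, tau) {
  w <- tau^2 / (tau^2 + lfc_se^2)   # weight given to the data
  data.frame(
    post_mean = w * lfc_mle,
    post_sd   = sqrt(1 / (1 / tau^2 + 1 / lfc_se^2))
  )
}

# Two genes with the same MLE fold change: one well measured (small SE),
# one with low counts (large SE). tau = 1 is an arbitrary assumption.
post <- normal_posterior_lfc(lfc_mle = c(2, 2), lfc_se = c(0.2, 2), tau = 1)
post
```

The well-measured gene keeps most of its fold change (about 1.92), while the noisy gene is shrunk to 0.4, even though both started at an MLE of 2.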
