log2 Fold Change in DESeq2 is not identical to FC calculated from normalized count when betaPrior is turned off
1
1
Entering edit mode
Yanling ▴ 10
@Yanling-24595
Last seen 3.3 years ago

I was comparing various modeling methods for differential expression and found that DESeq2 mysteriously inverted the sign of a subset of fold changes. According to this discussion, turning off fold change shrinkage should make log2foldchange from DESeq2 be simply equal to (mean of normalized counts group B) / (mean of normalized counts group A).

However, it seems that some degree of fold change moderation is done even when betaPrior is False.

As an example, I use the dataset from the official guideline. All preprocessing steps are copied from the official guide.

Result: log2Foldchange from DESeq2 without fold change shrinkage is not identical to what is calculated from normalized count matrix. And sign inversion happened for 121 genes in this dataset.

# Preprocessing
library("pasilla")
pasCts <- system.file("extdata",
                      "pasilla_gene_counts.tsv",
                      package="pasilla", mustWork=TRUE)
pasAnno <- system.file("extdata",
                       "pasilla_sample_annotation.csv",
                       package="pasilla", mustWork=TRUE)
cts <- as.matrix(read.csv(pasCts,sep="\t",row.names="gene_id"))
coldata <- read.csv(pasAnno, row.names=1)
coldata <- coldata[,c("condition","type")]
coldata$condition <- factor(coldata$condition)
coldata$type <- factor(coldata$type)
head(cts,2)
coldata
rownames(coldata) <- sub("fb", "", rownames(coldata))
all(rownames(coldata) %in% colnames(cts))
all(rownames(coldata) == colnames(cts))
cts <- cts[, rownames(coldata)]
all(rownames(coldata) == colnames(cts))
library("DESeq2")
dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = coldata,
                              design = ~ condition)
dds
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]
dds$condition <- factor(dds$condition, levels = c("untreated","treated")) 

##Actual analysis
dds <- DESeq(dds, betaPrior = F)
res <- results(dds, name="condition_treated_vs_untreated") 
# Fold change from DESeq2 without lfc
FC_DESeq <- res$log2FoldChange

#Export the median ratio normalized matrix
cts_norm <- counts(dds, normalized = T)
#Fold change calculated from the normalized matrix
FC_MLE <- apply(cts_norm, 1, function(x)
  log2(mean(x[coldata$condition == "treated"])/mean(x[coldata$condition == "untreated"])))

plot(FC_DESeq, FC_MLE)
abline(0, 1)

# number of FC got reversed
sum(FC_DESeq * FC_MLE < 0)

I wonder if this behavior is intended. I'm using DESeq2_1.28.1.

DESeq2 • 2.1k views
ADD COMMENT
3
Entering edit mode
@mikelove
Last seen 15 hours ago
United States

It's not always equal to the ratio of the mean of normalized counts depending on the fit of the GLM, but close (when no other factors are present in the design).

The GLM coefficient is not wrong. The correlation is 0.99. The naive estimator doesn't take into account differences in precision based on sequencing depth.

ADD COMMENT
0
Entering edit mode

Thanks Michael! Your courses and books got me started on bioinfo.

You are right. In this two group design, the two FCs are not identical, but still very similar. Our own data actually has a 2*2*2*4 design. When compressed into a 16*2 design using the method you suggested in the guide, the plot with two FCs has some extra "wings" near the axis, similar to fold change shrinkage, even though betaPrior is off. Could we trust the GLM coefficient vs naive estimator in a design of four factors and some (low order) interactions (e.g. ~A + B + C + D + A:B)?

ADD REPLY
0
Entering edit mode

Your naive estimator doesn't take into account observation precision. You can trust the GLM coefficients for significant genes (right? because the LFC can have very high variance depending on the data), and the shrunken LFC for all genes (this is why we implemented our shrinkage estimators, and provided access to others, e.g. ashr).

ADD REPLY
0
Entering edit mode

This makes a lot of sense. Many thanks again.

ADD REPLY

Login before adding your answer.

Traffic: 702 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6