Question: DESeq2 plotCounts() function-what is the unit on y axis?
gravatar for Alan
11 days ago by
Alan0 wrote:

Hi Michael and community,

Thanks for the work and efforts on DESeq2! I've had this question for some time, I went through quite a number of posts, the manual, and the vignette, but can't really find the answer. What is the unit of the "normalized count" of the y axis of the "plotCounts" plot?

plotCounts(dds, gene=which.min(res$padj), intgroup="condition")

in the vignette.

Question1: If I type ?plotCounts in R, then it says "Normalized counts plus a pseudocount of 0.5 are shown by default." and also there's this "normalized = TRUE", and  "transform = TRUE" arguments. So is the y axis just raw read counts divided by DESeq2-estimated size factor? or has it been logged? (loge? log2? log10?) Also, why is the psudocount 0.5 here? (it's just to avoid taking log of zero, right? but why don't you set it smaller, say, 0.001?) What happens if we set normalized = FALSE? Likewise,  what happens if we set transform = FALSE?

Question2: This one is indirectly related to the first question. So with the way DESeq2 normalizes raw read counts without the effect of gene size, can we compare the expression of two (or multiple) different genes within the same sample? If not, is there other recommended way of normalization for this purpose?

Thank you in advance!


ps. R version 3.4.3 (2017-11-30), DESeq2_1.18.1 

ADD COMMENTlink modified 11 days ago by Michael Love16k • written 11 days ago by Alan0
gravatar for Michael Love
11 days ago by
Michael Love16k
United States
Michael Love16k wrote:

With defaults, normalized=TRUE and transform=TRUE, plotted on the y-axis is the normalized count + 0.5. The y-axis is log scale, but the tick marks are on the count scale. So it's not log counts on the y-axis.

Try this:

coldata <- data.frame(x=factor(c(1,1,2,2)))
cts <- matrix(1:16,ncol=4)
dds <- DESeqDataSetFromMatrix(cts, coldata, ~x)
sizeFactors(dds) <- rep(1,4)
plotCounts(dds, 1, "x")
abline(h=counts(dds, normalized=TRUE)[1,] + .5)

The normalized count is given by counts(dds, normalized=TRUE) and this is K_ij / s_ij in the language of the DESeq2 paper and the formula in the vignette.

If I were to make the pseudocount smaller, if a gene has a 0, it would inflate the vertical space between 0 and 1, where there is little useful information, and squash the part of the plot between 1 and max(count), where there is more useful information. Actually, this fact is related to why we suggest to use variance stabilizing transformations, and why log(x + small pseudocount) is a bad decision for data exploration of counts. Compare these plots:

plot(0:3 + .5, log="y")
plot(0:3 + .001, log="y")

You can't compare DESeq2 normalized counts across genes. I would recommend to compare TPM, for example, estimated by a transcript quantifier and aggregated to the gene level using a package like tximport.

ADD COMMENTlink modified 11 days ago • written 11 days ago by Michael Love16k

Thank you for your very prompt reply and explanation, Michael!  


ADD REPLYlink written 11 days ago by Alan0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 148 users visited in the last hour