DESeq2 - Differences between plotCounts and normalized counts
@andrebolerbarros-16788

Hi everyone,

I've just realized that the values plotCounts returns for a specific gene differ from the values I get from counts with normalized=T. Is there a reason for this? Is it in any way related to the pseudocount (pc) argument?

Thanks!

0
Entering edit mode
@mikelove

The normalized=TRUE argument gives you counts that are normalized by size factor; typically this amounts to sequencing depth normalization. It is giving you:

counts(dds, normalized=TRUE)[ gene, ]


This is useful so you can interpret the changes up and down as associated with the condition, rather than having those differences on top of sequencing depth variation. The DESeq2 model however uses raw counts plus offsets internally. The normalization in this plot is just to aid the human eye.
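As a minimal sketch of what "normalized by size factor" means, the snippet below compares counts(dds, normalized=TRUE) with the raw counts divided column-wise by sizeFactors(dds). The simulated dataset from makeExampleDESeqDataSet and the variable names raw, norm and by_hand are assumptions for illustration only; substitute your own dds.

```r
library(DESeq2)

# Simulated dataset shipped with DESeq2; it stands in for your own `dds`.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 200, m = 8)
dds <- estimateSizeFactors(dds)

raw  <- counts(dds)                     # raw integer counts
norm <- counts(dds, normalized = TRUE)  # counts scaled by size factor

# Divide each column (sample) by its size factor by hand.
by_hand <- sweep(raw, 2, sizeFactors(dds), "/")

# Should report TRUE if the two matrices agree.
all.equal(norm, by_hand)
```

This assumes the dataset uses per-sample size factors; if gene-specific normalization factors are set instead, the division is element-wise rather than per column.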

0
Entering edit mode

As far as I understand, then, the values I get from plotCounts are different because DESeq2's processing of the data is more complicated than "just" normalization by sequencing depth. Is there any way to access the values used in the DESeq2 calculations?

The main objective is to be able to calculate baseMeans and baseVariances per group of interest.

0
Entering edit mode

DESeq2 calculations are a generalized linear model on raw counts, with size factor offsets, where the design determines the coefficients in the GLM.

Can you say what values you are after?

If you want to calculate mean and variance of each group, I'd use counts(dds, normalized=TRUE). These may be useful descriptive statistics. Note though that these sample means and variances per group aren't used by DESeq2 in its estimation of its test statistics.
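One possible way to get those per-group descriptive statistics is sketched below in base R. The toy matrix norm stands in for counts(dds, normalized=TRUE), and the helper per_group is a hypothetical name, not a DESeq2 function.

```r
# Toy stand-in for counts(dds, normalized = TRUE):
# genes in rows, samples in columns.
norm <- matrix(
  c(10, 12, 14, 100, 110, 120,
     5,  5,  8,   0,   1,   2),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), paste0("s", 1:6))
)
# Group label for each sample (column); in practice, a column of colData(dds).
group <- factor(c("A", "A", "A", "B", "B", "B"))

# Hypothetical helper: apply a row-wise summary within each group of samples.
per_group <- function(mat, group, fun) {
  sapply(levels(group), function(g) {
    apply(mat[, group == g, drop = FALSE], 1, fun)
  })
}

group_means <- per_group(norm, group, mean)  # genes x groups matrix of means
group_vars  <- per_group(norm, group, var)   # genes x groups matrix of variances
```

With real data you would pass counts(dds, normalized=TRUE) and the relevant grouping factor instead of the toy inputs. As noted above, these are descriptive summaries only and are not what DESeq2 uses for its test statistics.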

0
Entering edit mode

Let's imagine I have gene A. When I perform plotCounts, the output consists of a data.frame:

       count group
1  2147.82347     A
2  2785.33020     A
3  5149.90495     A
4  4874.16509     A
5  1755.28676     A
6  4467.04202     B
7  1832.76189     B
8  1796.12376     B
9   659.91229     B
10  736.73997     B
11   35.74922     C
12   36.02343     C
13   35.96535     C
14   21.94258     C
15   14.40656     C
16   47.87601     D
17   20.46806     D
18   32.73794     D
19   17.32621     D
20   32.82445     D

However, for gene A, the values I get from counts with normalized=T are different. I was just wondering why there are differences (which you already answered - thanks for that!) and whether I could get, in a systematic way, a data.frame with the per-sample values as in plotCounts, but for the whole gene set.

My question came up while looking at logFC shrinkage: I would like to use some metrics to further explore some big differences I obtained (I understand the shrinkage method; I just wanted some metrics to make it "easier" to assess the differences).

Thanks!

0
Entering edit mode

How are they different? Can you show the other values?

0
Entering edit mode

Here they are:

       counts group
1  2147.32347     A
2  2784.83020     A
3  5149.40495     A
4  4873.66509     A
5  1754.78676     A
6  4466.54202     B
7  1832.26189     B
8  1795.62376     B
9   659.41229     B
10  736.23997     B
11   35.24922     C
12   35.52343     C
13   35.46535     C
14   21.44258     C
15   13.90656     C
16   47.37601     D
17   19.96806     D
18   32.23794     D
19   16.82621     D
20   32.32445     D
1
Entering edit mode

If you look up the help for plotCounts, you will see that a pseudocount of 0.5 is added to the data by default (because the default setting is transform=TRUE, and counts of 0 cannot be plotted when the y-axis is on a log scale).

You can access the normalized counts with counts(dds, normalized=TRUE). What you are getting from plotCounts(dds, gene, returnData=TRUE) has 0.5 added because transform=TRUE. If you call plotCounts(dds, gene, transform=FALSE, returnData=TRUE), you get the same values as the normalized counts from counts().
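A small sketch of that comparison, assuming a recent DESeq2 version in which transform=TRUE is the default. The simulated dataset, the choice of the first gene, and the names with_pc, without_pc and all_genes are illustrative assumptions. The last part builds a long-format data.frame for the whole gene set directly from counts(), which is one way to get plotCounts-style values for every gene.

```r
library(DESeq2)

# Simulated data with a `condition` column in colData; stands in for your `dds`.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 200, m = 8)
dds <- estimateSizeFactors(dds)

gene <- rownames(dds)[1]
norm <- counts(dds, normalized = TRUE)

# Default call: transform = TRUE, so a pseudocount of 0.5 is added.
with_pc <- plotCounts(dds, gene, intgroup = "condition", returnData = TRUE)
# transform = FALSE: no pseudocount is added.
without_pc <- plotCounts(dds, gene, intgroup = "condition",
                         transform = FALSE, returnData = TRUE)

# Both comparisons should report TRUE.
all.equal(with_pc$count, unname(norm[gene, ]) + 0.5)
all.equal(without_pc$count, unname(norm[gene, ]))

# Long-format table for all genes (no pseudocount), one row per gene and sample.
# as.vector() unrolls the matrix column by column, hence times/each below.
all_genes <- data.frame(
  gene   = rep(rownames(norm), times = ncol(norm)),
  sample = rep(colnames(norm), each = nrow(norm)),
  count  = as.vector(norm),
  group  = rep(dds$condition, each = nrow(norm))
)
```

If you want the table to match the default plotCounts output exactly, add 0.5 to the count column.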

0
Entering edit mode

The difference is consistently 0.5, which corresponds to the default pseudocount in plotCounts. That was my first assumption.
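For anyone who wants to check this on the numbers posted above, here is a quick base R sketch. The vector names plot_counts, norm_counts and diffs are just labels chosen here; the values are copied from the two tables in this thread.

```r
# Values returned by plotCounts (first table in the thread).
plot_counts <- c(2147.82347, 2785.33020, 5149.90495, 4874.16509, 1755.28676,
                 4467.04202, 1832.76189, 1796.12376,  659.91229,  736.73997,
                   35.74922,   36.02343,   35.96535,   21.94258,   14.40656,
                   47.87601,   20.46806,   32.73794,   17.32621,   32.82445)

# Values from counts with normalized=T (second table in the thread).
norm_counts <- c(2147.32347, 2784.83020, 5149.40495, 4873.66509, 1754.78676,
                 4466.54202, 1832.26189, 1795.62376,  659.41229,  736.23997,
                   35.24922,   35.52343,   35.46535,   21.44258,   13.90656,
                   47.37601,   19.96806,   32.23794,   16.82621,   32.32445)

# Per-sample difference between the two sets of values.
diffs <- plot_counts - norm_counts

# TRUE when every difference equals 0.5 (up to floating-point tolerance).
all.equal(diffs, rep(0.5, length(diffs)))
```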