Question

DESeq2 - Differences between plotCounts and normalized counts

andrebolerbarros ▴ 20

@andrebolerbarros-16788

Portugal

Hi everyone,

I've just realized that the values plotCounts returns for a specific gene differ from the values I get for that gene from counts with normalized=T. Is there a reason for this? Is it in any way related to the pseudo-count (pc) argument?

Thanks!

deseq2 • 3.2k views

ADD COMMENT • link updated 6 months ago by Michael Love 41k • written 5.7 years ago by andrebolerbarros ▴ 20

score 0 · Answer 1 · 2018-08-20

Michael Love 41k

@mikelove

United States

The normalized=TRUE argument gives you counts that are normalized by size factor; typically this amounts to sequencing depth normalization. It is giving you:

counts(dds, normalized=TRUE)[ gene, ]

This is useful because you can interpret the changes up and down as associated with the condition, rather than having those differences sit on top of sequencing depth variation. The DESeq2 model, however, uses raw counts plus offsets internally. The normalization in this plot is just to aid the human eye.
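To make "normalized by size factor" concrete, here is a minimal sketch on simulated data. The dataset comes from DESeq2's makeExampleDESeqDataSet() and is only an illustration (it is not the poster's data); the object names norm_counts and by_hand are made up for this example.

```r
library(DESeq2)

# Simulated example data (an assumption for illustration, not the data in this thread)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 200, m = 6)
dds <- estimateSizeFactors(dds)

# Normalized counts as returned by DESeq2
norm_counts <- counts(dds, normalized = TRUE)

# The same values by hand: divide each sample (column) by its size factor
by_hand <- sweep(counts(dds), 2, sizeFactors(dds), "/")

# Compare the two matrices, ignoring dimnames and other attributes
all.equal(norm_counts, by_hand, check.attributes = FALSE)
```

The two matrices should agree, which is all that normalized=TRUE does here: one division per sample, no further modelling.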

ADD COMMENT • link 5.7 years ago Michael Love 41k

As far as I understand, then, the values I get from plotCounts are different because DESeq2's processing of the data is more complicated than "just" normalization by sequencing depth. Is there any way to access the values used in the DESeq2 calculations?

The main objective is to be able to calculate baseMeans and baseVariances per group of interest.

ADD REPLY • link 5.7 years ago andrebolerbarros ▴ 20

DESeq2 calculations are a generalized linear model on raw counts, with size factor offsets, where the design determines the coefficients in the GLM.

Can you say what values you are after?

If you want to calculate mean and variance of each group, I'd use counts(dds, normalized=TRUE). These may be useful descriptive statistics. Note though that these sample means and variances per group aren't used by DESeq2 in its estimation of its test statistics.
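A minimal sketch of those descriptive statistics, assuming simulated data from makeExampleDESeqDataSet() and its grouping column condition (in a real analysis you would substitute your own dds and grouping variable). The names idx, group_mean and group_var are just for illustration.

```r
library(DESeq2)

# Simulated example data (an assumption for illustration)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 200, m = 6)
dds <- estimateSizeFactors(dds)

norm_counts <- counts(dds, normalized = TRUE)
group <- dds$condition  # grouping column of the example data

# Column indices of the samples belonging to each group
idx <- split(seq_len(ncol(norm_counts)), group)

# Genes x groups matrices of per-group means and variances
group_mean <- sapply(idx, function(i) rowMeans(norm_counts[, i, drop = FALSE]))
group_var  <- sapply(idx, function(i) apply(norm_counts[, i, drop = FALSE], 1, var))

head(group_mean)
head(group_var)
```

These are plain sample statistics on normalized counts; as noted above, they are not the quantities DESeq2 uses for its test statistics.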

ADD REPLY • link 5.7 years ago Michael Love 41k

Let's imagine I have gene A. When I run plotCounts, the output consists of a data.frame:

       count group
1  2147.82347     A
2  2785.33020     A
3  5149.90495     A
4  4874.16509     A
5  1755.28676     A
6  4467.04202     B
7  1832.76189     B
8  1796.12376     B
9   659.91229     B
10  736.73997     B
11   35.74922     C
12   36.02343     C
13   35.96535     C
14   21.94258     C
15   14.40656     C
16   47.87601     D
17   20.46806     D
18   32.73794     D
19   17.32621     D
20   32.82445     D

However, for gene A, the values I get from counts with normalized=T are different. I was just wondering why there are differences (which you already answered - thanks for that!) and whether I could systematically get a data.frame with the per-sample values, as in plotCounts, but for the whole gene set.

My question came up while looking at logFC shrinkage: I would like some metrics to further explore some big differences I obtained (I understand the shrinkage method; I just wanted some metrics to make it "easier" to assess the differences).

Thanks!

ADD REPLY • link 5.7 years ago andrebolerbarros ▴ 20

How are they different? Can you show the other values?

ADD REPLY • link 5.7 years ago Michael Love 41k

Here they are:

  counts group
1  2147.32347     A
2  2784.83020     A
3  5149.40495     A
4  4873.66509     A
5  1754.78676     A
6  4466.54202     B
7  1832.26189     B
8  1795.62376     B
9   659.41229     B
10  736.23997     B
11   35.24922     C
12   35.52343     C
13   35.46535     C
14   21.44258     C
15   13.90656     C
16   47.37601     D
17   19.96806     D
18   32.23794     D
19   16.82621     D
20   32.32445     D

ADD REPLY • link 5.7 years ago andrebolerbarros ▴ 20

If you look up the help for plotCounts you will see that a pseudocount of 0.5 is added to the data by default (because the default setting is transform=TRUE, and counts of 0 cannot be plotted when the y-axis is on the log scale).

You can access the normalized counts with counts(dds, normalized=TRUE), and what you are getting from plotCounts(dds, returnData=TRUE) has 0.5 added because transform=TRUE. If you set plotCounts(dds, transform=FALSE, returnData=TRUE) you would get the same values as the normalized counts via counts().
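Here is a small sketch to check this, again on simulated data from makeExampleDESeqDataSet() (an assumption, not the data posted above). It also builds a plotCounts-style table for the whole gene set by adding the same 0.5 to every normalized count; the names d_default, d_plain and all_genes are made up for the example.

```r
library(DESeq2)

# Simulated example data (an assumption for illustration)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 200, m = 6)
dds <- estimateSizeFactors(dds)
gene <- rownames(dds)[1]

# Normalized counts for one gene
nc <- counts(dds, normalized = TRUE)[gene, ]

# plotCounts data with the defaults (transform=TRUE, so pc defaults to 0.5)
d_default <- plotCounts(dds, gene = gene, intgroup = "condition", returnData = TRUE)

# plotCounts data without the log-scale transform (pc defaults to 0)
d_plain <- plotCounts(dds, gene = gene, intgroup = "condition",
                      transform = FALSE, returnData = TRUE)

# Per-sample differences from the normalized counts
diff_default <- as.numeric(d_default$count) - as.numeric(nc)
diff_plain   <- as.numeric(d_plain$count) - as.numeric(nc)
print(diff_default)
print(diff_plain)

# plotCounts-style values for all genes: samples in rows, genes in columns
all_genes <- as.data.frame(t(counts(dds, normalized = TRUE)) + 0.5)
all_genes$group <- dds$condition
```

If you do not want the pseudocount in the whole-gene-set table, drop the + 0.5 and you are back to counts(dds, normalized=TRUE).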

ADD REPLY • link 5.7 years ago Michael Love 41k

The difference is consistently 0.5, which corresponds to the default pseudo-count in plotCounts. That was my first assumption.

ADD REPLY • link 5.7 years ago andrebolerbarros ▴ 20

Dear Michael,

Sorry for re-opening this post. I have a question regarding the input to plotCounts:

In plotCounts(dds), setting "normalized=TRUE" corrects for size factor. But when we apply VST to the dds object, we correct for both size factor and library size, right? I use the latter to plot PCAs, for example. So is library size not corrected for in plotCounts?

Besides this, regarding batch effects: I normally correct for batch for plotting purposes like this:

# Variance-stabilize counts for visualization purposes (e.g. PCA)
vsd <- varianceStabilizingTransformation(dds)

# Use limma::removeBatchEffect to plot a batch-corrected PCA
assay(vsd) <- limma::removeBatchEffect(assay(vsd), batch = vsd$batch_bis_corr, design = model.matrix(~ shear + stiffness + shear:stiffness, colData))

How can I use the batch-corrected data in plotCounts?

Thank you. Laia

ADD REPLY • link 6 months ago Laia ▴ 10

VST corrects for library size (as modeled by size factor).

plotCounts is designed to show the count data including batch.

I recommend using VST data if you want to, e.g., see the variation that remains after regressing out batch.
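One way this could look for a single gene is sketched below. Everything specific here is an assumption for illustration: the data are simulated with makeExampleDESeqDataSet(), the batch column is a made-up label added to the example, and the design is the example's ~ condition rather than the shear/stiffness design above. The plot uses base R's stripchart() on the batch-corrected VST values, not plotCounts.

```r
library(DESeq2)
library(limma)

# Simulated example data (an assumption for illustration)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 8)

# Hypothetical batch labels, balanced within each condition
dds$batch <- factor(rep(c("b1", "b2"), times = 4))

# Variance-stabilized data (size factors are estimated internally if missing)
vsd <- varianceStabilizingTransformation(dds, blind = FALSE)

# Remove the batch effect while protecting the condition effect
design <- model.matrix(~ condition, as.data.frame(colData(vsd)))
corrected <- limma::removeBatchEffect(assay(vsd), batch = vsd$batch, design = design)

# Per-sample values for one gene, analogous to plotCounts(..., returnData=TRUE)
gene <- rownames(vsd)[1]
d <- data.frame(vst = corrected[gene, ], condition = vsd$condition)

# Simple per-group plot of the batch-corrected VST values
stripchart(vst ~ condition, data = d, vertical = TRUE, method = "jitter",
           pch = 1, ylab = "batch-corrected VST", main = gene)
```

Note that these values are on the VST scale (roughly log2), so they are not directly comparable to the normalized counts that plotCounts shows.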

ADD REPLY • link 6 months ago Michael Love 41k