Question

Are transformed values from rlog /vst log2 normalized counts?


Eva (Spain)

I am trying to understand the vst/rlog transformations in DESeq2. In the following vignette, section 4.2, where vst and rlog are explained, there is this paragraph:

Both vst and rlog return a DESeqTransform object which is based on the SummarizedExperiment class. The transformed values are no longer counts, and are stored in the assay slot.

What does it mean that they are no longer counts? Does it mean that the transformed values are not stored in the "counts" slot, where you would find them with counts(dds, normalized=TRUE), or is it something else?

It is clear that the magnitudes you get from vst/rlog and from counts(dds, normalized=TRUE) are not the same... but is that because vst/rlog output values on a log2 scale? (Of course, there is a variance-stabilizing transformation, but the results are on a log2 scale...?) So... would this output be log2 normalized and transformed counts...?

The reason for this question is that I am wondering whether I should save these transformed counts as "normalized_transformed" counts for the future. I used to save counts(dds, normalized=TRUE) and those were the values I used for downstream analyses... but now that I have discovered (and read more about) the vst/rlog transformations, I will have to change the way I work and do my analyses. However, I am quite worried about the paragraph above saying that they are no longer counts, and I am not sure I understand everything properly.

Thanks in advance

Regards

Tags: Normalization, DESeq2, rlog, vst


score 2 · Accepted Answer · 2024-10-11

They are not counts because counts are integers.

> library(DESeq2)
> z <- makeExampleDESeqDataSet()
## These are counts
> head(assay(z))
      sample1 sample2 sample3
gene1      55      23      46
gene2       7       9       0
gene3       6      45      79
gene4       1       1       0
gene5       0       1       2
gene6     206     461     187
      sample4 sample5 sample6
gene1      51      16      34
gene2       2      14       4
gene3       9      20      14
gene4       2       0       0
gene5       0       8      11
gene6     428     277     200
      sample7 sample8 sample9
gene1      48      56      11
gene2       9       1       7
gene3      17      17      17
gene4       0       6       9
gene5       0       4       3
gene6     270     157     481
      sample10 sample11 sample12
gene1       25       29       91
gene2        4        9        8
gene3       32       19        4
gene4        0        0        0
gene5        5        0        7
gene6      257      140      385

## These are not counts!
> head(assay(rlog(z)))
       sample1  sample2   sample3
gene1 5.508012 4.732097 5.2791903
gene2 2.532313 2.639302 1.9586419
gene3 3.507028 4.889190 5.3750272
gene4 0.283636 0.281633 0.1745515
gene5 1.261563 1.350452 1.4254712
gene6 7.727871 8.605355 7.5497238
        sample4   sample5   sample6
gene1 5.4296431 4.4794192 5.0229586
gene2 2.1618863 2.8903103 2.3107258
gene3 3.7026089 4.2296479 3.9381367
gene4 0.3784741 0.1766241 0.1749336
gene5 1.2612043 1.8266005 1.9365913
gene6 8.5333462 8.0548245 7.6361029
        sample7   sample8   sample9
gene1 5.3835566 5.5187194 4.2431675
gene2 2.6497815 2.0684901 2.5366996
gene3 4.1119435 4.1044945 4.1169204
gene4 0.1766834 0.6869912 0.8742537
gene5 1.2617262 1.5861930 1.5178622
gene6 8.0286134 7.4289983 8.6872464
       sample10  sample11  sample12
gene1 4.7867908 4.9386086 5.9969793
gene2 2.3177758 2.6500157 2.5894217
gene3 4.5787006 4.1931967 3.3466693
gene4 0.1755969 0.1766985 0.1764068
gene5 1.6452014 1.2617493 1.7717412
gene6 7.9341562 7.3198482 8.4157297

And these data are not intended for analysis using any count-based method. From ?varianceStabilizingTransformation:

Description:

     This function calculates a variance stabilizing transformation
     (VST) from the fitted dispersion-mean relation(s) and then
     transforms the count data (normalized by division by the size
     factors or normalization factors), yielding a matrix of values
     which are now approximately homoskedastic (having constant
     variance along the range of mean values). The transformation also
     normalizes with respect to library size. The 'rlog' is less
     sensitive to size factors, which can be an issue when size factors
     vary widely. These transformations are useful when checking for
     outliers or as input for machine learning techniques such as
     clustering or linear discriminant analysis.

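So if you want to plot the data or do other downstream analyses (e.g. clustering or PCA), you could use rlog or vst, but neither is meant to be used as input to the count-based analysis itself, such as the differential expression testing done by DESeq(), which expects the raw integer counts.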
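On the log2 part of the question: the help text quoted above says the VST is computed from the fitted dispersion-mean relation. For a parametric dispersion fit, where the dispersion is modelled as alpha(mu) = asymptDisp + extraPois / mu, that transformation has a closed form. The minimal base-R sketch below shows why the output behaves like log2 of the normalized counts for well-expressed genes, while remaining finite and compressed at low counts. It is an illustration under stated assumptions, not DESeq2's implementation. The toy count matrix and the coefficient values (asymptDisp = 0.05, extraPois = 1) are made up, and the helper names size_factors_ratio() and vst_parametric() are hypothetical. In a real analysis, vst()/varianceStabilizingTransformation() estimate the dispersion curve from your own data. rlog() is computed differently and is not reproduced here.

```r
# Minimal base-R sketch (DESeq2 is NOT needed to run it).
# ASSUMPTIONS: toy counts and dispersion coefficients are made up;
# size_factors_ratio() and vst_parametric() are hypothetical helper names.

# Median-of-ratios size factors: for each sample, the median over genes of
# count / (geometric mean of that gene across samples). Genes with a zero in
# any sample have a geometric mean of 0 and are skipped.
size_factors_ratio <- function(counts) {
  log_geo_means <- rowMeans(log(counts))
  keep <- is.finite(log_geo_means)
  apply(counts, 2, function(cnts) {
    exp(median((log(cnts) - log_geo_means)[keep]))
  })
}

# Closed-form VST for a dispersion curve alpha(mu) = asymptDisp + extraPois / mu,
# i.e. variance = (1 + extraPois) * q + asymptDisp * q^2, expressed on a log2 scale.
vst_parametric <- function(q, asymptDisp, extraPois) {
  log2((1 + extraPois + 2 * asymptDisp * q +
          2 * sqrt(asymptDisp * q * (1 + extraPois + asymptDisp * q))) /
         (4 * asymptDisp))
}

# Toy integer counts: 4 genes x 3 samples (made-up numbers).
counts <- matrix(c(  10,   20,   15,
                    100,  210,  160,
                   1000, 1900, 1600,
                      5,    9,    0),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3)))

sf <- size_factors_ratio(counts)
norm_counts <- sweep(counts, 2, sf, "/")  # analogous to counts(dds, normalized = TRUE)
vst_values <- vst_parametric(norm_counts, asymptDisp = 0.05, extraPois = 1)

print(round(vst_values, 2))  # no longer integers
# Difference from plain log2 shrinks toward 0 as counts grow
# (Inf for the zero count, because log2(0) is -Inf while the VST stays finite).
print(round(vst_values - log2(norm_counts), 2))
```

With these coefficients, the transformation behaves roughly like log2(q + (1 + extraPois) / (2 * asymptDisp)) = log2(q + 20) at large q. As a result, gene3's values sit within a few hundredths of log2(normalized count), while gene1's (normalized counts around 14) differ by more than half a unit. This is why the output is better described as variance-stabilized values on the log2 scale than as counts.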