Normalisation in rlog and VST transformations
thjnant ▴ 10


I am a bit confused about the rlog() and vst() functions in DESeq2.

With a log2 transformation, it is quite clear how to proceed: normalise the data first, then apply log2. It is, however, not possible to normalise the data before applying rlog or vst to them. I see from the documentation that normalisation is done behind the scenes in these two cases.
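To make the "normalise, then log2" route concrete, here is a minimal base R sketch of median-of-ratios size factors followed by log2 with a pseudocount. The count matrix is made up for illustration, and the hand-written size factor calculation is only my approximation of what DESeq2 does internally, not its actual code.

```r
# Toy raw count matrix (made-up numbers): 4 genes x 3 samples.
counts <- matrix(
  c(10, 20, 40,
    100, 200, 400,
    50, 100, 200,
    5, 10, 20),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))
)

# Median-of-ratios size factors, written out by hand.
# 1. Per-gene log geometric mean across samples.
log_geo_means <- rowMeans(log(counts))
# 2. Per-sample median of log(count / geometric mean), using only genes
#    whose geometric mean is finite (i.e. no zero counts in any sample).
usable <- is.finite(log_geo_means)
size_factors <- apply(counts, 2, function(col) {
  exp(median((log(col) - log_geo_means)[usable]))
})

# Divide each sample (column) by its size factor, then take log2
# with a pseudocount of 1 so that zero counts stay defined.
norm_counts <- sweep(counts, 2, size_factors, "/")
log2_norm <- log2(norm_counts + 1)

print(size_factors)
print(round(log2_norm, 2))
```

Because of the pseudocount, values on this scale can never go below zero, which is one way it differs from a log2 scale computed without a pseudocount.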

I used the three transformations on my data. Please see the graph in the link below:

What I see is that the shape of the distribution looks quite different using log2 vs. rlog vs. vst. In the case of the graph above, it seems that if sufficient filtering of raw counts is done, log2 transformation works quite well. I was wondering what you think about that, and also why the rlog values start from below zero?

Thank you!

deseq2 rna-seq R • 301 views

I am not sure why one would want to do a blind log [base 2] transformation on count data. There are specific functions in DESeq2 for log-transforming the data and using the result in downstream applications.

The general procedure can be regarded as:

  1. Derive raw or estimated counts (outside DESeq2)
  2. Import counts to DESeq2 (tximport and/or DESeq2)
  3. Normalise the counts via estimation of size factors, and estimate gene-wise dispersions (DESeq2)
  4. Conduct differential expression analysis; DESeq() takes the raw counts and accounts for the size factors in its model (DESeq2)
  5. Transform the data for downstream applications (e.g., PCA, clustering, 'machine learning', etc.) via variance stabilisation or regularised log (DESeq2)
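As a rough sketch of steps 3 to 5, assuming DESeq2 is installed: the data here are simulated with makeExampleDESeqDataSet() purely as a stand-in for steps 1 and 2, and the object names (dds, res, vsd, rld) are just conventional choices, not anything required by the package.

```r
library(DESeq2)

set.seed(1)
# Simulated counts standing in for steps 1-2. A real analysis would build
# the object with DESeqDataSetFromMatrix() or DESeqDataSetFromTximport().
dds <- makeExampleDESeqDataSet(n = 2000, m = 6)

# Steps 3-4: DESeq() estimates size factors and dispersions, then fits
# the model and runs the tests on the raw counts.
dds <- DESeq(dds)
res <- results(dds)

# Step 5: transformed values for PCA, clustering, etc.
# blind = FALSE reuses the design when estimating the dispersion trend.
vsd <- varianceStabilizingTransformation(dds, blind = FALSE)
rld <- rlog(dds, blind = FALSE)

head(assay(vsd), 3)
head(assay(rld), 3)
```

For larger datasets, vst() is a faster wrapper around varianceStabilizingTransformation(); I use the latter here only because it does not depend on having enough well-expressed genes in a small simulated dataset.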

If, at any point, you wish to obtain 'normalised counts', then you can use:

counts(dds, normalized = TRUE)



Thanks a lot for your reply. I want to understand how the different transformations work and how they differ from one another. I would also like to see the effect of each transformation on the normalised data. What I see in the plot, for example, is that vst seems to be a bad choice for my data. The rlog gives a very strange left tail to the distribution and, oddly enough, log2 transformation seems to do a better job if low counts are filtered sufficiently. This was quite interesting for me. Yes, of course, for the differential expression, I just give the raw read count file to the DESeq function and that does the job. Thanks again.
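For anyone wanting to reproduce this kind of comparison, here is a minimal sketch on simulated data (not my own dataset). The filtering threshold of 10 total reads is an arbitrary value chosen for illustration. As far as I understand, rlog values are on a log2 scale without a pseudocount, so fitted values below one count can come out negative, whereas log2(x + 1) cannot go below zero.

```r
library(DESeq2)

set.seed(1)
# Simulated data; replace with your own DESeqDataSet.
dds <- makeExampleDESeqDataSet(n = 2000, m = 6)
dds <- estimateSizeFactors(dds)

# Hypothetical low-count filter: keep genes with at least 10 reads in total.
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep, ]

# Three transformations of the same filtered data.
log2_norm <- log2(counts(dds, normalized = TRUE) + 1)  # bounded below by 0
vsd_mat <- assay(varianceStabilizingTransformation(dds, blind = TRUE))
rld_mat <- assay(rlog(dds, blind = TRUE))

# Compare where each scale starts and ends.
ranges <- rbind(
  log2 = range(log2_norm),
  vst  = range(vsd_mat),
  rlog = range(rld_mat)
)
colnames(ranges) <- c("min", "max")
print(ranges)
```

Plotting density(log2_norm), density(vsd_mat) and density(rld_mat) on the same axes gives the kind of distribution comparison described above.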

