Question: DESeq2 normalization vs VST vs rlog
Jonas B. (Belgium, Antwerp, University of Antwerp) wrote, 12 weeks ago:

Hi all,

after consulting the manual on data normalization, I have one question left to ask:

The way I see it, there are 4 ways described to obtain normalized data:

  • The first one is to extract the counts normalized either by normalization factors (a gene x sample matrix) or by size factors (a single number per sample). This can be done using the following code:

    counts(dds, normalized=TRUE)

  • The second way is to apply a log2 transformation, log2(n + 1), using the following function:

    normTransform(dds)

  • The third and fourth ways are to use the VST and rlog transformations, using the following functions respectively:

    vst(dds, blind=FALSE)
    rlog(dds, blind=FALSE)

When I just got started, I used the first function (counts(dds, normalized=TRUE)) to obtain the normalized data, which I later used for clustering etc. However, I now doubt that this was the correct decision: I suspect that data normalized this way is only used during the DE gene analysis, and that for clustering the second, third or fourth way of normalization is preferred.

I was hoping that any of you could share a more expert opinion on which normalization to use and whether or not "counts(dds, normalized=TRUE)" is a viable option as well.

Thank you a lot in advance.

Kind regards, Jonas

Tags: deseq2

As a side note: I did find a recent question addressing normalization (https://support.bioconductor.org/p/123651/); however, it leaves unanswered whether or not I could also use the counts function (I guess it's wrong, but I am not sure; maybe it is still usable) and which approach is most commonly used or advised. Any opinions shared are much appreciated!

Reply written 12 weeks ago by Jonas B. (modified 12 weeks ago)

It occurred to me that counts(dds, normalized=TRUE) might already return log2-transformed data? (However, this is not described in https://www.rdocumentation.org/packages/DESeq2/versions/1.12.3/topics/counts)

Reply written 12 weeks ago by Jonas B.
Answer: DESeq2 normalization vs VST vs rlog
Michael Love (United States) wrote, 12 weeks ago:

Take a look at the workflow (linked from the top of the vignette).

There we suggest using transformations for anything involving a distance (we also say this in the DESeq2 paper). We give reasons for this suggestion, and in the paper we evaluated alternatives.

My preferred transformation of the two we provide is VST, because it is fast.
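
For reference, a minimal sketch of using a transformation for a distance-based analysis. It assumes simulated data from makeExampleDESeqDataSet() in place of a real dds, and it calls varianceStabilizingTransformation(), the function that vst() wraps, because vst() subsamples genes and may not find enough of them on small simulated data. On a real data set, vst(dds, blind=FALSE) would be the usual call.

```r
library(DESeq2)

set.seed(1)
# Simulated data set shipped with DESeq2; it stands in for a real `dds` here.
dds <- makeExampleDESeqDataSet(n = 2000, m = 8)

# Variance-stabilizing transformation that takes the design into account.
vsd <- varianceStabilizingTransformation(dds, blind = FALSE)

# Samples are columns, so transpose before computing sample-to-sample distances.
sample_dists <- dist(t(assay(vsd)))

# Hierarchical clustering of samples on the transformed scale.
hc <- hclust(sample_dists)
```

The transformed values are extracted with assay(); the distance matrix can then be passed to any clustering or heatmap function.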


Dear Michael, thank you for your quick reply.

I've read the vignette and in the future I will definitely go for VST then.

About the normalized counts I've obtained using the function "counts(dds, normalized=TRUE)":

  • Are the normalized counts obtained here also log2 transformed? I was unable to find this in the vignette, but the section "Heatmap of the count matrix" does address this normalization, next to VST and rlog, and it seems to work fine. I am trying to assess what these normalized counts can be used for and whether my previous findings using these normalized counts are still valid. (Of course, I will repeat the normalization using VST in the long run.)

Reply written 12 weeks ago by Jonas B.

That’s not using counts() in the plot. Take a closer look at the code.

Reply written 12 weeks ago by Michael Love

Indeed, I am sorry: it is used in the code where 20 genes get preselected, on which the normTransform function (log2(n+1)) was later performed. I should have looked more carefully.

Would you mind still answering my previous questions concerning the function "counts(dds, normalized=TRUE)"?

  • The normalized counts obtained here, are they also log2 transformed?

  • Is the normalization only used for differential expression analysis or could it also have value for clustering later on (even though it is not recommended by the vignette - I'm asking this because I want to assess the value of my previous analyses)?

Thank you in advance for your time.

Reply written 12 weeks ago by Jonas B.

I think it’s pretty clear from the documentation that this gives counts divided by size factors. So, no, it does not log2 transform the counts, and there is in fact a separate function for producing log2 transformed counts...
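
To make the distinction concrete, here is a small base-R sketch of the arithmetic involved. The count matrix and the size factors are made-up values for illustration (DESeq2 would estimate the size factors itself), and the object names are hypothetical. The first step mirrors what counts(dds, normalized=TRUE) returns when size factors are in use, and the second mirrors the default behaviour of normTransform(dds), which is log2(n + 1).

```r
# Toy raw counts: 3 genes (rows) x 2 samples (columns). Values are made up.
raw <- matrix(c(10, 0, 100,
                20, 4, 200),
              nrow = 3,
              dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))

# Hypothetical size factors, one per sample (normally estimated by DESeq2).
sf <- c(s1 = 0.5, s2 = 2)

# Normalized counts: divide each column by its sample's size factor.
# The result is still on the count scale, not on a log scale.
norm <- sweep(raw, 2, sf, "/")

# Separate step: log2 transform with a pseudocount of 1.
log_norm <- log2(norm + 1)
```

With these numbers, geneA in s1 goes from 10 to 20 after normalization, and only the second step puts the values on a log2 scale.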

I do not recommend clustering untransformed data. There was a recent post about this on the support site, but again the reasons are in the documentation and also in the publication.
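
As a rough illustration of the concern, the following base-R sketch simulates two samples from the same Poisson means, so every difference between them is noise, and asks how much of the squared Euclidean distance comes from the 10% most highly expressed genes. The Poisson model, the range of means, and all names are assumptions chosen for the example; it is not the evaluation from the paper.

```r
set.seed(42)
n_genes <- 1000

# Gene means spanning several orders of magnitude (an assumption for illustration).
mu <- 2^runif(n_genes, min = 0, max = 14)

# Two samples drawn from identical means: any difference is pure sampling noise.
x <- rpois(n_genes, mu)
y <- rpois(n_genes, mu)

# Flag the 10% of genes with the highest mean expression.
top <- mu >= quantile(mu, 0.9)

# Fraction of the squared Euclidean distance contributed by those genes.
share <- function(a, b) sum((a[top] - b[top])^2) / sum((a - b)^2)

share_raw <- share(x, y)                      # untransformed counts
share_log <- share(log2(x + 1), log2(y + 1))  # simple log2(n + 1) transform
```

On untransformed counts the high-count genes contribute a much larger share of the distance than on the log scale. On the log scale the noise of low-count genes dominates instead, which is the kind of behaviour VST and rlog are meant to moderate.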

Reply written 12 weeks ago by Michael Love

Dear Michael, Thank you for your time and answers. It's all clear now. Kind regards, Jonas

Reply written 12 weeks ago by Jonas B.