Question: choosing the normalization method (rlog, variance stabilizing transformation)
15 months ago, lirongrossmann wrote:

Hi everyone,

I was hoping to get an answer on an issue I have been struggling with for a while.

I have raw count data from an RNA-seq experiment and want to develop a model for separating two groups of samples. I used DESeq2 to select my top genes, then trained and tested the model on the dataset after applying the variance stabilizing transformation.

To make sure my model is robust, I tried to use rlog and other normalization methods (TPM, RPKM) on the raw count matrix with the same set of selected genes. 

My problem: the model's performance (accuracy) differs depending on the normalization method, even between rlog and vsd. Of note, just by looking at the normalized matrices, I can see substantial differences in the values between the methods. For example, for one of the selected genes, the normalized value for one sample is 4.328 using vsd and 0.02 using RPKM. I am not sure I fully understand where this big difference comes from.

Has anyone encountered a similar situation? Any help would be appreciated.

Thanks!

Answer: choosing the normalization method (rlog, variance stabilizing transformation)
15 months ago, Michael Love (United States) wrote:

The variance stabilizing transformations are very different from TPM and RPKM. The latter normalizations allow for comparison of values across genes, because they are proportional to the original counts of transcripts. However, they are not variance stabilizing: distances between samples will be heavily weighted by contributions from the genes with the highest TPM. In the DESeq and DESeq2 papers we recommend using variance stabilization when comparing samples, e.g. using a distance metric, as it takes into account the precision of the measurements and reduces contributions of noise from genes with low counts.
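
To make the point about distances concrete, here is a small Python sketch (not DESeq2 code). It uses made-up counts for three genes in two samples and compares how much each gene contributes to the squared Euclidean distance on the RPKM scale versus a variance-stabilized scale. Everything in it is an assumption for illustration: the counts, the gene lengths, the library size, and the fixed dispersion `ALPHA`. The transformation used is the textbook delta-method one for a negative binomial with known dispersion, `(2/sqrt(alpha)) * asinh(sqrt(alpha * x))`, which only stands in for what `vst()` or `rlog()` do; those functions estimate size factors and a dispersion-mean trend from the data.

```python
import math

# Hypothetical data, not taken from the original post:
# gene -> (count in sample A, count in sample B, gene length in bp)
COUNTS = {
    "high_gene": (50000, 60000, 2000),  # 1.2-fold change, very high count
    "mid_gene": (500, 1000, 2000),      # 2-fold change, moderate count
    "low_gene": (5, 10, 2000),          # 2-fold change, very low count
}
LIBRARY_SIZE = 20_000_000  # assumed total mapped reads, same for both samples
ALPHA = 0.1                # assumed fixed negative binomial dispersion


def rpkm(sample):
    """Reads per kilobase per million mapped reads (linear scale)."""
    return {g: v[sample] * 1e9 / (v[2] * LIBRARY_SIZE) for g, v in COUNTS.items()}


def stabilized(sample):
    """Delta-method variance stabilizing transform for NB counts with
    variance mu + ALPHA * mu**2. A stand-in for vst(), not its algorithm."""
    scale = 2.0 / math.sqrt(ALPHA)
    return {g: scale * math.asinh(math.sqrt(ALPHA * v[sample]))
            for g, v in COUNTS.items()}


def distance_shares(a, b):
    """Fraction of the squared Euclidean distance contributed by each gene."""
    squared = {g: (a[g] - b[g]) ** 2 for g in a}
    total = sum(squared.values())
    return {g: s / total for g, s in squared.items()}


rpkm_shares = distance_shares(rpkm(0), rpkm(1))
vst_shares = distance_shares(stabilized(0), stabilized(1))

for gene in COUNTS:
    print(f"{gene:10s} RPKM share: {rpkm_shares[gene]:.4f}   "
          f"stabilized share: {vst_shares[gene]:.4f}")
```

With these made-up numbers, nearly all of the RPKM distance comes from `high_gene`, even though it has the smallest fold change. On the stabilized scale, differences are measured roughly in units of their standard deviation, so the 2-fold change in `mid_gene` counts for more than the same 2-fold change in the noisier `low_gene`. The sketch also shows why values such as 4.328 (vsd) and 0.02 (RPKM) are not directly comparable: RPKM is a linear, length-scaled quantity, while the transformed values live on a log-like scale.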
