Question: an unexplained phenomenon using variance stabilizing transformation for downstream analysis
lirongrossmann wrote, 6 days ago:

Hi Everyone,

I am using the variance stabilizing transformation (vsd from now on) for normalization in order to perform downstream analysis on a raw count expression matrix. To be specific, I have two groups (say group A and group B) that I want to separate based on the expression levels of certain genes. I found several genes that separate the two groups (using DESeq2) and want to test my hypothesis on an independent set of samples. When I run vsd on the ENTIRE test set (group A + group B), the genes separate the two groups with a certain accuracy. When I run vsd on each group of the test set separately (i.e. vsd on group A and vsd on group B), the two groups are separated even better by these genes.

Why do I get different results when I run vsd on A+B than when I run vsd on A and on B separately? I assume vsd takes the relationship between the samples into account, so is there a way to eliminate that? Should I use a different normalization method? If so, which one is recommended?

Thanks!

Wolfgang Huber (EMBL European Molecular Biology Laboratory) wrote, 1 day ago:

The variance stabilizing transformation in DESeq2 is not a normalization method; it is (as the name says) a transformation. For normalization, it uses the usual DESeq2 estimateSizeFactors normalization.

It is normal and expected that the results differ if you call estimateSizeFactors on the complete matrix versus on the A and B subsets separately. The latter is wrong, since it defeats the purpose of normalization, which is to put all samples on a common scale. So you can ignore the result from that analysis.

As always, posting reproducible code examples and session_info would help.
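
To make the point about subsets concrete, here is a minimal base R sketch (editorial illustration, not part of the original answer). It uses a simplified median-of-ratios size factor calculation, which is the idea behind estimateSizeFactors but not the package's actual code. The function name `size_factors`, the simulated counts, and the sequencing depths are all assumptions made up for this example. No gene is truly differentially expressed in the simulation, yet normalizing A and B separately makes group B look roughly twice as high as group A.

```r
# Simplified median-of-ratios size factors (illustrative sketch only,
# not the DESeq2 implementation).
size_factors <- function(counts) {
  log_counts <- log(counts)
  log_geo_means <- rowMeans(log_counts)   # per-gene log geometric mean
  keep <- is.finite(log_geo_means)        # drop genes with any zero count
  apply(log_counts[keep, , drop = FALSE], 2,
        function(col) exp(median(col - log_geo_means[keep])))
}

set.seed(1)
n_genes <- 500
base_expr <- rexp(n_genes, rate = 1 / 200) + 20   # hypothetical true expression
# Hypothetical sequencing depths: group B happens to be sequenced deeper.
depth <- c(A1 = 0.5, A2 = 1, A3 = 2, B1 = 1, B2 = 2, B3 = 4)

# Simulate counts with NO true difference between groups A and B.
counts <- sapply(depth, function(d) rpois(n_genes, lambda = base_expr * d))

# Size factors from the complete matrix vs. from each group on its own.
sf_full <- size_factors(counts)
sf_sep  <- c(size_factors(counts[, 1:3]), size_factors(counts[, 4:6]))

norm_full <- sweep(counts, 2, sf_full, "/")
norm_sep  <- sweep(counts, 2, sf_sep, "/")

# Ratio of mean normalized expression, group B over group A.
ratio_full <- mean(norm_full[, 4:6]) / mean(norm_full[, 1:3])  # roughly 1
ratio_sep  <- mean(norm_sep[, 4:6])  / mean(norm_sep[, 1:3])   # roughly 2

print(round(rbind(sf_full, sf_sep), 2))
print(c(ratio_full = ratio_full, ratio_sep = ratio_sep))
```

With joint normalization the depth difference between the groups is removed. With separate normalization each group is only scaled relative to itself, so the depth difference between A and B stays in the data and can look like better separation.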

lirongrossmann wrote, 1 day ago:

Thank you very much for the clarification!!

My main question is which values to use to train the model: should I use the normalized values, or the transformed values (which, if I understand correctly, are also normalized)? I used "mat" to train my model based on the "Response classification" (see code below).

Here is my code:


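library(DESeq2)

# Raw counts: genes in rows, samples in columns
ep <- read.table("exp.train.txt", header = TRUE, row.names = 1)
# Sample annotation; rows must be in the same order as the columns of ep
cp <- read.csv("train.csv")

dds <- DESeqDataSetFromMatrix(countData = ep, colData = cp, design = ~ Response)
dds <- dds[rowSums(counts(dds)) > 1, ]  # drop genes with at most one read in total
dds <- estimateSizeFactors(dds)
vsd <- varianceStabilizingTransformation(dds)
mat <- assay(vsd)  # transformed values, used to train the model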
