I am using variance stabilizing transformation (vsd from now on) for normalization in order to perform downstream analysis on a raw count expression matrix. To be specific, I have two groups (say group A and group B) that I want to separate based on the expression levels of certain genes. I found several genes that separate the two groups (using Deseq2) and want to test my hypothesis using an independent set of samples. When using the vsd on the ENTIRE test set (group A+ group B), I get that genes separate the two groups with a certain accuracy. When I use vsd on each group of the test set separately (i.e. vsd on group A and vsd on group B), I get the the two groups are separated even better based on these genes.
Why is it when I run vsd on A+B I get different results when I ran vsd on A and vsd on B? I assume vsd takes the interaction between the samples, so is there a way to eliminate it? Should I use a different normalization method? If so which one is recommended?