Different clustering results of DESeq2 counts and rlog
ezz • 0
@ezz-7643
Last seen 6.4 years ago
United States

After downloading the TCGA raw gene counts, I created a DESeq2 object using design = ~ 1.

To visualize the effect of normalization, I ran: lon <- log2(counts(dds, normalized=TRUE) + 1)

I also ran: r <- rlog(dds)

Clustering the two matrices gives different results.

Any insight?

tcga nomalization rnaseq deseq2 • 2.6k views
@ryan-c-thompson-5618
Last seen 12 months ago
Scripps Research, La Jolla, CA

Yes, rlog should give different clustering results from a simple log2 transformation. The effect of rlog in clustering is to give less weight to genes with small counts, since small counts are more affected by counting noise and thus carry less information about the biology of the samples. If your samples' clustering pattern is strongly influenced by genes with small counts, then rlog, as well as any other kind of variance-stabilizing transformation, will most definitely change that clustering pattern. This change is usually an improvement.
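The point about counting noise can be illustrated with a small base-R simulation. This is only a sketch under stated assumptions: Poisson draws stand in for counting noise, the two "genes" and their means (2 and 2000) are made up for illustration, and no DESeq2 function is involved. Neither gene differs between samples, yet the low-count gene shows a much larger spread after log2(count + 1), so it contributes more to sample-to-sample distances.

```r
# Simulation only: Poisson noise stands in for counting noise (assumption).
set.seed(42)
n_samples <- 200

# One "gene" with a small mean count and one with a large mean count;
# neither has any true difference between samples.
low  <- rpois(n_samples, lambda = 2)
high <- rpois(n_samples, lambda = 2000)

# Spread of each gene after the simple log2(count + 1) transform
sd_low  <- sd(log2(low + 1))
sd_high <- sd(log2(high + 1))

cat("sd of log2(count + 1), low-count gene: ", sd_low, "\n")
cat("sd of log2(count + 1), high-count gene:", sd_high, "\n")
```

Because a clustering based on Euclidean distance sums squared differences over genes, many such low-count genes can dominate the result even though their variation is mostly noise.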

@steve-lianoglou-2771
Last seen 3 days ago
Denali

Yes, of course: this is exactly the point of using these transformations vs +1 smoothing followed by log2 transformation.

This is covered in the entirety of section 2 of the DESeq2 vignette, which would be a worthwhile read.
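For anyone who wants to compare the two transformations side by side, here is a minimal sketch. It assumes DESeq2 is installed and uses simulated counts from makeExampleDESeqDataSet in place of the TCGA matrix; the gene and sample numbers, the complete-linkage clustering, and the cut at k = 2 are arbitrary choices for illustration. Note that rlog returns a DESeqTransform object, so assay() is needed to get the matrix for clustering.

```r
library(DESeq2)

set.seed(1)
# Simulated counts stand in for the TCGA matrix (assumption)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)
design(dds) <- ~ 1
dds <- estimateSizeFactors(dds)

# The two transformations from the question
lon <- log2(counts(dds, normalized = TRUE) + 1)
r   <- assay(rlog(dds, blind = TRUE))  # assay() extracts the matrix

# Cluster samples on each matrix (Euclidean distance, complete linkage)
hc_lon <- hclust(dist(t(lon)))
hc_r   <- hclust(dist(t(r)))

# Cross-tabulate cluster membership at k = 2 to compare the two results
table(log2 = cutree(hc_lon, k = 2), rlog = cutree(hc_r, k = 2))
```

Off-diagonal entries in the table are samples that the two clusterings assign differently.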

ezz • 0
@ezz-7643
Last seen 6.4 years ago
United States

Thank you so much. I found out that rlog should not be used on a zero infested matrix with a large number of samples.


The term is "zero inflated". This means, within a biological condition, observing counts like: {0,0,0,1000,0,2000,0,0,3000}. This is not a good match for the negative binomial, but is better modeled by a combination of two distributions: one component with a spike at 0 and another component for the large counts.
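A rough way to see the mismatch, using the example counts above, is to compare the observed fraction of zeros with what a negative binomial of the same mean and variance would predict. This is a base-R sketch under an assumption: the negative binomial is fitted by simple moment matching, which is not how DESeq2 estimates dispersion.

```r
# Example counts from the text above
x <- c(0, 0, 0, 1000, 0, 2000, 0, 0, 3000)

mu <- mean(x)
v  <- var(x)

# Moment-matched negative binomial: var = mu + mu^2 / size
# (illustrative fit only, not the DESeq2 dispersion estimate)
size <- mu^2 / (v - mu)

# Probability of a zero under that negative binomial
p0_nb <- dnbinom(0, size = size, mu = mu)

# Fraction of zeros actually observed
p0_obs <- mean(x == 0)

cat("Observed fraction of zeros:      ", p0_obs, "\n")
cat("Negative binomial zero probability:", p0_nb, "\n")
```

The observed fraction of zeros is far above what the fitted negative binomial predicts, which is the excess that a separate spike at 0 is meant to capture.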