Question

Different clustering results of DESeq2 counts and rlog

ezz • 0

@ezz-7643

Last seen 9.0 years ago

United States

- After I downloaded the TCGA raw gene counts, I created a DESeq2 object with design = ~ 1.

- To visualize the effect of normalization, I ran: lon <- log2(counts(dds, normalized=TRUE) + 1)

- I also ran: r <- rlog(dds)

Clustering the two matrices gives different results.

Any insight?

tcga normalization rnaseq deseq2 • 3.5k views

ADD COMMENT • link 9.0 years ago ezz • 0

score 1 · Answer 1 · 2015-05-13

Yes, rlog should give different clustering results from a simple log2 transformation. The effect of rlog in clustering is to give less weight to genes with small counts, since small counts are more affected by counting noise and thus carry less information about the biology of the samples. If your samples' clustering pattern is strongly influenced by genes with small counts, then rlog, as well as any other kind of variance-stabilizing transformation, will most definitely change that clustering pattern. This change is usually an improvement.
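To make the point about counting noise concrete, here is a minimal base R sketch. It does not call rlog or use real TCGA data. It assumes pure Poisson noise and two made-up genes (means of 5 and 5000, chosen only for illustration) measured in samples with identical biology, and it compares how much spread each gene shows after the log2(count + 1) transformation from the question.

```r
# Simulated data only: Poisson counting noise for one low-count and one
# high-count gene across 1000 hypothetical samples with identical biology.
set.seed(1)
n_samples <- 1000
low_gene  <- rpois(n_samples, lambda = 5)     # weakly expressed gene (assumed mean)
high_gene <- rpois(n_samples, lambda = 5000)  # strongly expressed gene (assumed mean)

# Same transformation as in the question: log2(count + 1)
sd_low  <- sd(log2(low_gene + 1))
sd_high <- sd(log2(high_gene + 1))

# Spread on the log scale that comes from counting noise alone
print(c(low = sd_low, high = sd_high))
```

Under these assumptions the low-count gene varies far more on the log2 scale than the high-count gene, even though neither carries any biological signal. Distance-based clustering on log2(count + 1) values will pick up that noise, which is the kind of contribution a variance-stabilizing transformation is meant to shrink.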

score 0 · Answer 2 · 2015-05-13

Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 14 months ago

United States

Yes, of course: this is exactly the point of using these transformations vs +1 smoothing followed by log2 transformation.

This is covered in the entirety of section 2 of the DESeq2 vignette, which would be a worthwhile read.

ADD COMMENT • link 9.0 years ago Steve Lianoglou ★ 13k

score 0 · Answer 3 · 2015-05-13

ezz • 0

@ezz-7643

Last seen 9.0 years ago

United States

Thank you so much. I found out that rlog should not be used on a zero infested matrix with a large number of samples.

ADD COMMENT • link 9.0 years ago ezz • 0

"Zero inflated". This means, within a biological condition, observing counts like: {0,0,0,1000,0,2000,0,0,3000}. This is not a good match for the negative binomial, but is better modeled by a combination of two distributions: one component with a spike at 0 and another component for the large counts.
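As a rough illustration of that two-component idea, the base R sketch below simulates one gene as a mixture of a spike at 0 and a negative binomial component. All parameter values (p_zero, nb_mu, nb_size) are hypothetical choices for the example, not estimates from any dataset, and this is not how DESeq2 models counts.

```r
# Simulated zero-inflated counts for a single gene within one condition.
set.seed(1)
n_samples <- 1000
p_zero  <- 0.65   # assumed probability of landing in the spike at 0
nb_mu   <- 2000   # assumed mean of the negative binomial component
nb_size <- 10     # assumed size (1 / dispersion) of that component

# Component 1: spike at zero; component 2: negative binomial for large counts
in_spike  <- rbinom(n_samples, size = 1, prob = p_zero) == 1
nb_counts <- rnbinom(n_samples, size = nb_size, mu = nb_mu)
zi_counts <- ifelse(in_spike, 0, nb_counts)

# Fraction of zeros in the mixture vs. the probability of a zero
# under the negative binomial component alone
observed_zero_frac <- mean(zi_counts == 0)
nb_zero_prob <- dnbinom(0, size = nb_size, mu = nb_mu)
print(c(observed = observed_zero_frac, nb_alone = nb_zero_prob))
```

With these settings, roughly two thirds of the simulated counts are zero, while the negative binomial component on its own would almost never produce a zero. That gap is what makes a single negative binomial a poor description of such data.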

ADD REPLY • link 9.0 years ago Michael Love 42k