Question

Difference between heatmap using Euclidian distance and Poisson distance

thibault.lorin34 ▴ 20

Using DESeq2, I would like to obtain a heatmap of sample-to-sample distances using the rlog-transformed values. However, I'm not sure whether I should use the "Euclidean" distance or the "Poisson" distance (both are suggested here). I have produced both plots but don't know which one I should "trust". While I think I understand the "Euclidean" approach, I'm not sure I see the advantages of the "Poisson" approach. I've tried reading the original paper (Witten 2011) but got lost at some point B-)

Could someone:

1. illustrate a simple sample case where both methods would give the same result?

2. illustrate a simple sample case where both methods would give different results?

3. explain the advantages and drawbacks of both methods?

Many thanks!

PS: Since this might concern Bioinformatics in general, I also posted this question on Biostars and got an answer.

deseq2 euclidean sample distance rnaseq • 2.4k views

updated 6.5 years ago by Michael Love 41k • written 6.5 years ago by thibault.lorin34 ▴ 20

score 4 · Answer 1 · 2017-10-17

We have some benchmarks in the DESeq2 paper comparing Euclidean distance on variance-stabilized counts with Poisson distance; see Supplementary Figure 17. Overall the methods performed similarly. In our simulation, when the size factors had a very large range (e.g. the sample with the smallest sequencing depth sequenced at 1/10 the depth of the largest), the rlog performed a bit better in terms of how well clustering on the distances recovered the 'true' clusters. But this is just one simulation, and I wouldn't extrapolate from it to all cases. In my experience, the Poisson distance is just as useful at uncovering interesting patterns.

https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0550-8
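To make the first two questions concrete, here is a minimal sketch of a toy case where the two approaches agree and one where they differ. It is written in plain Python and is not the DESeq2 or PoiClaClu code. Both functions are hypothetical stand-ins: `poisson_lr_dissimilarity` is a simplified, unsmoothed log-likelihood-ratio statistic in the spirit of Witten (2011), computed on raw counts, and `log_euclidean_distance` uses log2 of depth-scaled counts plus a pseudocount in place of the rlog (which would additionally shrink differences for low-count genes). The count vectors are made up for illustration.

```python
import math


def poisson_lr_dissimilarity(x, y):
    """Unsmoothed Poisson log-likelihood-ratio dissimilarity (illustrative only).

    Null model: gene j in sample i has mean s_i * g_j, i.e. the two samples
    differ only in sequencing depth. Larger values mean more evidence against it.
    """
    total_x, total_y = sum(x), sum(y)
    share_x = total_x / (total_x + total_y)  # depth share of sample x
    share_y = 1.0 - share_x
    stat = 0.0
    for a, b in zip(x, y):
        gene_total = a + b
        pairs = ((a, share_x * gene_total), (b, share_y * gene_total))
        for observed, expected in pairs:
            if observed > 0:  # 0 * log(0) is taken as 0
                stat += observed * math.log(observed / expected)
        # The (observed - expected) terms cancel within each gene, so they are omitted.
    return stat


def log_euclidean_distance(x, y, pseudocount=1.0):
    """Euclidean distance on log2 depth-scaled counts (a crude stand-in for rlog)."""
    total_x, total_y = sum(x), sum(y)
    mean_depth = (total_x + total_y) / 2.0
    squared = 0.0
    for a, b in zip(x, y):
        log_a = math.log2(a * mean_depth / total_x + pseudocount)
        log_b = math.log2(b * mean_depth / total_y + pseudocount)
        squared += (log_a - log_b) ** 2
    return math.sqrt(squared)


# Case 1: same expression profile, 10x difference in sequencing depth.
shallow = [10, 100, 1000]
deep = [100, 1000, 10000]
same_profile_poisson = poisson_lr_dissimilarity(shallow, deep)
same_profile_euclid = log_euclidean_distance(shallow, deep)

# Case 2: equal depth; a two-fold swap in two low-count genes
# versus the same swap in two high-count genes.
base = [10, 20, 1000, 2000]
low_swapped = [20, 10, 1000, 2000]
high_swapped = [10, 20, 2000, 1000]
low_poisson = poisson_lr_dissimilarity(base, low_swapped)
high_poisson = poisson_lr_dissimilarity(base, high_swapped)
low_euclid = log_euclidean_distance(base, low_swapped)
high_euclid = log_euclidean_distance(base, high_swapped)

print("case 1:", same_profile_poisson, same_profile_euclid)
print("case 2 Poisson (low, high):", low_poisson, high_poisson)
print("case 2 Euclidean (low, high):", low_euclid, high_euclid)
```

In case 1 both measures are essentially zero, because each one accounts for sequencing depth in its own way. In case 2 the log-scale Euclidean distance rates the two swaps as similar, since it mostly sees the fold change, while the Poisson statistic is about 100 times larger for the high-count swap, since the same fold change is far stronger evidence when the counts are large. Variance-stabilizing transformations narrow this gap by damping differences among low-count genes.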