Difference between heatmap using Euclidean distance and Poisson distance
@thibaultlorin34-14148

Using DESeq2, I would like to obtain a heatmap of sample-to-sample distances using the rlog-transformed values. However, I'm not sure whether I should use the "Euclidean" distance or the "Poisson" distance (both are suggested here). I have obtained both graphs but don't know which one I should "trust". While I think I understand the "Euclidean" approach, I'm not sure I understand the advantages of the "Poisson" approach. I've tried reading the original paper (Witten 2011) but got lost at some point B-)

Could someone:

1. illustrate a simple sample case where both methods would give the same result?

2. illustrate a simple sample case where both methods would give different results?

3. explain the advantages and drawbacks of both methods?

Many thanks!

PS: Since this might concern Bioinformatics in general, I also posted this question on Biostars and got an answer.

deseq2 euclidean sample distance rnaseq
@mikelove
United States

We have some benchmarks in the DESeq2 paper comparing Euclidean distance on variance-stabilized counts with Poisson distance; see Supplementary Figure 17. Overall the methods performed similarly, but in our simulation, when the size factors had a very large range (e.g., the sample with the smallest sequencing depth sequenced at 1/10 the depth of the largest), the rlog performed a bit better in terms of how well clustering on the distances recovered the 'true' clusters. But this is just one simulation, and I wouldn't extrapolate from it to all cases. In my experience, the Poisson distance is just as useful for uncovering interesting patterns.

https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0550-8
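To make the comparison concrete, here is a minimal toy sketch in plain Python of the two kinds of distance discussed above. Everything in it is an assumption for illustration: the counts are made up, `size_factors` uses library size as a crude stand-in for DESeq2's size factors, `euclidean_log_distance` uses log2(normalized count + 1) instead of the rlog, and `poisson_lr_dissimilarity` is a simplified likelihood-ratio statistic in the spirit of the Poisson distance, not the implementation from Witten (2011), which also shrinks the estimates and can power-transform the counts.

```python
import math


def size_factors(counts):
    """Library size of each sample divided by the mean library size.

    A crude stand-in for DESeq2 size factors (assumption for this sketch).
    """
    totals = [sum(row) for row in counts]
    mean_total = sum(totals) / len(totals)
    return [t / mean_total for t in totals]


def euclidean_log_distance(x, y, sx, sy):
    """Euclidean distance between log2(normalized count + 1) profiles."""
    lx = [math.log2(v / sx + 1) for v in x]
    ly = [math.log2(v / sy + 1) for v in y]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(lx, ly)))


def poisson_lr_dissimilarity(x, y):
    """Poisson likelihood-ratio (deviance) statistic for a pair of samples.

    Null model: the two samples differ only in sequencing depth, so the
    expected count is library_size * gene_total / grand_total.
    """
    total_x, total_y = sum(x), sum(y)
    grand_total = total_x + total_y
    deviance = 0.0
    for xj, yj in zip(x, y):
        gene_total = xj + yj
        for observed, library in ((xj, total_x), (yj, total_y)):
            expected = library * gene_total / grand_total
            if observed > 0:
                deviance += observed * math.log(observed / expected)
            deviance -= observed - expected
    return 2.0 * deviance


# Made-up counts: B is A at 1/10 of the depth, C has a different composition.
samples = {
    "A": [100, 200, 300, 400],
    "B": [10, 20, 30, 40],
    "C": [400, 300, 200, 100],
}
names = list(samples)
factors = dict(zip(names, size_factors([samples[n] for n in names])))

for first, second in (("A", "B"), ("A", "C")):
    euclid = euclidean_log_distance(
        samples[first], samples[second], factors[first], factors[second]
    )
    poisson = poisson_lr_dissimilarity(samples[first], samples[second])
    print(f"{first} vs {second}: euclidean={euclid:.3f} poisson_lr={poisson:.3f}")
```

In this toy case both measures agree: A and B differ only in depth, so both put them essentially at distance zero, and both put A far from C. They tend to diverge when counts are low, because a distance on the log scale treats a given fold change roughly the same at any expression level, whereas a Poisson likelihood-ratio statistic gives more weight to the same fold change when the counts behind it are larger.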