Question

Dunn Index for Clusters from scRNA-Seq

hamza_karakurt ▴ 60

Hello everyone, I want to try the Dunn index to validate my clustering results from scRNA-Seq data. I know the Dunn index ranges from 0 to infinity and that higher values mean better clustering. I applied it to scRNA-Seq data for which I built a graph with the buildSNNGraph() function of the scater/scran packages and clustered that graph with the Louvain algorithm from the igraph package. I tried a range of k values and want to score them. Most of the Dunn indices are between 0.08 and 0.1. Can I use these values to compare my clustering results, or does the Dunn index only work for distance-based clustering methods rather than graph-based ones?

I know the modularity function can be used in such cases, but I saw that modularity decreases as k increases in the buildSNNGraph and buildKNNGraph functions, so I wanted to use a different method.

Thank you in advance

scRNA-Seq clustering scater scran dunn index • 2.1k views

Updated 6.9 years ago by Aaron Lun ★ 29k • written 6.9 years ago by hamza_karakurt ▴ 60

Cross-posted on Biostars: https://www.biostars.org/p/368557/

Kevin Blighe ★ 4.0k • 6.9 years ago

score 1 · Answer 1 · 2019-03-10

To answer your immediate question: I don't see an inherent problem with using the Dunn index to assess separation of clusters, provided you're willing to do all those distance calculations. But keep in mind that the clustering methods in igraph will attempt to maximize the modularity, not the Dunn index. If a graph-based clustering strategy gives you a higher modularity but a lower Dunn index, you can hardly say that it performs poorly - it's just doing its job.

You also don't mention what flavor of Dunn index is being used. If you're using the one that involves computing the ratio of the minimum inter-cluster distance to the maximum intra-cluster distance, I'd say that this is far too conservative to be useful in single-cell data. A single misassigned cell is enough to make your index very small, even if the rest of the clustering is fine.
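To make that sensitivity concrete, here is a minimal Python sketch of the min-inter/max-intra flavour of the Dunn index. It is not the poster's R workflow: the coordinates and the `dunn_index` helper are made up for illustration. It scores two well-separated toy clusters, then the same data with one cell assigned to the wrong cluster.

```python
import math
from itertools import combinations


def dunn_index(points, labels):
    """Minimum inter-cluster distance divided by maximum intra-cluster distance."""
    min_between = math.inf
    max_within = 0.0
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        d = math.dist(p, q)
        if lp == lq:
            max_within = max(max_within, d)
        else:
            min_between = min(min_between, d)
    return min_between / max_within


# Two tight, well-separated toy clusters in 2D (made-up coordinates).
cluster_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
cluster_b = [(10.0, 0.0), (10.0, 1.0), (11.0, 0.0), (11.0, 1.0)]
points = cluster_a + cluster_b
good_labels = [0] * 4 + [1] * 4

# Same clustering, except one cell from cluster B is assigned to cluster A.
bad_labels = list(good_labels)
bad_labels[4] = 0

dunn_good = dunn_index(points, good_labels)
dunn_bad = dunn_index(points, bad_labels)
print(f"clean: {dunn_good:.3f}, one misassigned cell: {dunn_bad:.3f}")
```

On this toy data the index falls from roughly 6.4 to roughly 0.1 after a single relabelled cell, even though the other seven assignments are unchanged.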

> I want to try the Dunn index to validate my clustering results from scRNA-Seq data.

Don't use the word "validation". Validation implies that there is some kind of truth to be found, but this isn't really the purpose of clustering, as we have already discussed. Currently, all that you're doing is to evaluate the separation between clusters, which is fine and useful but is a long way from establishing truth. If you want to "validate" something, you should be performing functional experiments to demonstrate that your clusters correspond to cells that have different biological behaviour.

> I tried a range of k values and want to score them.

Or you could just pick one and see if it's useful. Clustering doesn't have to be perfect; it just has to be good enough for downstream interpretation.

> I know the modularity function can be used in such cases, but I saw that modularity decreases as k increases in the buildSNNGraph and buildKNNGraph functions, so I wanted to use a different method.

This is a natural consequence of increasing the number of connections in the graph. I would say that this is a feature rather than a bug, because increasing the connectivity allows us to obtain more granular clusters. In this manner, we can adjust the resolution as desired if there are too few/many clusters for further examination.
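The effect of a denser graph on modularity can be sketched on toy data. The Python below is an illustration under simplifying assumptions, not what buildSNNGraph or igraph do internally: it builds a plain unweighted kNN graph (no shared-neighbour weighting), keeps the partition fixed instead of re-running Louvain for each k, and uses hypothetical helpers `knn_edges` and `modularity` on made-up 1D coordinates.

```python
import math


def knn_edges(points, k):
    """Undirected kNN graph: i and j are linked if either is among the other's k nearest."""
    edges = set()
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        # Break distance ties by index so the result is deterministic.
        others.sort(key=lambda j: (math.dist(p, points[j]), j))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def modularity(edges, labels):
    """Newman modularity of a fixed partition on an unweighted, undirected graph."""
    m = len(edges)
    within = {}  # number of edges with both ends in the same cluster
    degree = {}  # summed node degree per cluster
    for i, j in edges:
        degree[labels[i]] = degree.get(labels[i], 0) + 1
        degree[labels[j]] = degree.get(labels[j], 0) + 1
        if labels[i] == labels[j]:
            within[labels[i]] = within.get(labels[i], 0) + 1
    return sum(within.get(c, 0) / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two toy clusters of six cells each on a line (made-up coordinates).
points = [(float(x),) for x in list(range(6)) + list(range(20, 26))]
labels = [0] * 6 + [1] * 6

q_by_k = {k: modularity(knn_edges(points, k), labels) for k in (2, 5, 8, 11)}
for k, q in q_by_k.items():
    print(f"k={k:2d}  modularity={q:.3f}")
```

With k up to 5 every edge stays inside a cluster and the modularity is 0.5. Once k is large enough that some neighbours must come from the other cluster, edges cross between clusters and the score drops, even though the partition itself has not changed.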