Question: Dunn Index for Clusters from scRNA-Seq
hamza_karakurt wrote (6 months ago):

Hello everyone. I want to try the Dunn index to validate my clustering results from scRNA-Seq data. I know that the Dunn index ranges from 0 to infinity and that higher values mean better clustering. I applied it to scRNA-Seq data for which I built a graph with the buildSNNGraph() function from the scran package and then clustered that graph with the Louvain algorithm from the igraph package. I tried a range of k values and want to score them. Most of the Dunn indices are between 0.08 and 0.1. Can I use these values to compare my clustering results, or does the Dunn index only work for distance-based clustering methods rather than graph-based ones?

I know the modularity function can be used in such cases, but I saw that modularity decreases as k increases in the buildSNNGraph and buildKNNGraph functions, so I wanted to use a different method.

Thank you in advance


Comment by Kevin Blighe (6 months ago): Cross-posted on Biostars: https://www.biostars.org/p/368557/

Answer: Dunn Index for Clusters from scRNA-Seq
Aaron Lun (Cambridge, United Kingdom) wrote (6 months ago):

To answer your immediate question: I don't see an inherent problem with using the Dunn index to assess separation of clusters, provided you're willing to do all those distance calculations. But keep in mind that the clustering methods in igraph will attempt to maximize the modularity, not the Dunn index. If a graph-based clustering strategy gives you a higher modularity but a lower Dunn index, you can hardly say that it performs poorly - it's just doing its job.

You also don't mention what flavor of Dunn index is being used. If you're using the one that involves computing the ratio of the minimum inter-cluster distance to the maximum intra-cluster distance, I'd say that this is far too conservative to be useful in single-cell data. A single misassigned cell is enough to make your index very small, even if the rest of the clustering is fine.
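
As a rough illustration of that sensitivity, here is a minimal Python sketch of the min-inter/max-intra form of the Dunn index on made-up 2D points. The function name dunn_index, the coordinates and the labels are all hypothetical, Euclidean distance is an assumption, and this is not meant to reproduce how any particular R package computes the index.

```python
from itertools import combinations
from math import dist


def dunn_index(points, labels):
    """Minimum inter-cluster distance divided by maximum intra-cluster distance."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    # Largest distance between two points that share a cluster (cluster diameter).
    max_intra = max(
        dist(a, b)
        for members in clusters.values()
        for a, b in combinations(members, 2)
    )
    # Smallest distance between two points that sit in different clusters.
    min_inter = min(
        dist(a, b)
        for c1, c2 in combinations(clusters, 2)
        for a in clusters[c1]
        for b in clusters[c2]
    )
    return min_inter / max_intra


# Two well-separated toy "cell populations" (made-up coordinates).
group_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
group_b = [(10.0, 0.0), (10.0, 1.0), (11.0, 0.0), (11.0, 1.0)]
points = group_a + group_b

clean = [0, 0, 0, 0, 1, 1, 1, 1]
# Same labels, except that the last cell of group_b is assigned to cluster 0.
one_wrong = [0, 0, 0, 0, 1, 1, 1, 0]

dunn_clean = dunn_index(points, clean)
dunn_one_wrong = dunn_index(points, one_wrong)
print(dunn_clean, dunn_one_wrong)
```

On this toy data the index falls from roughly 6.4 to roughly 0.09 when one of eight cells is relabelled, because both the numerator and the denominator are determined by single extreme pairs of cells.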

> I want to try the Dunn index to validate my clustering results from scRNA-Seq data.

Don't use the word "validation". Validation implies that there is some kind of truth to be found, but this isn't really the purpose of clustering, as we have already discussed. Currently, all that you're doing is evaluating the separation between clusters, which is fine and useful but a long way from establishing truth. If you want to "validate" something, you should be performing functional experiments to demonstrate that your clusters correspond to cells that have different biological behaviour.

> I tried a range of k values and want to score them.

Or you could just pick one and see if it's useful. Clustering doesn't have to be perfect, it just has to be good enough for downstream interpretation.

> I know the modularity function can be used in such cases, but I saw that modularity decreases as k increases in the buildSNNGraph and buildKNNGraph functions, so I wanted to use a different method.

This is a natural consequence of increasing the number of connections in the graph. I would say that this is a feature rather than a bug, because increasing the connectivity allows us to obtain more granular clusters. In this manner, we can adjust the resolution as desired if there are too few/many clusters for further examination.
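
The effect of connectivity on modularity can be sketched on a toy graph. The Python snippet below holds a two-cluster partition fixed and adds edges between the clusters, as a crude stand-in for what a larger k does. The graph, the modularity helper and the node labels are made up for illustration, the graph is unweighted rather than a real SNN graph, and in practice the Louvain step would re-optimize the partition on each new graph.

```python
from itertools import combinations


def modularity(edges, membership):
    """Newman modularity of an unweighted, undirected graph for a fixed partition."""
    m = len(edges)
    within = {}  # edges with both ends in the same cluster, per cluster
    degree = {}  # summed node degree, per cluster
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            within[cu] = within.get(cu, 0) + 1
    return sum(
        within.get(c, 0) / m - (degree[c] / (2 * m)) ** 2
        for c in degree
    )


# Nodes 0-3 form cluster 0 and nodes 4-7 form cluster 1 (hypothetical cells).
membership = {node: 0 if node < 4 else 1 for node in range(8)}

# Each cluster is fully connected internally.
cliques = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2))

# "Small k": one edge between the clusters. "Large k": five such edges.
sparse = cliques + [(3, 4)]
dense = cliques + [(3, 4), (0, 4), (1, 5), (2, 6), (3, 7)]

q_sparse = modularity(sparse, membership)
q_dense = modularity(dense, membership)
print(q_sparse, q_dense)
```

Here the modularity falls from about 0.42 to about 0.21 although the partition itself has not changed, so a lower modularity at larger k need not mean a worse partition.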
