Question: Silhouette Plots for Clustering Quality (Multiple Algorithm)
10 months ago, hamza_karakurt wrote:

Hello everyone, I am working on scRNA-seq data with multiple clustering algorithms. I want to assess the quality of the clustering with the silhouette plot function of the "cluster" package. As far as I know, the function requires a distance matrix. My data is stored in a Seurat object (which can be converted to a SingleCellExperiment), and clustering with the Seurat method provides an SNN matrix. Which option would be better?

1) Using the SNN matrix as the distance matrix

2) Calculating a new distance matrix with the dist() function. (I actually tried this with Euclidean distances, but it did not work: the plots showed very poor clustering quality.) My data is really big, so I am sure this will take a long time. I could compute the distance matrix from the principal components (say, the first 50 PCs).
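
Option 2 might be sketched as follows, computing silhouette widths on a random subsample of cells so that the distance matrix stays manageable. The `pcs` matrix and `clusters` vector are simulated placeholders (assumptions, not the actual dataset); in practice they would be the first 50 PCs and the labels from whichever clustering algorithm is being assessed.

```r
library(cluster)

set.seed(100)

# Simulated stand-in for a cells-by-PCs matrix (hypothetical data):
# 2000 cells, 50 PCs, two groups shifted apart along the first PC.
n.cells <- 2000
pcs <- matrix(rnorm(n.cells * 50), nrow = n.cells, ncol = 50)
clusters <- rep(1:2, each = n.cells / 2)
pcs[clusters == 2, 1] <- pcs[clusters == 2, 1] + 10

# Subsample cells before calling dist(), which is Euclidean by default.
chosen <- sort(sample(n.cells, 500))
d <- dist(pcs[chosen, ])

# One row per subsampled cell, with columns cluster, neighbor, sil_width.
sil <- silhouette(clusters[chosen], d)
avg.width <- mean(sil[, "sil_width"])
avg.width

plot(sil)
```

A full distance matrix over n cells stores n(n-1)/2 values, so 100,000 cells would need roughly 40 GB as doubles, which is why the sketch subsamples first.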

Thank you in advance.

Answer: Silhouette Plots for Clustering Quality (Multiple Algorithm)
10 months ago, Aaron Lun (Cambridge, United Kingdom) wrote:

As I have said many times, Seurat is not a Bioconductor package, so I don't know why people keep on asking for help here. If you want an authoritative answer from the Seurat authors, you should contact them directly.

Having said that, you might consider the clusterModularity() function from scran. See Section 9.2 of the simpleSingleCell workflow for details on its usage (link). Don't use silhouette plots; you'll probably crash your computer.
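
A minimal sketch of how clusterModularity() might be called, assuming a cells-by-PCs matrix (simulated here as a placeholder) and cluster labels from igraph's walktrap algorithm. The available arguments and the return format depend on the installed scran version, so check ?clusterModularity before relying on this.

```r
library(scran)

set.seed(100)

# Hypothetical cells-by-PCs matrix with two well-separated groups.
pcs <- matrix(rnorm(400 * 10), nrow = 400, ncol = 10)
pcs[1:200, 1] <- pcs[1:200, 1] + 10

# transposed=TRUE indicates that rows are cells rather than genes.
g <- buildSNNGraph(pcs, k = 10, transposed = TRUE)

# Any clustering of the graph nodes would do; walktrap is one choice.
clusters <- igraph::cluster_walktrap(g)$membership

# Square matrix with one row and column per cluster. Diagonal entries
# describe edge weight within a cluster, off-diagonal entries describe
# edge weight between pairs of clusters.
mod <- clusterModularity(g, clusters)
mod
```

Large diagonal entries relative to the off-diagonal ones would suggest that most edge weight falls within clusters rather than between them.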


Thank you Aaron, this function looks very useful. I believe the SNN matrix is a similarity matrix. I tried silhouette plots before, but the results were always very bad (-1). Now I see the reason. I just have a small question: does buildSNNGraph use the Euclidean distance or the Jaccard index as its measure?

Thank you again.

Written 9 months ago by hamza_karakurt

I think you have your ideas mixed up here. To clarify:

buildSNNGraph uses the Euclidean distance to identify pairs of cells with shared neighbours. It creates a link between these paired cells, weighted based on the maximum average rank of the set of shared neighbours. (That is, a pair of cells that share their closest neighbour will have a high-weight link, while a pair of cells that only share their furthest neighbour will have a zero-weight link.) This is as described by Xu and Su (2015), see the References mentioned in ?buildSNNGraph.

You can also set type="number", which will define weights for each link based on the number of shared nearest neighbours. This ignores the ranking of the shared neighbours entirely, and is closest to the "Jaccard clustering" done by Seurat. I don't use this much, mostly because I'm lazy and this setting is not the default.

There is no choice between Euclidean distances and Jaccard indices (not that the latter is ever used, anyway). If you like, you can switch to Manhattan distances for the NN search - at least in the BioC-devel branch - by setting BNPARAM=KmknnParam(distance="manhattan"). But I don't see a strong reason to do this.
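
To make the rank-based weighting concrete, here is a toy sketch in base R. It assumes the weight is k - r/2, where r is the smallest sum of ranks over the shared neighbours, as ?buildSNNGraph appears to describe (an assumption worth verifying against the installed version). The helper snn_rank_weight() and the neighbour sets are hypothetical, and this is not scran's actual implementation; it ignores details such as how a cell is ranked in its own neighbour set.

```r
# Hypothetical helper: rank-based weight for one pair of cells.
# ranks.a, ranks.b: named vectors giving the rank (1 = closest) of each
# of the k nearest neighbours of cells A and B, named by neighbour ID.
snn_rank_weight <- function(ranks.a, ranks.b, k) {
  shared <- intersect(names(ranks.a), names(ranks.b))
  if (length(shared) == 0L) {
    return(NA_real_)  # no shared neighbours, so no link
  }
  r <- min(ranks.a[shared] + ranks.b[shared])  # best-ranked shared neighbour
  k - r / 2
}

k <- 3
nn.a <- c(x = 1, y = 2, z = 3)
nn.b <- c(x = 1, w = 2, v = 3)  # shares its closest neighbour (x) with A
nn.c <- c(u = 1, t = 2, z = 3)  # shares only its furthest neighbour (z) with A

snn_rank_weight(nn.a, nn.b, k)  # 3 - (1 + 1) / 2 = 2
snn_rank_weight(nn.a, nn.c, k)  # 3 - (3 + 3) / 2 = 0
```

The two calls reproduce the extremes described above: sharing the closest neighbour gives the highest weight for this k, while sharing only the furthest neighbour gives zero.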

Modified 9 months ago • written 9 months ago by Aaron Lun

Hello Aaron, thank you for your answer. You are right, I am a little confused. buildSNNGraph uses the Euclidean distance, but I believe the SNN matrix is a similarity matrix (that is why silhouette plots with that matrix do not work, and why you added this warning to the tutorial: "We do not use the silhouette coefficient to assess clustering for large datasets. This is because cluster::silhouette requires the construction of a distance matrix, which may not be feasible when many cells are involved"). I will try the type parameter, check the quality of the clustering with the modularity() function, and consider the graph with the highest modularity to be the best. I also want to use the cluster_louvain() function of the igraph package with the igraph object (SNN matrix). I am sorry for the many questions and confused comments.
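
This plan might be sketched as follows, assuming a cells-by-PCs matrix (simulated here as a placeholder). It builds one graph per type setting, clusters each with cluster_louvain() from igraph, and reports the modularity of the resulting partition.

```r
library(scran)
library(igraph)

set.seed(100)

# Hypothetical cells-by-PCs matrix with two well-separated groups.
pcs <- matrix(rnorm(400 * 10), nrow = 400, ncol = 10)
pcs[1:200, 1] <- pcs[1:200, 1] + 10

scores <- vapply(c(rank = "rank", number = "number"), function(type) {
  g <- buildSNNGraph(pcs, k = 10, transposed = TRUE, type = type)
  comm <- cluster_louvain(g)  # uses the "weight" edge attribute if present
  modularity(comm)            # modularity of the partition that was found
}, numeric(1))

scores
```

Because the two graphs carry different edge weights, the scores are only a rough guide to which setting suits the data.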

Thanks in advance

Written 9 months ago by hamza_karakurt