ConsensusClusterPlus pre-computed distance matrix
neuro3030 (@neuro3030-15768)

The ConsensusClusterPlus package has an option to input a pre-computed distance matrix to reduce computation time. According to the reference manual, this helps because ConsensusClusterPlus otherwise recalculates a distance matrix at each iteration.
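
To illustrate the idea in the manual, here is a minimal sketch in Python (a language-neutral illustration, not ConsensusClusterPlus code): each resampling iteration clusters a random subset of samples, so with a full pre-computed matrix the distances for that subset can simply be looked up, whereas without one they are recomputed from all features. The function names, the Euclidean metric and the 80% subsampling fraction are illustrative assumptions.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def full_distance_matrix(samples):
    """Pre-compute all pairwise distances once (samples = one feature vector per sample)."""
    n = len(samples)
    return [[euclidean(samples[i], samples[j]) for j in range(n)] for i in range(n)]


def subsample_distances(dist, idx):
    """Per-iteration cost with a pre-computed matrix: O(m^2) lookups, no arithmetic on features."""
    return [[dist[i][j] for j in idx] for i in idx]


def recompute_distances(samples, idx):
    """Per-iteration cost without one: O(m^2 * p) work over all p features."""
    return full_distance_matrix([samples[i] for i in idx])


if __name__ == "__main__":
    random.seed(0)
    n, p = 10, 5
    samples = [[random.random() for _ in range(p)] for _ in range(n)]
    dist = full_distance_matrix(samples)
    # One hypothetical resampling iteration drawing 80% of the samples.
    idx = sorted(random.sample(range(n), int(0.8 * n)))
    # Both routes give identical distances; only the cost differs.
    print(subsample_distances(dist, idx) == recompute_distances(samples, idx))  # True
```

The lookup route only pays off if the full matrix fits comfortably in memory, which is why its size matters for a large dataset.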

I have therefore pre-computed the distance matrix for a very large dataset (~700 samples and ~50,000 rows). However, when I pass this distance object to ConsensusClusterPlus, the computation time INCREASES dramatically and it struggles to get past the first iteration. Of note, the "dist" object for this dataset is very large (approx. 4-6 GB). Since the distances are already calculated, shouldn't this save time during consensus clustering?
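
A rough size check may help here, assuming an R `dist` object stores the n(n-1)/2 lower-triangle distances as 8-byte doubles. The helper name `dist_size_bytes` is hypothetical and ignores small per-object overhead.

```python
def dist_size_bytes(n_items, bytes_per_value=8):
    """Approximate memory for the lower triangle of an n x n distance matrix.

    Assumes n*(n-1)/2 stored values of 8 bytes each (double precision);
    object overhead is ignored.
    """
    return n_items * (n_items - 1) // 2 * bytes_per_value


if __name__ == "__main__":
    print(f"700 samples:     {dist_size_bytes(700) / 1e6:.2f} MB")     # 1.96 MB
    print(f"50,000 features: {dist_size_bytes(50_000) / 1e9:.2f} GB")  # 10.00 GB
```

If that assumption holds, a distance object over ~700 samples should be only a few megabytes, while one over tens of thousands of rows runs to gigabytes. A multi-gigabyte object may therefore mean the distances were computed between features rather than samples (R's `dist()` measures distances between rows, while ConsensusClusterPlus treats columns as samples), which would be worth checking first.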

Any ideas would be great. Thanks.

Tags: consensusclusterplus, distance, clustering
chris86 (@chris86-8408), UCL, United Kingdom

It may be better to pre-filter your dataset's features based on variance; you are unlikely to need all 50,000 features.
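
A minimal sketch of variance-based pre-filtering, written in Python for illustration (in R the same idea would be applied to the rows of the features-by-samples matrix before calling ConsensusClusterPlus). The function names and the cut-off `k` are hypothetical; how many features to keep is a judgement call.

```python
from statistics import pvariance


def top_variable_rows(matrix, k):
    """Return indices of the k rows (features) with the highest variance.

    `matrix` is a list of rows: one row per feature, one value per sample.
    """
    variances = [pvariance(row) for row in matrix]
    # Rank row indices by variance, highest first, and keep the first k.
    ranked = sorted(range(len(matrix)), key=lambda i: variances[i], reverse=True)
    return ranked[:k]


def filter_rows(matrix, k):
    """Keep the k most variable rows, preserving their original order."""
    keep = sorted(top_variable_rows(matrix, k))
    return [matrix[i] for i in keep]


if __name__ == "__main__":
    data = [
        [1.0, 1.0, 1.0, 1.0],  # constant feature, variance 0
        [0.0, 5.0, 0.0, 5.0],  # high variance
        [1.0, 2.0, 1.0, 2.0],  # low variance
    ]
    print(top_variable_rows(data, 2))  # [1, 2]
```

Reducing the feature count shrinks both the raw data and any per-iteration distance computation.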

I also find that the delta K criterion from that method leaves the choice of the best number of clusters rather subjective, and it can't handle higher numbers of clusters. An alternative is M3C, which uses the PAC score and various derivatives of it. If time is an issue, it has a fast mode, or you can lower the iterations parameter; I find it works well with PAM (https://www.bioconductor.org/packages/devel/bioc/html/M3C.html). Another good alternative, which I have tested quite extensively, is CLEST (https://rdrr.io/cran/RSKC/man/Clest.html).
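
The PAC (proportion of ambiguous clustering) score can be sketched as follows, assuming the commonly used definition: the fraction of sample pairs whose consensus index falls in the intermediate interval (u1, u2], here with u1 = 0.1 and u2 = 0.9. This is a sketch of the definition, not M3C's implementation; the function name `pac` and the thresholds are assumptions.

```python
def pac(consensus, u1=0.1, u2=0.9):
    """Proportion of ambiguous clustering for one consensus matrix.

    consensus: square, symmetric list of lists; consensus[i][j] is the fraction
    of resampling runs (in which both items were sampled) that put i and j in
    the same cluster. Returns the fraction of pairs i < j with
    u1 < consensus[i][j] <= u2, i.e. CDF(u2) - CDF(u1) over the pairs.
    """
    n = len(consensus)
    values = [consensus[i][j] for i in range(n) for j in range(i + 1, n)]
    if not values:
        raise ValueError("need at least two items")
    ambiguous = sum(1 for v in values if u1 < v <= u2)
    return ambiguous / len(values)


if __name__ == "__main__":
    m = [
        [1.00, 0.95, 0.50],
        [0.95, 1.00, 0.05],
        [0.50, 0.05, 1.00],
    ]
    print(pac(m))  # one of three pairs is ambiguous: 0.3333333333333333
```

One would typically compute this for each candidate K and favour the K with the lowest PAC, since fewer ambiguous pairs indicate a more stable partition.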
