Question: Consensus Cluster Plus pre-computed distance matrix
8 months ago, neuro3030 wrote:

In the ConsensusClusterPlus package, there is an option to input a pre-computed distance matrix to reduce computation time. The reference manual states that this is because ConsensusClusterPlus otherwise re-calculates a distance matrix for each iteration.

So I pre-computed the distance matrix for a very large dataset (~700 samples with ~50,000 rows). However, when I pass this distance object to ConsensusClusterPlus, the computation time increases dramatically and the run struggles to get past the first iteration. Of note, the "dist" object for this dataset is very large (approx. 4-6 GB). Still, given that the distances are pre-calculated, shouldn't this save time during consensus clustering?

Any ideas would be great. Thanks.
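
As a rough sanity check on the sizes mentioned above, here is a small Python sketch of how big a pairwise distance object should be. It assumes the object stores only the lower triangle, n(n-1)/2 values at 8 bytes each, which is how an R "dist" object is laid out. The helper names are made up for illustration.

```python
def dist_entries(n_items: int) -> int:
    """Number of pairwise distances among n_items (lower triangle, no diagonal)."""
    return n_items * (n_items - 1) // 2


def dist_bytes(n_items: int, bytes_per_value: int = 8) -> int:
    """Approximate storage for the distances, assuming 8-byte doubles."""
    return dist_entries(n_items) * bytes_per_value


n_samples = 700      # sample count from the question
n_features = 50_000  # row count from the question

sample_bytes = dist_bytes(n_samples)
feature_bytes = dist_bytes(n_features)

print(f"distances between samples: {sample_bytes / 1e6:.2f} MB")
print(f"distances between rows:    {feature_bytes / 1e9:.2f} GB")
```

Distances between 700 samples come to about 2 MB, while distances between 50,000 rows come to about 10 GB. A multi-gigabyte object is therefore much closer to the row-wise case. That is only an inference from the arithmetic, not something confirmed in this thread, but it may be worth checking whether the distances were computed between rows rather than between samples: R's dist() works on the rows of its input, whereas ConsensusClusterPlus documents its data matrix as having samples in columns.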

Answer: Consensus Cluster Plus pre-computed distance matrix
8 months ago, chris86 (UCL, United Kingdom) wrote:

It may be better to pre-filter your dataset's features by coefficient of variation. You're unlikely to need 50,000 features.
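
A minimal sketch of that kind of filter, written in plain Python for illustration only. The feature names, the toy values, and the choice to keep the top 2 features are all hypothetical, and the coefficient of variation (standard deviation divided by mean) only makes sense for data on a positive scale.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        return 0.0  # assumption: treat zero-mean features as uninformative
    return statistics.stdev(values) / abs(mean)


def top_features_by_cv(rows, keep):
    """rows maps a feature name to its values across samples.

    Returns the `keep` feature names with the highest coefficient of variation.
    """
    ranked = sorted(
        rows,
        key=lambda name: coefficient_of_variation(rows[name]),
        reverse=True,
    )
    return ranked[:keep]


# Hypothetical toy data: 4 features measured in 4 samples.
rows = {
    "gene_a": [10.0, 10.1, 9.9, 10.0],  # nearly constant
    "gene_b": [2.0, 8.0, 1.0, 9.0],     # highly variable
    "gene_c": [5.0, 5.5, 4.5, 5.0],     # mildly variable
    "gene_d": [1.0, 20.0, 3.0, 15.0],   # highly variable
}

selected = top_features_by_cv(rows, keep=2)
print(selected)
```

With real data, the cutoff (a fixed number of features or a CV threshold) is a judgment call, and the clustering would then be run on the reduced matrix.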

I also find that picking the best number of clusters from the delta K plot with that method is subjective, and it can't handle higher numbers of clusters. An alternative is M3C, which uses the PAC score and various derivatives of it. If time is an issue, it has a fast mode, or you can lower the iterations parameter, and I find it works well with PAM (https://www.bioconductor.org/packages/devel/bioc/html/M3C.html). Another good alternative, which I have tested quite extensively, is CLEST (https://rdrr.io/cran/RSKC/man/Clest.html).
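
For anyone unfamiliar with the PAC score (proportion of ambiguous clustering), here is a rough Python sketch of the idea. It is not M3C's code. It assumes a symmetric consensus matrix with entries in [0, 1] and uses 0.1 and 0.9 as the ambiguity bounds, which is a common choice but still an assumption here. The two toy matrices are made up.

```python
def pac_score(consensus, lower=0.1, upper=0.9):
    """Fraction of sample pairs whose consensus value is ambiguous.

    A pair counts as ambiguous when lower < value <= upper, i.e. the
    empirical CDF evaluated at `upper` minus the CDF at `lower`.
    """
    n = len(consensus)
    # Use each unordered pair once (upper triangle, diagonal excluded).
    values = [consensus[i][j] for i in range(n) for j in range(i + 1, n)]
    ambiguous = sum(1 for v in values if lower < v <= upper)
    return ambiguous / len(values)


# Hypothetical consensus matrices for 4 samples.
clean = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
noisy = [
    [1.0, 1.0, 0.5, 0.4],
    [1.0, 1.0, 0.6, 0.0],
    [0.5, 0.6, 1.0, 1.0],
    [0.4, 0.0, 1.0, 1.0],
]

print(pac_score(clean), pac_score(noisy))
```

A lower PAC means pairs of samples are almost always or almost never clustered together, so the number of clusters giving the lowest PAC is the candidate for the best K.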
