Question: FlowSOM and ConsensusClusterPlus: reproducibility issue
2.7 years ago, Lukas Weber (Johns Hopkins University) wrote:

Hi,

I have a question about reproducibility of meta-clustering results in FlowSOM.

My colleague Malgorzata Nowicka noticed a while ago that the 'metaClustering_consensus' function does not give reproducible results, even when a seed is set beforehand with 'set.seed()'. We then traced this to the fact that 'metaClustering_consensus' internally calls 'ConsensusClusterPlus::ConsensusClusterPlus', which automatically sets the seed to 'as.numeric(Sys.time())' if no value is supplied through its 'seed' argument. As a result, any seed set externally with 'set.seed()' is ignored.
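To illustrate the mechanism, here is a minimal sketch using a toy stand-in. The function 'draw_with_internal_seed' is hypothetical and is not part of FlowSOM or ConsensusClusterPlus; it only mimics the seed handling described above.

```r
# Hypothetical stand-in that mimics the seed handling described above:
# if no seed is supplied, one is derived from the system clock.
draw_with_internal_seed <- function(n, seed = NULL) {
  if (is.null(seed)) {
    seed <- as.numeric(Sys.time())
  }
  set.seed(seed)  # overwrites whatever RNG state the caller had set
  sample.int(1000, n)
}

# The caller's set.seed() has no influence on the result, because the
# function resets the RNG internally. Only the 'seed' argument matters.
set.seed(123)
a <- draw_with_internal_seed(5, seed = 42)
set.seed(999)
b <- draw_with_internal_seed(5, seed = 42)
identical(a, b)  # TRUE

# With seed = NULL the draw depends on the clock instead, so an external
# set.seed() cannot make it reproducible.
```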

The FlowSOM authors kindly provided us with a bug fix which solved the problem (by including an additional seed argument in 'metaClustering_consensus'), but we have noticed that this bug fix was never included in the version on Bioconductor.

It would be great if this bug fix could be included in the release version on Bioconductor. We have found the FlowSOM package to be very useful in our CyTOF data analysis pipelines, and having this seed argument in the release version would make things easier for getting reproducible results.

In addition, it may also be useful for the ConsensusClusterPlus authors to consider removing the default setting of the seed to 'as.numeric(Sys.time())', since users will often set a seed with 'set.seed()' at the top of their analysis script, and expect it to propagate through.
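As a rough sketch of what this suggestion could look like (the function name 'draw_respecting_caller_seed' is hypothetical, and this is not the actual ConsensusClusterPlus code), a function can reseed the RNG only when a seed is explicitly supplied and otherwise leave the caller's RNG state alone:

```r
# Hypothetical sketch: reseed only if the user explicitly asks for it.
draw_respecting_caller_seed <- function(n, seed = NULL) {
  if (!is.null(seed)) {
    set.seed(seed)
  }
  sample.int(1000, n)
}

# A seed set at the top of the analysis script now propagates through.
set.seed(2016)
a <- draw_respecting_caller_seed(5)
set.seed(2016)
b <- draw_respecting_caller_seed(5)
identical(a, b)  # TRUE
```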

I have pasted a copy of the updated 'FlowSOM::metaClustering_consensus' function below (with the additional seed argument), for reference.

Thanks again for creating this very useful package.

Best regards,

Lukas

> metaClustering_consensus

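function(data, k=7, seed=NULL){
    # Consensus hierarchical clustering; ConsensusClusterPlus expects items in columns
    results <- suppressMessages(ConsensusClusterPlus::ConsensusClusterPlus(
        t(data),
        maxK=k, reps=100, pItem=0.9, pFeature=1,
        title=tempdir(), plot="pdf", verbose=FALSE,
        clusterAlg="hc",
        distance="euclidean",
        seed=seed  # additional argument, forwarded so that results are reproducible
    ))

    # Return the cluster assignments for k meta-clusters
    results[[k]]$consensusClass
}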
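For anyone who wants to check reproducibility after applying the fix, here is a small sketch of a test helper. The helper 'is_reproducible' and the k-means stand-in 'kmeans_with_seed' are hypothetical names introduced only for this example; the stand-in is used so that the snippet runs without ConsensusClusterPlus installed.

```r
# Hypothetical helper: run a clustering function twice with the same seed
# and check that the cluster assignments are identical.
is_reproducible <- function(cluster_fun, data, k, seed) {
  first  <- cluster_fun(data, k = k, seed = seed)
  second <- cluster_fun(data, k = k, seed = seed)
  identical(first, second)
}

# Stand-in clustering function with the same (data, k, seed) interface
# as the updated metaClustering_consensus shown above.
kmeans_with_seed <- function(data, k = 7, seed = NULL) {
  if (!is.null(seed)) {
    set.seed(seed)
  }
  stats::kmeans(data, centers = k)$cluster
}

# Toy data: two well-separated groups of 50 points each.
set.seed(1)
toy_data <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
                  matrix(rnorm(100, mean = 5), ncol = 2))

is_reproducible(kmeans_with_seed, toy_data, k = 2, seed = 42)  # TRUE
```

With the updated function available in the session, the same check could be run as 'is_reproducible(metaClustering_consensus, codes, k = 7, seed = 1234)', where 'codes' stands for whatever matrix is being meta-clustered.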