Hello everyone,
I am trying to use ConsensusClusterPlus to subgroup my samples. When no seed is given, the package generates a random seed and writes it to the output log file. However, I found that I can't reproduce the result by passing that seed back in, even with the demo dataset. The code and the result I get are listed below.
My other question is about reproducibility. With my own data, the output from a manually assigned seed (seed = as.numeric(Sys.time())) is very different from the output when the package generates the seed itself. All the “non_seeded_outputs” I get when I don't supply a seed are very similar to each other (with bootstrap=1000). All the “seeded_outputs” are also similar to each other (also with bootstrap=1000). But the “non_seeded_outputs” and the “seeded_outputs” are very different. :(
I appreciate any suggestions.
##### prepare data #####
library(ALL)                    # example dataset; also attaches Biobase for exprs()
library(ConsensusClusterPlus)
data(ALL)
d = exprs(ALL)
mads = apply(d, 1, mad)
d = d[rev(order(mads)[1:5000]),]
d = sweep(d,1,apply(d,1,median, na.rm=TRUE)) # center each row (gene) by subtracting its median
######## running consensus clustering ##########
rep_times = 50
title = paste("pam_no_seed_rep_", rep_times, sep="")
results1 = ConsensusClusterPlus(d,maxK=6, reps= rep_times, pItem=0.8, pFeature=0.8, title=title,clusterAlg="pam",distance="pearson",plot="png", writeTable = TRUE)
logInfo = read.delim(paste(getwd(), "/", title, "/", title, ".log.csv", sep=""), sep=",")
seed = logInfo[13,2]
title=paste(title, "_seed_", seed, sep="")
results2 = ConsensusClusterPlus(d,maxK=6, reps=rep_times, pItem=0.8, pFeature=0.8, title=title,clusterAlg="pam",distance="pearson",seed = 1475989793.02773, plot="png", writeTable = TRUE)
title=paste(title, "_2_seed_", seed, sep="")
results3 = ConsensusClusterPlus(d,maxK=6, reps=rep_times, pItem=0.8, pFeature=0.8, title=title,clusterAlg="pam",distance="pearson",seed = 1475989793.02773, plot="png", writeTable = TRUE)
identical(results1, results2)
identical(results2, results3)
I haven't had the same problem running ccplus. Results are always the same with the seed param, but I don't normally resample the features. Try setting that parameter (pFeature) to 1.
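
Here is a rough sketch of how that check could look. It is not the original poster's setup: `d_demo` is a small synthetic matrix standing in for the real expression data, and `run_cc`, `fixed_seed`, and the `cc_check_*` titles are made-up names for illustration. It runs the same call twice with one fixed seed and pFeature=1, then compares the class assignments for each k.

```r
library(ConsensusClusterPlus)

# Synthetic stand-in for the expression matrix (rows = features, columns = samples).
# This is an assumption for illustration only; substitute your own matrix `d`.
set.seed(1)
d_demo <- matrix(rnorm(200 * 40), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:40)))

# Hypothetical helper: same settings as in the question, with pFeature and seed exposed.
run_cc <- function(title, p_feature, seed) {
  ConsensusClusterPlus(d_demo, maxK = 4, reps = 50, pItem = 0.8,
                       pFeature = p_feature, clusterAlg = "pam",
                       distance = "pearson", seed = seed,
                       title = title, plot = "png")
}

fixed_seed <- 12345  # any fixed value; chosen arbitrarily here

# Two runs with the same seed and no feature resampling (pFeature = 1).
a <- run_cc("cc_check_a", 1, fixed_seed)
b <- run_cc("cc_check_b", 1, fixed_seed)

# For each k, check whether both runs assign every sample to the same class.
same_classes <- sapply(2:4, function(k) {
  identical(a[[k]]$consensusClass, b[[k]]$consensusClass)
})
print(same_classes)
```

Repeating the comparison with `p_feature = 0.8` should show whether feature resampling is the step where the two runs start to differ.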