Question about seed in ConsensusClusterPlus, can't repeat the result
afraTW • 0
@afratw-11637

Hello everyone,

I am trying to use ConsensusClusterPlus to subgroup my samples. The package generates a random seed and writes it to the output log file. However, I found that I can't reproduce the result with that seed, even with the demo dataset. The code and the results I get are listed below.

My other question is about reproducibility. With my own data, the output from a manually assigned seed (seed = as.numeric(Sys.time())) is very different from the output when the package generates the seed itself. All the "non_seeded_outputs" I get when I don't supply a seed are very similar to each other (with bootstrap=1000). All the "seeded_outputs" are also similar to each other (also with bootstrap=1000). But the "non_seeded_outputs" and the "seeded_outputs" are very different from each other. :(

I appreciate any suggestions.

                                                                                                                          


##### prepare data #####
library(ALL)
library(ConsensusClusterPlus)
data(ALL)
d = exprs(ALL)
mads = apply(d, 1, mad)
d = d[rev(order(mads))[1:5000],] # keep the 5000 probes with the highest MAD
d = sweep(d, 1, apply(d, 1, median, na.rm=TRUE)) # subtract each row's (probe's) median from its values
######## running consensus clustering ##########

rep_times = 50

title = paste("pam_no_seed_rep_", rep_times, sep="")

results1 = ConsensusClusterPlus(d,maxK=6, reps= rep_times, pItem=0.8, pFeature=0.8, title=title,clusterAlg="pam",distance="pearson",plot="png", writeTable = TRUE)

logInfo = read.delim(paste(getwd(), "/", title, "/", title, ".log.csv", sep=""), sep=",")
seed = logInfo[13,2]

title=paste(title, "_seed_", seed, sep="")
results2 = ConsensusClusterPlus(d,maxK=6, reps=rep_times, pItem=0.8, pFeature=0.8, title=title,clusterAlg="pam",distance="pearson",seed = 1475989793.02773, plot="png", writeTable = TRUE)

title=paste(title, "_2_seed_", seed, sep="")
results3 = ConsensusClusterPlus(d,maxK=6, reps=rep_times, pItem=0.8, pFeature=0.8, title=title,clusterAlg="pam",distance="pearson",seed = 1475989793.02773, plot="png", writeTable = TRUE)
identical(results1, results2)

identical(results2, results3)
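
Since identical() on the whole result list only returns a single TRUE or FALSE, it may help to see at which k two runs start to disagree. Below is a minimal sketch in base R, assuming each result list stores its cluster assignments under [[k]][["consensusClass"]] as described in the package vignette. The helper compare_runs and the mock_a / mock_b lists are hypothetical and only stand in for real ConsensusClusterPlus output.

```r
# Hypothetical helper (not part of ConsensusClusterPlus): for each k, check
# whether two result lists give identical per-sample class assignments.
compare_runs <- function(a, b, ks = 2:min(length(a), length(b))) {
  same <- vapply(ks, function(k) {
    identical(a[[k]][["consensusClass"]], b[[k]][["consensusClass"]])
  }, logical(1))
  setNames(same, paste0("k", ks))
}

# Mock result lists standing in for real output (element 1 is unused here).
mock_a <- list(
  NULL,
  list(consensusClass = c(s1 = 1L, s2 = 1L, s3 = 2L)),
  list(consensusClass = c(s1 = 1L, s2 = 2L, s3 = 3L))
)
mock_b <- list(
  NULL,
  list(consensusClass = c(s1 = 1L, s2 = 1L, s3 = 2L)),
  list(consensusClass = c(s1 = 1L, s2 = 3L, s3 = 2L))
)

# Named logical vector: TRUE where the two runs agree for that k.
print(compare_runs(mock_a, mock_b))
```

With the real objects this would be called as compare_runs(results2, results3). Note that it compares labels literally, so two runs with the same partition but permuted cluster numbers would still be reported as different.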

 

 

bioconductor consensusclusterplus clustering • 1.8k views

I haven't had this problem running ConsensusClusterPlus: my results are always the same when I set the seed parameter. However, I don't normally resample the features, so try setting pFeature to 1.
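
One thing that can be checked without the package is whether the fractional part of the logged seed matters. The sketch below assumes that ConsensusClusterPlus simply hands its seed argument to base R's set.seed(), which is not confirmed in this thread. set.seed() interprets its argument as an integer, so the decimal seed from the question and its truncated integer form should drive the random number generator identically. The value 128 and the draw size of 10 are arbitrary stand-ins for a resampling step.

```r
# Seed value quoted in the question (as written to the log file).
seed_from_log <- 1475989793.02773

# Seed the RNG with the decimal value and draw a subsample.
set.seed(seed_from_log)
draw_a <- sample(128, 10)

# Seed again with the explicitly truncated integer and repeat the draw.
set.seed(as.integer(seed_from_log))
draw_b <- sample(128, 10)

# Compare the two draws.
print(identical(draw_a, draw_b))
```

If the two draws match, the decimals in the logged seed are unlikely to explain the difference, and it would be worth checking instead whether the seeded and unseeded runs really used the same inputs and the same pItem and pFeature settings.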
