Guest User
Hello,
Are there ways in which using method = "pearson" and link = "average"
or "complete" in the hcluster function of amap fails? In other words,
is there a mathematical reason why the Pearson correlation as a
distance metric yields undefined clustering, or did I encounter a bug?
Full story:
I ran into an error while using the hclust2treeview function in the
ctc package, stemming from such a call to hcluster.
Code:
data <- read.table('file.dat', header=TRUE, sep='\t')
# hclust2treeview() internally calls:
#   hr <- hcluster(coverage, method = "pearson", link = "average")
clusterings <- hclust2treeview(data, file='filename.cdt',
                               method='pearson', keep.hclust=TRUE)
I traced the problem to the 5th through 9th entries of hr$order, which
had the value -5744. Using that negative number as an index threw the
following error:
Error in `[.default`(xj, i) :
only 0's may be mixed with negative subscripts
Calls: hclust2treeview ... r2cdt -> [ -> [.data.frame -> [ -> [.factor
-> NextMethod
Execution halted
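For reference, the error itself can be reproduced outside ctc. The following is a minimal sketch in base R using made-up toy vectors rather than my data. `is_valid_order` is a hypothetical helper name, not part of amap or ctc. It assumes that a valid hclust-style order vector should be a permutation of 1..n, and that r2cdt fails because it indexes with hr$order (my reading of the traceback, not something I verified in the source).

```r
# Hypothetical helper (not from amap or ctc): a valid hclust-style
# order vector is assumed to be a permutation of 1..n.
is_valid_order <- function(ord) {
  n <- length(ord)
  !any(is.na(ord)) && all(sort(ord) == seq_len(n))
}

good_order <- c(3L, 1L, 2L, 4L)
bad_order  <- c(3L, -5744L, 2L, 4L)  # mimics the negative entries I saw

ok_good <- is_valid_order(good_order)  # TRUE
ok_bad  <- is_valid_order(bad_order)   # FALSE

# Mixing positive and negative subscripts is an error in base R,
# including when the object being indexed is a factor.
labels <- factor(c("a", "b", "c", "d"))
res <- tryCatch(labels[bad_order], error = function(e) "indexing failed")
```

Running `is_valid_order(hr$order)` right after the hcluster call would show whether the clustering result is already malformed before ctc touches it.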
I tried method = "euclidean" and no error appeared, but I would
prefer to use another distance metric, or at least to know why I can't
use the Pearson correlation. My data file seemed to be correctly
formatted: a header line followed by a matrix of non-negative
integers.
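To make the "mathematical reason" part of my question concrete: a correlation-based distance is undefined for some rows, and count data like mine could contain them. The sketch below uses a toy matrix, not my file. It assumes, based on my reading of the amap documentation, that method = "pearson" is the uncentered form, 1 - sum(x*y) / sqrt(sum(x^2) * sum(y^2)). `uncentered_dist` is a hypothetical hand-written function, not amap's implementation, and I don't know whether such rows are what actually produces the bad order vector.

```r
# Toy matrix standing in for my data (rows are the items clustered).
toy <- rbind(c(1, 4, 2, 7),
             c(0, 0, 0, 0),   # all-zero row
             c(3, 3, 3, 3))   # constant, non-zero row

# Uncentered Pearson distance written out by hand (assumed formula,
# not amap's code).
uncentered_dist <- function(x, y) {
  1 - sum(x * y) / sqrt(sum(x^2) * sum(y^2))
}

# An all-zero row gives 0/0, so the distance is NaN.
d_zero <- uncentered_dist(toy[1, ], toy[2, ])

# The centered Pearson correlation is NA for any constant row
# (base R also emits a zero-standard-deviation warning).
r_const <- suppressWarnings(cor(toy[1, ], toy[3, ]))

# Rows worth inspecting before clustering.
zero_rows  <- which(rowSums(toy^2) == 0)
const_rows <- which(apply(toy, 1, sd) == 0)
```

Here `zero_rows` and `const_rows` list the row indices for which the uncentered and centered distances, respectively, are not defined.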
I found this related help thread:
http://r.789695.n4.nabble.com/hierarchical-clustering-with-pearson-s-coefficient-td4662788.html
Thanks,
Eric Liaw
Stanford University, undergraduate student
-- output of sessionInfo():
R version 2.15.2 (2012-10-26)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ctc_1.32.0 amap_0.8-7 Biobase_2.14.0
loaded via a namespace (and not attached):
[1] tools_2.15.2
--
Sent via the guest posting facility at bioconductor.org.