Question: HOPACH error: "negative length vectors are not allowed" in distancematrix(). Matrix too large?
4.6 years ago by
ejliaw0
United States
ejliaw0 wrote:

Greetings,

I was attempting to use HOPACH to cluster the rows of a 610758 x 9 matrix of floating-point values, but the distancematrix function gave the following error:

Error in .Call("R_disscosangle", as.vector(X), as.numeric(dim(X)[1]),  :
negative length vectors are not allowed

I can recreate the error as follows (session info included):

> library("hopach")
> test <- matrix(runif(1000*10), 1000, 10)
> my.dist <- distancematrix(test, "cosangle") # works
> dim(my.dist)
[1] 1000 1000
> test <- matrix(runif(610758*9), 610758, 9)
> my.dist <- distancematrix(test, "cosangle") # error message shows up immediately
Error in .Call("R_disscosangle", as.vector(X), as.numeric(dim(X)[1]),  :
negative length vectors are not allowed

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] hopach_2.26.0       Biobase_2.26.0      BiocGenerics_0.12.1
[4] cluster_2.0.1

> test <- matrix(runif(100000*10), 100000, 10)
> my.dist <- distancematrix(test, "cosangle") # after a while we get a segfault

*** caught segfault ***

Traceback:
1: .Call("R_disscosangle", as.vector(X), as.numeric(dim(X)[1]),     as.numeric(dim(X)[2]), as.logical(na.rm))
2: disscosangle(X, na.rm)
3: distancematrix(test, "cosangle")

Am I running out of memory? (https://www.google.com/webhp?q=negative+length+vectors+are+not+allowed+r)

Cheers,

Eric

Answer: HOPACH error: "negative length vectors are not allowed" in distancematrix(). Mat
4.6 years ago by
kpollard110
United States
kpollard110 wrote:

Hi Eric - It indeed looks like you've hit the memory limit. The object my.dist that you are attempting to create is a vector of length 610758*610757/2.

Best,

Katie
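To put that length in perspective, here is a rough back-of-the-envelope check in R. It assumes the distances are stored as 8-byte doubles, one per pair of rows; the variable names are only for illustration.

```r
n <- 610758                      # rows in the matrix from the question
n_pairs <- n * (n - 1) / 2       # one distance per pair of rows (computed in doubles)
bytes <- n_pairs * 8             # assumption: 8 bytes per stored distance

n_pairs                          # on the order of 1.9e11 elements
n_pairs > .Machine$integer.max   # TRUE: exceeds the 2^31 - 1 limit for ordinary vectors
bytes / 1e12                     # storage in terabytes, roughly 1.5
```

So even with long-vector support, holding the full set of pairwise distances would need far more RAM than a typical machine has.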

Hi Katie,

Would you have any suggestions for this situation? If the Internet is correct in saying that R (even 3.1.2) cannot allocate vectors longer than 2^31 - 1, is there a package somewhere that has circumvented this?

Thanks

R can allocate larger vectors (try integer(2^31), for instance, if your computer has enough memory!) but packages with C code have to be written to work with large vectors; packages that were developed before R supported large vectors (like hopach) are not likely to support these.
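As a hedged illustration of why older C code can fail in the two ways seen above, the sketch below emulates 32-bit signed integer wraparound in plain R. It assumes (this is not confirmed from the hopach source) that the C routine computes the result length as n * (n - 1) / 2 in a 32-bit int and that overflow wraps around, which is typical but not guaranteed in C. The helpers wrap_int32 and len_int32 are hypothetical names.

```r
# Emulate 32-bit signed wraparound using doubles (exact for values below 2^53).
wrap_int32 <- function(x) {
  r <- x %% 2^32
  if (r >= 2^31) r - 2^32 else r
}

# ASSUMPTION: the length is computed as n * (n - 1) / 2 in a 32-bit int.
len_int32 <- function(n) trunc(wrap_int32(n * (n - 1)) / 2)

len_int32(1000)     # fits in 32 bits, so the true length is returned
len_int32(610758)   # wraps to a negative number
len_int32(100000)   # wraps to a positive number far below 100000 * 99999 / 2
```

Under that assumption, a negative length would be rejected immediately with "negative length vectors are not allowed", while a too-small positive length would lead to an undersized allocation and out-of-bounds writes, which is consistent with the delayed segfault in the 100000-row case.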

It seems like the reasonable statistical thing to do is to pre-process your data in some way to reduce its volume, e.g., by filtering on variability or by k-means clustering followed by use of the centroids (but these are naive suggestions; maybe Katie can provide something more substantive).
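A minimal sketch of the variability-filtering idea, using base R only. The simulated matrix, the use of row variance as the variability measure, and the top-10% cutoff are all assumptions chosen for illustration, not recommendations.

```r
set.seed(1)
x <- matrix(runif(10000 * 9), 10000, 9)     # hypothetical stand-in for the real data

row_var <- apply(x, 1, var)                 # per-row variance across the 9 columns
keep <- row_var >= quantile(row_var, 0.9)   # assumed cutoff: keep the most variable 10%
x_small <- x[keep, , drop = FALSE]

dim(x_small)                                # about a tenth of the rows remain
```

The reduced matrix still has to be small enough that its n * (n - 1) / 2 pairwise distances fit in memory before it is passed to a distance-based method.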

Thanks Martin,

Indeed, I've found that any function requiring a distance matrix, like the built-in hclust(), cannot handle that many rows.

To see any pattern, I've been using kmeans with a large k (e.g. 80), then using hclust on the 80 cluster centroids, and finding an optimal ordering for the tree of the centroids (with the 'cba' package) to reorder my original matrix. Is this what you meant by using k-means clustering to pre-process data?
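The workflow described above might look roughly like the following sketch. It uses simulated data and base R only: the leaf order comes straight from hclust() rather than from the optimal-ordering step in the 'cba' package, so it approximates the described procedure rather than reproducing it. All variable names are illustrative.

```r
set.seed(1)
x <- matrix(runif(10000 * 9), 10000, 9)        # hypothetical stand-in for the real matrix

k <- 80
km <- kmeans(x, centers = k, iter.max = 100)   # step 1: k-means with a large k
hc <- hclust(dist(km$centers))                 # step 2: cluster the k centroids only

# Step 3: position of each k-means cluster along the dendrogram's leaf order.
leaf_rank <- match(seq_len(k), hc$order)

# Step 4: reorder the original rows so clusters follow the dendrogram order.
row_order <- order(leaf_rank[km$cluster])
x_sorted <- x[row_order, ]
```

Only the 80 x 80 centroid distances are ever computed, so the memory cost no longer depends on the square of the number of rows.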