Question

HOPACH error: "negative length vectors are not allowed" in distancematrix(). Matrix too large?

ejliaw • 0

@ejliaw-7382

United States

Greetings,

I was attempting to use HOPACH to cluster the rows of a 610758 x 9 matrix of floating-point values, but the distancematrix() function gave the following error:

Error in .Call("R_disscosangle", as.vector(X), as.numeric(dim(X)[1]),  : 
  negative length vectors are not allowed

I can recreate the error as follows (session info included):

> library("hopach")
> test <- matrix(runif(1000*10), 1000, 10)
> my.dist <- distancematrix(test, "cosangle") # works
> dim(my.dist)
[1] 1000 1000
> test <- matrix(runif(610758*9), 610758, 9)
> my.dist <- distancematrix(test, "cosangle") # error message shows up immediately
Error in .Call("R_disscosangle", as.vector(X), as.numeric(dim(X)[1]),  : 
  negative length vectors are not allowed

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] hopach_2.26.0       Biobase_2.26.0      BiocGenerics_0.12.1
[4] cluster_2.0.1      

> test <- matrix(runif(100000*10), 100000, 10)
> my.dist <- distancematrix(test, "cosangle") # after a while we get a segfault

 *** caught segfault ***
address 0x7f60e326b000, cause 'invalid permissions'

Traceback:
 1: .Call("R_disscosangle", as.vector(X), as.numeric(dim(X)[1]),     as.numeric(dim(X)[2]), as.logical(na.rm))
 2: disscosangle(X, na.rm)
 3: distancematrix(test, "cosangle")

Am I running out of memory? (https://www.google.com/webhp?q=negative+length+vectors+are+not+allowed+r)

Cheers,

Eric

hopach • 5.7k views

ADD COMMENT • link updated 9.0 years ago by kpollard ▴ 110 • written 9.1 years ago by ejliaw • 0

score 1 · Accepted Answer · 2015-04-12

kpollard ▴ 110

@kpollard-7578

United States

Hi Eric - It indeed looks like you've hit the memory limit. The object my.dist that you are attempting to create is a vector of length 610758*610757/2.
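For a rough sense of scale, here is a back-of-the-envelope calculation. It assumes the distances are stored as 8-byte doubles in a packed vector of length n*(n-1)/2; the variable names are only illustrative.

```r
n <- 610758

# Number of pairwise distances; double arithmetic is exact below 2^53
len <- n * (n - 1) / 2

# Compare with the largest value a 32-bit signed integer can hold
len > .Machine$integer.max

# Approximate storage in terabytes, assuming 8 bytes per distance
tb <- len * 8 / 1e12
tb
```

That is roughly 1.87e11 distances, or about 1.5 TB, far more than a typical machine has in RAM and well past the 2^31 - 1 limit of a 32-bit length.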

Best,

Katie

ADD COMMENT • link 9.0 years ago kpollard ▴ 110

Hi Katie,

Would you have any suggestions for this situation? If the Internet is correct in saying that R (even 3.1.2) cannot allocate vectors longer than 2^31 - 1, is there a package somewhere that has circumvented this?

Thanks

ADD REPLY • link 8.9 years ago ejliaw • 0

R (3.0.0 and later, on 64-bit builds) can allocate larger vectors (try integer(2^31), for instance, if your computer has enough memory!), but packages with C code have to be written to work with these long vectors; packages developed before R supported them (like hopach) are unlikely to do so.
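One possible reading of the two different failures, sketched below, is that the vector length is computed in a 32-bit signed int on the C side and wraps around. This is an assumption, not something checked against hopach's C source; wrap_int32() is a hypothetical helper that only emulates the wraparound in R.

```r
# Hypothetical helper: emulate two's-complement wraparound of a 32-bit signed int
wrap_int32 <- function(x) {
  r <- x %% 2^32
  if (r >= 2^31) r - 2^32 else r
}

# n * (n - 1) for the two matrices in the question
wrap_int32(610758 * 610757)   # -637430946: a negative "length"
wrap_int32(100000 * 99999)    # 1409965408: positive, but far below the true 9999900000
```

Under that assumption, the 610758-row case would request a negative length (hence the immediate error), while the 100000-row case would allocate a buffer that is too small and later write past its end, which would be consistent with the delayed segfault.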

It seems like the reasonable statistical thing to do is to pre-process your data in some way to reduce its volume, e.g., by filtering on variability or by k-means clustering followed by use of the centroids (but these are naive suggestions; maybe Katie can provide something more substantive).
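As a minimal sketch of the variability filter, using base R only: the simulated matrix stands in for the real data, and the 90th-percentile cutoff is an arbitrary illustrative choice, not a recommendation.

```r
set.seed(1)
x <- matrix(runif(5000 * 9), 5000, 9)   # stand-in for the real matrix

# Per-row standard deviation as a simple measure of variability
row_sd <- apply(x, 1, sd)

# Keep only the most variable rows (cutoff is an assumption for illustration)
keep <- row_sd >= quantile(row_sd, 0.90)
x_small <- x[keep, , drop = FALSE]
dim(x_small)
```

For the packed distance vector to stay within a 32-bit length, n*(n-1)/2 must not exceed 2^31 - 1, which works out to at most 65,536 rows; memory will usually become the practical limit before that.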

ADD REPLY • link 8.9 years ago Martin Morgan 25k

Thanks Martin,

Indeed, I've found that any function requiring a full distance matrix, like the built-in hclust(), cannot handle that many rows.

To see any pattern, I've been using kmeans with a large k (e.g. 80), then using hclust on the 80 cluster centroids, and finding an optimal ordering for the tree of the centroids (with the 'cba' package) to reorder my original matrix. Is this what you meant by using k-means clustering to pre-process data?
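For reference, the workflow just described can be sketched roughly as follows, using only base R. The simulated matrix, the Euclidean distance, and average linkage are assumptions for illustration; the optimal leaf-ordering step from the 'cba' package is left out, so the plain hclust() leaf order is used instead.

```r
set.seed(42)
x <- matrix(runif(20000 * 9), 20000, 9)   # stand-in for the real matrix

# Step 1: compress the rows into k centroids
k <- 80
km <- kmeans(x, centers = k, iter.max = 50)

# Step 2: hierarchical clustering on the k centroids only (k x k distances)
hc <- hclust(dist(km$centers), method = "average")

# Step 3: reorder the original rows by the leaf position of their centroid
leaf_pos <- match(seq_len(k), hc$order)    # leaf position of each centroid
row_order <- order(leaf_pos[km$cluster])
x_ordered <- x[row_order, ]
```

Only 80 * 79 / 2 = 3160 distances are ever computed, so the size of the original matrix affects the kmeans() step alone.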