Question: HOPACH error: "negative length vectors are not allowed" in distancematrix(). Matrix too large?
ejliaw (United States) wrote, 4.6 years ago:

Greetings,

I was attempting to use HOPACH to cluster the rows of a 610758 x 9 matrix of floating-point numbers, but distancematrix() failed with the following error:

Error in .Call("R_disscosangle", as.vector(X), as.numeric(dim(X)[1]),  : 
  negative length vectors are not allowed

I can recreate the error as follows (session info included):

> library("hopach")
> test <- matrix(runif(1000*10), 1000, 10)
> my.dist <- distancematrix(test, "cosangle") # works
> dim(my.dist)
[1] 1000 1000
> test <- matrix(runif(610758*9), 610758, 9)
> my.dist <- distancematrix(test, "cosangle") # error message shows up immediately
Error in .Call("R_disscosangle", as.vector(X), as.numeric(dim(X)[1]),  : 
  negative length vectors are not allowed

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] hopach_2.26.0       Biobase_2.26.0      BiocGenerics_0.12.1
[4] cluster_2.0.1      

> test <- matrix(runif(100000*10), 100000, 10)
> my.dist <- distancematrix(test, "cosangle") # after a while we get a segfault

 *** caught segfault ***
address 0x7f60e326b000, cause 'invalid permissions'

Traceback:
 1: .Call("R_disscosangle", as.vector(X), as.numeric(dim(X)[1]),     as.numeric(dim(X)[2]), as.logical(na.rm))
 2: disscosangle(X, na.rm)
 3: distancematrix(test, "cosangle")

Am I running out of memory? (https://www.google.com/webhp?q=negative+length+vectors+are+not+allowed+r)

Cheers,

Eric

Answer: HOPACH error: "negative length vectors are not allowed" in distancematrix(). Matrix too large?
kpollard (United States) wrote, 4.6 years ago:

Hi Eric, it does look like you've hit the memory limit. The object my.dist that you are attempting to create is a vector of length 610758*610757/2, about 1.87 × 10^11 elements.

Best,

Katie
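A quick back-of-the-envelope check of Katie's point, offered as a sketch: the lower triangle of an n x n distance matrix (diagonal excluded) has n*(n-1)/2 entries. The helper names n_pairs() and bytes_needed() are hypothetical, and the 8 bytes per entry assumes the distances are stored as doubles.

```r
# Hypothetical helpers: number of pairwise distances for n rows
# (lower triangle, diagonal excluded) and the memory needed to hold them,
# ASSUMING each distance is stored as an 8-byte double
n_pairs      <- function(n) n * (n - 1) / 2
bytes_needed <- function(n) n_pairs(n) * 8

for (n in c(1000, 100000, 610758)) {
  cat(sprintf("n = %6.0f: %.4g distances, %.4g GB, more than 2^31 - 1: %s\n",
              n, n_pairs(n), bytes_needed(n) / 1e9,
              n_pairs(n) > .Machine$integer.max))
}
```

Under that assumption the 610758-row matrix needs roughly 1.5 TB, and even the 100000-row test has about 5 × 10^9 entries (around 40 GB), well past 2^31 - 1.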


Hi Katie,

Would you have any suggestions for this situation? If it is true, as various sources online say, that R (even 3.1.2) cannot allocate vectors longer than 2^31 - 1 elements, is there a package that works around this?

Thanks

Reply written 4.5 years ago by ejliaw.

R can allocate larger vectors (since R 3.0.0; try integer(2^31), for instance, if your computer has enough memory!), but packages with C code have to be written specifically to work with such long vectors; packages developed before R supported them (like hopach) are unlikely to do so.
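Why "negative length" rather than an out-of-memory message? We have not checked hopach's C source, so this is only a plausible hypothesis: if C code computed n*(n-1) with 32-bit signed ints, the product would overflow and wrap around. The hypothetical helper wrap_int32() below emulates that wrap-around in R; note that R's own integer type returns NA with a warning on overflow instead of wrapping.

```r
# Hypothetical helper: reinterpret an exact integer-valued double as the
# value a 32-bit signed C int would hold after wrap-around
wrap_int32 <- function(x) {
  r <- x %% 2^32                   # keep the low 32 bits
  ifelse(r >= 2^31, r - 2^32, r)   # values >= 2^31 become negative
}

for (n in c(1000, 100000, 610758)) {
  exact <- n * (n - 1)             # exact in double precision (well below 2^53)
  cat(sprintf("n = %6.0f: n*(n-1) = %.0f, as a 32-bit int = %.0f\n",
              n, exact, wrap_int32(exact)))
}

# For comparison, R's integer type does not wrap: this gives NA (with a warning)
suppressWarnings(.Machine$integer.max + 1L)
```

Under this assumption the 610758-row product wraps to a negative number, consistent with the error, while the 100000-row product wraps to a positive but wrong value, which would be consistent with the later segfault. This is a hypothesis, not a diagnosis of hopach.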

It seems like the reasonable statistical thing to do is to pre-process your data to reduce its volume, e.g., by filtering on variability, or by k-means clustering followed by analysis of the centroids. These are naive suggestions; maybe Katie can provide something more substantive.
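A minimal sketch of the first suggestion, filtering on variability, assuming per-row variance is a sensible filter for these data. The simulated matrix, the 99th-percentile cutoff and the variable names are illustrative choices only.

```r
set.seed(42)
# Simulated stand-in for the real data (fewer rows so it runs quickly)
x <- matrix(rnorm(50000 * 9), ncol = 9)

# Variance of each row across the 9 columns
row_var <- apply(x, 1, var)

# Keep the most variable ~1% of rows (the cutoff is an arbitrary choice)
keep  <- row_var >= quantile(row_var, probs = 0.99)
x_top <- x[keep, , drop = FALSE]

# The filtered matrix is small enough for an ordinary distance matrix
d <- dist(x_top)
```

The same idea applies to the full matrix: pick a cutoff that leaves few enough rows for distancematrix() or dist() to handle.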

Reply written 4.5 years ago by Martin Morgan.

Thanks, Martin,

Indeed, I've found that any function requiring a distance matrix, like the built-in hclust(), cannot handle that many rows.

To see any patterns, I've been running kmeans with a large k (e.g., 80), clustering the 80 centroids with hclust, computing an optimal leaf ordering of the centroid tree with the 'cba' package, and using that ordering to reorder the rows of my original matrix. Is this what you meant by using k-means clustering to pre-process the data?
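A minimal base-R sketch of the workflow described above, under these assumptions: simulated data stand in for the real matrix, Euclidean distances are used on the centroids (the original question used hopach's cosine-angle distance), and hclust's own leaf order stands in for the optimal leaf ordering computed with the 'cba' package, which is not shown here.

```r
set.seed(1)
# Simulated stand-in for the real 610758 x 9 matrix (fewer rows so it runs quickly)
x <- matrix(runif(5000 * 9), ncol = 9)

k  <- 80
# Step 1: k-means on all rows; only the k centroids go on to hclust
km <- kmeans(x, centers = k, iter.max = 50)

# Step 2: hierarchical clustering of the centroids (an 80 x 80 problem)
hc <- hclust(dist(km$centers), method = "average")

# Step 3: position of each centroid in the dendrogram's leaf order
centroid_rank <- match(seq_len(k), hc$order)

# Step 4: reorder the original rows so rows from the same cluster are
# contiguous and clusters follow the dendrogram's leaf order
row_order <- order(centroid_rank[km$cluster])
x_ordered <- x[row_order, ]
```

Only the 80 centroids ever enter a distance matrix, so memory no longer grows with the square of the number of rows; kmeans itself scales linearly in the number of rows.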

Reply written 4.5 years ago by ejliaw.

Yes, that sounds roughly like what I had in mind.

Reply written 4.5 years ago by Martin Morgan.