problem with impute.knn in the impute package
@he-yiwen-nihcit-1177
Hi,

I have R version 2.0.1 and Bioconductor 1.5 on both PC and Unix. I was trying to use the impute.knn function of the impute package on a dataset of 7332 genes and 3 arrays:

    > library(impute)
    > dim(dd)
    [1] 7332    3
    > is.matrix(dd)
    [1] TRUE
    > dd.imputed <- impute.knn(dd)

When run on the PC (Windows XP), R crashes after a few seconds. When run on a Unix box, I see the following output:

    Cluster size 7332 broken into 5667 1665
    Cluster size 5667 broken into 4141 1526
    Cluster size 4141 broken into 1796 2345
    Cluster size 1796 broken into 840 956
    Done cluster 840
    Done cluster 956
    Done cluster 1796

and then the R session closes. So the clustering started but aborted somewhere in the middle.

I searched the archive and found another report of this problem, for a dataset of 30000 x 2, but with no answers.

I made some interesting findings while playing around with the parameters and the data size:

1).

    > impute.knn(dd, k=3)

works, but for k bigger than 3, R crashes as described.

2).

    > dd2 <- cbind(dd,dd)
    > dim(dd2)
    [1] 7332    6
    > impute.knn(dd2, k=8)

works, but for k bigger than 8, R crashes.

3).

    > dd3 <- cbind(dd, dd, dd)
    > dim(dd3)
    [1] 7332    9
    > impute.knn(dd3)

works (k defaults to 10), but with

    > impute.knn(dd3, k=17)

R crashes.

I also played around with the other parameters, but they didn't help. My conclusion is that the number of neighbors (k) is critical here. However, it's not obvious how to set it based on the data size.

Can anybody help, or at least point me to the maintainer of the impute package?

Thanks,
Yiwen

Yiwen He
Contractor
Center for Information Technology
National Institute of Health
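In case it helps anyone trying to reproduce this, here is a minimal sketch that builds test matrices with the same dimensions as above. The original dd is not available, so the random values, the seed, and the number of missing entries (500) are all assumptions made for illustration. The impute.knn call is guarded and uses only k=3, the value reported to work on the 3-column data; whether larger k still crashes will depend on the installed version of the package.

```r
# Synthetic stand-in for dd (assumption: the real data are not available).
set.seed(1)
n_genes  <- 7332
n_arrays <- 3

dd <- matrix(rnorm(n_genes * n_arrays), nrow = n_genes, ncol = n_arrays)

# Knock out 500 distinct entries so there is something to impute
# (the count 500 is arbitrary, chosen only for this sketch).
dd[sample(length(dd), 500)] <- NA

# Wider matrices built the same way as in the post.
dd2 <- cbind(dd, dd)       # 7332 x 6
dd3 <- cbind(dd, dd, dd)   # 7332 x 9

# Only call impute.knn if the impute package is installed.
# k = 3 is the value reported to work for the 3-column matrix.
if (requireNamespace("impute", quietly = TRUE)) {
  res <- impute::impute.knn(dd, k = 3)
  # The shape of the return value may differ between package versions,
  # so inspect it rather than assuming a particular structure.
  str(res, max.level = 1)
}
```

Because a hard crash takes the whole R session down, it cannot be caught with try() or tryCatch(); when testing larger k it is probably safest to run each value in a fresh R process.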