Question

Clustering in R....


Marcus ▴ 150

@marcus-410

Last seen 11.2 years ago

Hello again. Back from some weeks of laboratory work, I still have some questions on clustering in R.

I got a lot of help from Sean Davis (thanks a lot :o) ), so if he or someone else has the time...

My problem is that I have some spots flagged as NA in a matrix of M-values organised slidewise. I want to cluster those, but I get error messages when using heatmap because of the NAs in the matrix. I mailed Andy Liaw (who wrote the heatmap function) and he gave me the tip to look into the daisy function, which is supposed to handle NAs.

But what do you get out of the function?

    test <- daisy(mymatrix)

This creates an object of type dissimilarity, right? And you can convert it into a matrix with

    testII <- as.matrix(test)

Is this what I should use hclust on, or should I first do

    testIII <- as.dist(testII)

Neither works, so I do not really know which is correct.

I also tried to use daisy directly with heatmap, but that did not work and produced the same error as with dist:

    heatmap(mymatrix[1:22,], distfun = dist)
    Error in hclustfun(distfun(x)) : NA/NaN/Inf in foreign function call (arg 11)

This is because I only have 2 M-values in the twenty-second row and 16 NAs.

So basically my question is: how do you get heatmap to work with a matrix of M-values that has spots flagged NA? What distance function works, and how do you use it?

Could someone please help me and perhaps write an example of how to do it? I think the help files are not so good in this respect.

Best regards

/ Marcus

Marcus Gry Björklund
Royal Institute of Technology
AlbaNova University Center
Stockholm Center for Physics, Astronomy and Biotechnology
Department of Molecular Biotechnology
106 91 Stockholm, Sweden
Phone (office): +46 8 553 783 45
Fax: +46 8 553 784 81
Visiting address: Roslagstullsbacken 21, Floor 3
Delivery address: Roslagsvägen 30B

Clustering Clustering • 5.2k views

updated 22.0 years ago by Sean Davis 21k • written 22.0 years ago by Marcus ▴ 150

score 0 · Answer 1 · 2003-11-11

Marcus,

Here is a fairly general method for working with heatmap that I have used. You can substitute any function you want for the distance (e.g., 1 - correlation) and for the clustering (you don't have to use hclust). Make sure that you do the coercion (to distance or dendrogram objects, as needed), though. Also, some distance functions that you can dream up will not work with NAs, but dist does.

    > m <- matrix(rnorm(100),nrow=10,ncol=10)
    > m
                [,1]        [,2]       [,3]        [,4]       [,5]       [,6]
     [1,] -1.0326191  1.09744204  0.9923254 -0.05780237  1.6853566 -0.5938021
     [2,] -0.6493561 -0.58846041  0.8735639  0.34492342 -0.1398261  1.4288108
     [3,] -1.0020073  0.75130128 -2.6110435  1.27265445  0.1211387  0.7048981
     [4,] -0.1658810  0.45351434 -0.8973168 -0.17738084 -0.1056792 -1.7251339
     [5,]  0.1466563  0.11917823  0.9372353  0.29040600  0.8463049  0.9192848
     [6,]  0.6020565 -0.90338771 -0.7453363 -1.34284821 -0.7684490  0.2177409
     [7,]  0.5290555  0.58798246  0.4085396  0.63305003  0.2014624 -0.5613248
     [8,]  1.4456958  0.06372875  0.1829127  0.20681971  0.5745696 -0.3555856
     [9,]  0.5973093 -0.35483585  1.1074023  0.63930734 -1.2452399 -1.2721422
    [10,]  1.2563169  0.92249574 -0.7103717 -0.41067056  0.2277188  0.3861969
                 [,7]       [,8]       [,9]       [,10]
     [1,] -1.63852314 -1.0773165  0.5601368  1.05115476
     [2,] -0.14026278 -0.9013605  0.1581475  0.36730440
     [3,]  0.45517561 -1.5211124 -1.1641732  1.97321531
     [4,]  0.08338336  1.4846938  0.3096862  0.44513675
     [5,]  0.85917332  1.0337033 -0.1784938 -0.48848017
     [6,]  0.05054810  1.3712665 -0.6545246  0.10251154
     [7,]  2.30894410 -0.6089214  1.5761573  0.66912925
     [8,] -0.85946317  0.0855971 -0.7014037 -2.19050881
     [9,]  1.53911617  1.1185075  0.2428764 -0.09556405
    [10,] -1.61446618  1.0605298  0.5160358  0.04152571
    > m[10,1:8] <- NA
    > m
                [,1]        [,2]       [,3]        [,4]       [,5]       [,6]
     [1,] -1.0326191  1.09744204  0.9923254 -0.05780237  1.6853566 -0.5938021
     [2,] -0.6493561 -0.58846041  0.8735639  0.34492342 -0.1398261  1.4288108
     [3,] -1.0020073  0.75130128 -2.6110435  1.27265445  0.1211387  0.7048981
     [4,] -0.1658810  0.45351434 -0.8973168 -0.17738084 -0.1056792 -1.7251339
     [5,]  0.1466563  0.11917823  0.9372353  0.29040600  0.8463049  0.9192848
     [6,]  0.6020565 -0.90338771 -0.7453363 -1.34284821 -0.7684490  0.2177409
     [7,]  0.5290555  0.58798246  0.4085396  0.63305003  0.2014624 -0.5613248
     [8,]  1.4456958  0.06372875  0.1829127  0.20681971  0.5745696 -0.3555856
     [9,]  0.5973093 -0.35483585  1.1074023  0.63930734 -1.2452399 -1.2721422
    [10,]         NA          NA         NA          NA         NA         NA
                 [,7]       [,8]       [,9]       [,10]
     [1,] -1.63852314 -1.0773165  0.5601368  1.05115476
     [2,] -0.14026278 -0.9013605  0.1581475  0.36730440
     [3,]  0.45517561 -1.5211124 -1.1641732  1.97321531
     [4,]  0.08338336  1.4846938  0.3096862  0.44513675
     [5,]  0.85917332  1.0337033 -0.1784938 -0.48848017
     [6,]  0.05054810  1.3712665 -0.6545246  0.10251154
     [7,]  2.30894410 -0.6089214  1.5761573  0.66912925
     [8,] -0.85946317  0.0855971 -0.7014037 -2.19050881
     [9,]  1.53911617  1.1185075  0.2428764 -0.09556405
    [10,]          NA         NA  0.5160358  0.04152571
    > sampdist=dist(t(m))
    > sclus=hclust(sampdist)   # sclus is an hclust object that you can plot(sclus)
    > genedist=dist(m)
    > gclus=hclust(genedist)   # gclus is also an hclust object
    > heatmap(m,Rowv=gclus,Colv=sclus)   # this doesn't work!
    Error in lV + rV : non-numeric argument to binary operator
    > heatmap(m,Rowv=as.dendrogram(gclus),Colv=as.dendrogram(sclus))   # need proper coercion for this to work

The first heatmap call fails because hclust returns an object of class hclust, while the Rowv and Colv arguments need dendrogram objects; as.dendrogram does the conversion.

Although this works, note that using a gene that has 16 NA values out of 22 is probably not going to be useful. The distance matrix for the genes in this example is:

    > genedist
              1        2        3        4        5        6        7        8
    2  3.673241
    3  5.235695 4.536603
    4  4.381494 4.522069 5.046200
    5  4.367649 2.821795 5.437622 3.688942
    6  5.408318 3.863713 5.380546 3.014530 3.345877
    7  4.764409 3.915998 5.194822 3.911820 3.548220 4.830247
    8  4.825510 4.216357 6.212646 4.149383 3.314914 3.844966 5.041345
    9  5.536079 4.169987 6.179576 3.158424 3.249127 3.637840 3.149486 4.264858
    10 2.259752 1.082164 5.724739 1.013612 1.953558 2.621002 2.754763 5.685128
               9
    2
    3
    4
    5
    6
    7
    8
    9
    10 0.6834093

See how different the distances involving row 10 are from the others. For each pair of rows, dist uses only the columns where both values are present and scales the sum up in proportion to the number of columns used, so every distance to row 10 rests on just two values. You will probably have to either deal with the missing values beforehand or use another distance measure that is not sensitive to NA values. I can't tell you what to do on that part, as it also depends on how much you need that gene's data and on the practicality of doing more experiments.

Hope that helps.

Sean

--
Clinical Fellow
Pediatric Oncology
Johns Hopkins / National Institutes of Health
NCI/NHGRI
--

On 11/10/03 7:41 AM, "Marcus" <marcusb@biotech.kth.se> wrote:

>> [snip]
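
One way to "deal with the missing values beforehand" is to drop genes with too few observed values before clustering. The following is only a minimal sketch of that idea, not a recommendation from the thread: the threshold `min_obs`, the extra NA in row 3, and the seed are hypothetical choices made for illustration. It also shows that a daisy result can be passed to hclust after `as.dist`, and checks the NA rescaling done by `dist` on a tiny hand-computed case.

```r
library(cluster)  # provides daisy(); 'cluster' is a recommended package shipped with R

set.seed(1)                                # arbitrary seed, for reproducibility only
m <- matrix(rnorm(100), nrow = 10, ncol = 10)
m[10, 1:8] <- NA                           # same NA pattern as in the answer above
m[3, 2] <- NA                              # hypothetical extra NA: one flagged spot

# Assumption: keep only rows (genes) with at least min_obs observed values.
min_obs <- 5                               # hypothetical threshold; choose per experiment
keep <- rowSums(!is.na(m)) >= min_obs
mf <- m[keep, , drop = FALSE]              # row 10 (2 observed values) is dropped

# dist() skips column pairs containing NA and rescales the sum of squares
# by (total columns / columns used) before taking the square root.
x <- rbind(c(1, 2, NA, 4), c(2, 4, 6, NA))
d <- as.numeric(dist(x))                   # columns 1 and 2 used: sqrt((1 + 4) * 4/2)
stopifnot(isTRUE(all.equal(d, sqrt(10))))

genedist <- dist(mf)
sampdist <- dist(t(mf))
# hclust() stops with an NA/NaN/Inf error if any distance is NA, so check first.
stopifnot(!anyNA(genedist), !anyNA(sampdist))

gclus <- hclust(genedist)
sclus <- hclust(sampdist)

# daisy() returns a 'dissimilarity' object; as.dist() makes it a plain 'dist'
# object that hclust() accepts.
gdaisy <- as.dist(daisy(mf, metric = "euclidean"))
gclus_daisy <- hclust(gdaisy)

# heatmap() needs dendrogram objects for Rowv and Colv, hence as.dendrogram().
heatmap(mf, Rowv = as.dendrogram(gclus), Colv = as.dendrogram(sclus))
```

If two rows share no observed columns at all, their distance is NA and the `stopifnot` check fails before hclust is called; raising `min_obs` or imputing values are two possible ways around that, depending on the experiment.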