Marcus,
Here is a fairly general method for working with heatmap that I have
used. You can substitute any function that you want for the distance
(e.g., 1-correlation) and for the clustering (you don't have to use
hclust). Make sure that you do the coercion (to distance or dendrogram
objects, as needed), though. Also, some distance functions that you can
dream up will not work with NAs, but dist does.
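As a rough sketch of that substitution, something along these lines
should work with 1-correlation as the distance and average linkage for
the clustering. The helper names cordist and avgclust are made up for
illustration, and the seed is arbitrary:

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
m <- matrix(rnorm(100), nrow = 10, ncol = 10)

# Hypothetical helper: 1 - Pearson correlation between the rows of x,
# coerced to a "dist" object so that hclust accepts it
cordist <- function(x) as.dist(1 - cor(t(x), use = "pairwise.complete.obs"))

# Hypothetical helper: average linkage instead of hclust's default method
avgclust <- function(d) hclust(d, method = "average")

gclus <- avgclust(cordist(m))     # cluster the rows (genes)
sclus <- avgclust(cordist(t(m)))  # cluster the columns (samples)

# heatmap wants dendrogram objects, so coerce the hclust trees
hm <- heatmap(m, Rowv = as.dendrogram(gclus), Colv = as.dendrogram(sclus))
```

Passing the helpers directly, as in heatmap(m, distfun = cordist,
hclustfun = avgclust), should give a similar picture, except that
heatmap then also reorders the dendrograms by the row and column means.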
> m <- matrix(rnorm(100),nrow=10,ncol=10)
> m
            [,1]        [,2]       [,3]        [,4]       [,5]       [,6]
 [1,] -1.0326191  1.09744204  0.9923254 -0.05780237  1.6853566 -0.5938021
 [2,] -0.6493561 -0.58846041  0.8735639  0.34492342 -0.1398261  1.4288108
 [3,] -1.0020073  0.75130128 -2.6110435  1.27265445  0.1211387  0.7048981
 [4,] -0.1658810  0.45351434 -0.8973168 -0.17738084 -0.1056792 -1.7251339
 [5,]  0.1466563  0.11917823  0.9372353  0.29040600  0.8463049  0.9192848
 [6,]  0.6020565 -0.90338771 -0.7453363 -1.34284821 -0.7684490  0.2177409
 [7,]  0.5290555  0.58798246  0.4085396  0.63305003  0.2014624 -0.5613248
 [8,]  1.4456958  0.06372875  0.1829127  0.20681971  0.5745696 -0.3555856
 [9,]  0.5973093 -0.35483585  1.1074023  0.63930734 -1.2452399 -1.2721422
[10,]  1.2563169  0.92249574 -0.7103717 -0.41067056  0.2277188  0.3861969
             [,7]       [,8]       [,9]      [,10]
 [1,] -1.63852314 -1.0773165  0.5601368  1.05115476
 [2,] -0.14026278 -0.9013605  0.1581475  0.36730440
 [3,]  0.45517561 -1.5211124 -1.1641732  1.97321531
 [4,]  0.08338336  1.4846938  0.3096862  0.44513675
 [5,]  0.85917332  1.0337033 -0.1784938 -0.48848017
 [6,]  0.05054810  1.3712665 -0.6545246  0.10251154
 [7,]  2.30894410 -0.6089214  1.5761573  0.66912925
 [8,] -0.85946317  0.0855971 -0.7014037 -2.19050881
 [9,]  1.53911617  1.1185075  0.2428764 -0.09556405
[10,] -1.61446618  1.0605298  0.5160358  0.04152571
> m[10,1:8] <- NA
> m
            [,1]        [,2]       [,3]        [,4]       [,5]       [,6]
 [1,] -1.0326191  1.09744204  0.9923254 -0.05780237  1.6853566 -0.5938021
 [2,] -0.6493561 -0.58846041  0.8735639  0.34492342 -0.1398261  1.4288108
 [3,] -1.0020073  0.75130128 -2.6110435  1.27265445  0.1211387  0.7048981
 [4,] -0.1658810  0.45351434 -0.8973168 -0.17738084 -0.1056792 -1.7251339
 [5,]  0.1466563  0.11917823  0.9372353  0.29040600  0.8463049  0.9192848
 [6,]  0.6020565 -0.90338771 -0.7453363 -1.34284821 -0.7684490  0.2177409
 [7,]  0.5290555  0.58798246  0.4085396  0.63305003  0.2014624 -0.5613248
 [8,]  1.4456958  0.06372875  0.1829127  0.20681971  0.5745696 -0.3555856
 [9,]  0.5973093 -0.35483585  1.1074023  0.63930734 -1.2452399 -1.2721422
[10,]         NA          NA         NA          NA         NA         NA
             [,7]       [,8]       [,9]      [,10]
 [1,] -1.63852314 -1.0773165  0.5601368  1.05115476
 [2,] -0.14026278 -0.9013605  0.1581475  0.36730440
 [3,]  0.45517561 -1.5211124 -1.1641732  1.97321531
 [4,]  0.08338336  1.4846938  0.3096862  0.44513675
 [5,]  0.85917332  1.0337033 -0.1784938 -0.48848017
 [6,]  0.05054810  1.3712665 -0.6545246  0.10251154
 [7,]  2.30894410 -0.6089214  1.5761573  0.66912925
 [8,] -0.85946317  0.0855971 -0.7014037 -2.19050881
 [9,]  1.53911617  1.1185075  0.2428764 -0.09556405
[10,]          NA         NA  0.5160358  0.04152571
> sampdist=dist(t(m))
> sclus=hclust(sampdist) # sclus is an hclust tree; view it with plot(sclus)
> genedist=dist(m)
> gclus=hclust(genedist) # gclus is also an hclust tree
> heatmap(m,Rowv=gclus,Colv=sclus) # this doesn't work!
Error in lV + rV : non-numeric argument to binary operator
> # need proper coercion to dendrogram objects for this to work
> heatmap(m,Rowv=as.dendrogram(gclus),Colv=as.dendrogram(sclus))
Although this works, note that using a gene that has 16 NA values and
only 2 measurements, like row 22 of your data, is probably not going to
be useful. The distance matrix for the genes in this example shows why:
> genedist
          1        2        3        4        5        6        7        8
2  3.673241
3  5.235695 4.536603
4  4.381494 4.522069 5.046200
5  4.367649 2.821795 5.437622 3.688942
6  5.408318 3.863713 5.380546 3.014530 3.345877
7  4.764409 3.915998 5.194822 3.911820 3.548220 4.830247
8  4.825510 4.216357 6.212646 4.149383 3.314914 3.844966 5.041345
9  5.536079 4.169987 6.179576 3.158424 3.249127 3.637840 3.149486 4.264858
10 2.259752 1.082164 5.724739 1.013612 1.953558 2.621002 2.754763 5.685128
           9
2
3
4
5
6
7
8
9
10 0.6834093
See how different the distances involving row 10 are from the others:
the NA values were simply dropped, so each of those distances rests on
only the two remaining columns (dist rescales the sum to make up for
the dropped columns). You will probably have to either deal with the
missing values beforehand or use another distance measure that is not
sensitive to NA values. I can't tell you what to do on that part, as it
also depends somewhat on your need to use that gene's data and the
practicality of doing more experiments.
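As one possible way of dealing with the missing values beforehand, here
is a minimal sketch that first checks how dist treats row 10 and then
drops rows with too few measurements before clustering. The cutoff
min_obs (half of the columns) and the seed are arbitrary assumptions,
not a recommendation:

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
m <- matrix(rnorm(100), nrow = 10, ncol = 10)
m[10, 1:8] <- NA

# dist() drops the NA columns pair by pair and scales the sum of squares
# up by (total columns) / (columns actually used), here 10 / 2
d <- as.matrix(dist(m))
manual <- sqrt(sum((m[1, 9:10] - m[10, 9:10])^2) * ncol(m) / 2)
all.equal(d[10, 1], manual)

# Hypothetical cutoff: keep rows with at least half of their values present
min_obs <- ncol(m) / 2
keep <- rowSums(!is.na(m)) >= min_obs
m_ok <- m[keep, , drop = FALSE]

# Cluster and plot the filtered matrix as before
gclus <- hclust(dist(m_ok))
sclus <- hclust(dist(t(m_ok)))
heatmap(m_ok, Rowv = as.dendrogram(gclus), Colv = as.dendrogram(sclus))
```

Whether to drop such a gene, impute it, or repeat the experiment is
still a judgment call that the code cannot make for you.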
Hope that helps.
Sean
--
Clinical Fellow
Pediatric Oncology
Johns Hopkins/
National Institutes of Health
NCI/NHGRI
--
On 11/10/03 7:41 AM, "Marcus" <marcusb@biotech.kth.se> wrote:
>> Hello again. Back from some weeks of lab work I still have some
>> questions on clustering in R.
>>
>> My problem is that I have some spots flagged as NA in a matrix of
>> M-values organised slidewise. I want to cluster those but I get error
>> messages when using heatmap due to the NAs in the matrix. I mailed
>> Andy Liaw (who wrote the heatmap function) and he gave me the tip to
>> look into the daisy function. And the daisy function is supposed to
>> handle NAs.
>>
>> But what do you get out of the function?
>>
>> test <- daisy(mymatrix)
>> This creates an object of type dissimilarity, right? And you can
>> convert it into a matrix with the help of
>> testII <- as.matrix(test)
>> Is this what I should use hclust on? Or should I do
>> testIII <- as.dist(testII) before? Neither works, so I do not really
>> know what is true.
>>
>> And I tried to use daisy directly with heatmap, but that didn't work
>> and produced the same error as with dist.
>>
>> heatmap(mymatrix[1:22,], distfun = dist)
>> Error in hclustfun(distfun(x)) : NA/NaN/Inf in foreign function call (arg 11)
>> This is due to the fact that I only have 2 M-values in the
>> twenty-second row and 16 NAs.
>>
>> So basically my question is: how do you get heatmap to work with a
>> matrix of M-values that has spots flagged NA in it? What distance
>> function works, and how do you use it?