I think you need to filter those genes, as you almost certainly have too
much noise. I suggest you filter the genes first: if you are doing class
discovery, perhaps use the ratio of sd to mean (the coefficient of
variation), choose the ones with the most variation and keep about
500-1000 or so. The heatmap won't be readable in any case with 7120 rows.
I would perhaps try principal components/spectral decomposition instead,
like:
the.pca <- prcomp(data, scale = TRUE) # rows of data are the observations; use t(data) to switch genes/samples
attributes(the.pca)
dim(the.pca$x)
### estimate how many principal components you need (% variance per component)
the.pca.var <- round(the.pca$sdev^2 / sum(the.pca$sdev^2) * 100, 2)
plot(seq_along(the.pca.var), the.pca.var, type = "b", xlab = "# components",
     ylab = "% variance", main = "Scree Plot for Hits", col = "red",
     cex = 1.5, cex.lab = 1.5)
savePlot("scree plot.jpeg", type = "jpeg") # savePlot() needs a Windows or cairo X11 device
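If you would rather have a number than eyeball the scree plot, one rule of thumb (my assumption, not the only sensible choice) is to keep the smallest number of components that together explain, say, 80% of the variance. A minimal sketch on simulated data; `n_components_for` is a hypothetical helper and the 80% threshold is arbitrary:

```r
# Hypothetical helper: smallest number of PCs whose cumulative share
# of the variance reaches `threshold` (a fraction between 0 and 1)
n_components_for <- function(pca, threshold = 0.8) {
  var_share <- pca$sdev^2 / sum(pca$sdev^2)
  which(cumsum(var_share) >= threshold)[1]
}

set.seed(1)
# Simulated stand-in for 'data': 100 rows (observations) x 20 columns
sim <- matrix(rnorm(100 * 20), nrow = 100)
sim.pca <- prcomp(sim, scale. = TRUE)
k <- n_components_for(sim.pca, threshold = 0.8)
k
```

You would then use `the.pca$x[, 1:k]` rather than a fixed number of columns.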
centers <- 15
the.cl <- kmeans(the.pca$x[, 1:2], centers = centers, iter.max = 1000) # k-means on the first two PC scores
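On the convergence problem in the original question: kmeans starts from random centres, so asking for many starts with `nstart` (kmeans keeps the run with the lowest total within-cluster sum of squares) and allowing more iterations with `iter.max` usually helps. Plotting `tot.withinss` over a range of k gives an elbow plot, similar in spirit to the scree plot, for choosing the number of clusters. A sketch on simulated two-column scores standing in for `the.pca$x[, 1:2]`; the three simulated groups, `nstart = 25` and k = 1..8 are arbitrary choices:

```r
set.seed(42)
# Simulated stand-in for the.pca$x[, 1:2]: three well-separated groups of 50
scores <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
                matrix(rnorm(100, mean = 10), ncol = 2),
                matrix(rnorm(100, mean = -10), ncol = 2))

# 25 random starts; kmeans returns the best of them
fit <- kmeans(scores, centers = 3, nstart = 25, iter.max = 100)

# Elbow plot: total within-cluster sum of squares for k = 1..8
wss <- sapply(1:8, function(k)
  kmeans(scores, centers = k, nstart = 25, iter.max = 100)$tot.withinss)
plot(1:8, wss, type = "b", xlab = "k (number of clusters)",
     ylab = "total within-cluster SS")
```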
colours <- rainbow(centers)
##2D
plot(range(the.pca$x[, 1]), range(the.pca$x[, 2]), type = "n",
     xlab = "PCA1", ylab = "PCA2",
     main = "Spectral clustering of differential hits")
text(the.pca$x[, 1], the.pca$x[, 2], labels = rownames(the.pca$x),
     col = colours[the.cl$cluster], cex = 0.75)
library(scatterplot3d)
### 3D
s3d <- scatterplot3d(range(the.pca$x[, 1]), range(the.pca$x[, 2]),
                     range(the.pca$x[, 3]),
                     xlab = "PCA1", ylab = "PCA2", zlab = "PCA3",
                     main = "Spectral clustering of differential hits",
                     angle = 120)
text(s3d$xyz.convert(the.pca$x[, 1], the.pca$x[, 2], the.pca$x[, 3]),
     labels = rownames(the.pca$x), col = colours[the.cl$cluster],
     cex = 0.75)
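# highlight selected points: 'wanted' (row indices) and 'color' are not defined above, set them yourself
points(s3d$xyz.convert(the.pca$x[wanted, 1], the.pca$x[wanted, 2],
                       the.pca$x[wanted, 3]), col = color, cex = 5.0)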
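The gene filter suggested at the top (keep the few hundred to ~1000 genes with the most relative variation before clustering) might look like the sketch below. It assumes genes are rows, samples are columns and all values are positive (for example unlogged intensities); for log-scale data centred near zero, sd/mean is not meaningful and ranking by plain sd is a common substitute. `filter_by_cv` is a hypothetical name and the simulated matrix is much smaller than 7120 x 500 to keep it quick:

```r
# Hypothetical helper: keep the n rows (genes) with the largest
# coefficient of variation (sd / mean) across the columns (samples)
filter_by_cv <- function(x, n = 500) {
  cv <- apply(x, 1, sd) / rowMeans(x)
  keep <- order(cv, decreasing = TRUE)[seq_len(min(n, nrow(x)))]
  x[keep, , drop = FALSE]
}

set.seed(7)
# Simulated stand-in for an expression matrix with positive values
expr <- matrix(rlnorm(2000 * 50, meanlog = 5, sdlog = 0.3), nrow = 2000)
rownames(expr) <- paste0("gene", seq_len(nrow(expr)))
small <- filter_by_cv(expr, n = 500)
dim(small)
```

The reduced matrix (`small` here) is what you would pass to `prcomp()` instead of the full data.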
Otherwise, if you have class labels, use SAM or PAM.
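SAM is available in the samr and siggenes packages and PAM in pamr. As a much cruder base-R stand-in for the same supervised idea (this is not SAM or PAM), you can rank genes by a per-gene Welch t-test between two labelled groups and adjust the p-values with Benjamini-Hochberg. A sketch on simulated data, where the two classes of 10 samples and the 10 planted "differential" genes are assumptions:

```r
set.seed(3)
# Simulated data: 200 genes (rows) x 20 samples, two classes of 10
x <- matrix(rnorm(200 * 20), nrow = 200)
labels <- factor(rep(c("A", "B"), each = 10))
# Plant a shift of 5 in the first 10 genes for class B
x[1:10, labels == "B"] <- x[1:10, labels == "B"] + 5

# Welch two-sample t-test for each gene (row), keeping the p-value
p <- apply(x, 1, function(g)
  t.test(g[labels == "A"], g[labels == "B"])$p.value)
p.adj <- p.adjust(p, method = "BH")  # Benjamini-Hochberg FDR adjustment
top <- order(p)[1:10]                # the 10 most significant genes
top
```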
Hope that helps
Cheers
Paul
-----Original Message-----
From: Gaston Fiore <gaston.fiore@gmail.com>
To: bioconductor@stat.math.ethz.ch
Subject: [BioC] Heatmap with 7120x500 array
Date: Fri, 27 Aug 2010 15:39:37 -0400
Hello everyone,
I'm trying to produce a heat map that clusters 7120 genes into 6
groups based on 500 conditions. I'm using kmeans and then image, but
I've two problems. The first one is that kmeans sometimes doesn't
converge even with 10 restarts, and the second one is that the image
produced is basically all red (I'm using the standard color scheme),
not to mention its size is massive and very hard to deal with. Does
anyone have any suggestions on how I could accomplish this task
efficiently, or is this data just too big to cluster?
Thanks a lot,
-Gaston
_______________________________________________
Bioconductor mailing list
Bioconductor@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor