HEATMAP on LARGE DATA
@mark-salsburg-1360
Last seen 9.6 years ago
An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20060313/24b34128/attachment.pl
@sean-davis-490
Last seen 12 weeks ago
United States
On 3/13/06 21:08, "mark salsburg" wrote:

> I am having trouble getting the function heatmap() to work on the following
> gene expression
>
>> dim(SAMPLES_log)
> [1] 12626 20
>
> sample1 sample2...................sample20
> gen1
> gen2
> gen3
> ....
> gen12626
>
> I have converted SAMPLES_log to a numeric matrix using:
>
> as.matrix(SAMPLES_log)
>
> when I use the following command:
>
> heatmap(SAMPLES_log)
>
> Error: cannot allocate vector of size 622668 Kb
> In addition: Warning messages:
> 1: Reached total allocation of 1022Mb: see help(memory.size)
> 2: Reached total allocation of 1022Mb: see help(memory.size)

Mark,

In order to do a heatmap on 12000 genes, a triangular matrix of size 12000x12000/2 needs to be calculated. This is large and will often result in the out-of-memory error that you see. I don't often find that clustering that many genes is meaningful in any major way, particularly since you will be including a large number of genes that do not vary in the samples. If you really need to do this, I would suggest that you use an external program like cluster/treeview, as they may be somewhat less memory-hungry than R (but I haven't tested that directly).

> Is there some library in BioConductor that will allow me to output a
> heatmap. I want to compare the expression of the first 10 samples with the
> last 10 samples.

If you want to do an unsupervised clustering of samples, use just hclust.

If you want to do an unsupervised clustering of samples AND genes, I would suggest reducing the number of genes using a filter for genes that show variability (by using, say, the top 500 genes when sorted by coefficient of variation, for example). In other words, there is no need to include a gene in a heatmap that is the same for all samples.

Ultimately, though, if you want to compare gene expression in two groups of samples, you are asking a question that is best answered using a supervised method, like a t-test. There are numerous ways to do a t-test between two groups including the limma, siggenes, and multtest packages.

Hope that helps.

Sean
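A rough sketch of the points above in base R. The size in the error message appears to correspond to the gene-by-gene distance object that heatmap() builds through dist(), which holds n(n-1)/2 doubles. The helper names (dist_matrix_kb, top_cv_rows, group_t_pvalues) are made up for illustration, and base t.test() is used here as a stand-in for limma, siggenes, or multtest.

```r
# Approximate size (in KB) of the lower-triangular distance object that
# dist() returns for n rows: n * (n - 1) / 2 doubles at 8 bytes each.
dist_matrix_kb <- function(n) n * (n - 1) / 2 * 8 / 1024

dist_matrix_kb(12626)  # roughly 622669 KB, close to the size in the error
dist_matrix_kb(500)    # roughly 975 KB after filtering to 500 genes

# Keep the n_keep rows (genes) with the largest coefficient of variation.
# Hypothetical helper; on log-scale data, rows with a mean near zero get an
# inflated CV, so plain variance or sd may be a reasonable alternative.
top_cv_rows <- function(x, n_keep = 500) {
  cv <- apply(x, 1, sd) / abs(rowMeans(x))
  keep <- order(cv, decreasing = TRUE)[seq_len(min(n_keep, nrow(x)))]
  x[keep, , drop = FALSE]
}

# Per-gene two-sample t-test p-values (Welch, the t.test() default).
# `group` is a two-level factor with one entry per column of x.
# t.test() stops with an error on rows that are constant, so filter first.
group_t_pvalues <- function(x, group) {
  lev <- levels(group)
  apply(x, 1, function(v) {
    t.test(v[group == lev[1]], v[group == lev[2]])$p.value
  })
}

# Usage with the matrix from the question (not run here):
# m <- as.matrix(SAMPLES_log)   # assign the result; as.matrix() returns a copy
# m500 <- top_cv_rows(m, 500)
# heatmap(m500)                 # clusters 500 genes and 20 samples
# plot(hclust(dist(t(m))))      # samples only: a 20 x 20 problem
# group <- factor(rep(c("first10", "last10"), each = 10))
# p <- group_t_pvalues(m500, group)
```

Note that the question calls as.matrix(SAMPLES_log) without assigning the result, so SAMPLES_log itself would be unchanged. The p-values from this sketch are unadjusted; the packages named above handle multiple testing, which matters with thousands of genes.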
Hi,

For large data sets, hcluster (package amap) requires about half the memory of hclust.

For even larger data sets, you can use the xcluster program from Gavin Sherlock:
http://genetics.stanford.edu/~sherlock/cluster.html

Package ctc has all the tools needed to interface with this [free] software.

For visualization, I recommend TreeView or Freeview:
http://magix.fri.uni-lj.si/freeview

But a very large tree should be interpreted carefully, as each branch can be swapped with its sibling without changing the tree, like this:

--- A        --- A
 +- B   ==    +- C
 +  C         +  B

Regards,

Antoine Lucas
Centre de Génétique Moléculaire, CNRS
91198 Gif sur Yvette Cedex
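The branch-swapping caveat can be checked in base R. The sketch below uses simulated data (an assumption, not the data from this thread) and rev() on a dendrogram to flip the branches at every node; the leaf order changes while the cophenetic (tree) distances stay the same.

```r
set.seed(1)
# Simulated 5 x 3 matrix with rows labelled A..E (illustrative data only)
x <- matrix(rnorm(15), nrow = 5, dimnames = list(LETTERS[1:5], NULL))

d1 <- as.dendrogram(hclust(dist(x)))
d2 <- rev(d1)                      # flip the branches at every node

labels(d1)                         # leaf order of the original tree
labels(d2)                         # reversed leaf order, same tree

# Compare tree distances with rows and columns put in a common order
ord <- labels(d1)
c1 <- as.matrix(cophenetic(d1))[ord, ord]
c2 <- as.matrix(cophenetic(d2))[ord, ord]
same_tree <- isTRUE(all.equal(c1, c2))
same_tree
```

Two heatmaps drawn from the same clustering can therefore look different, so the fact that two genes sit next to each other across a branch boundary should not be over-read.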
jon butchar ▴ 50
@jon-butchar-1365
Last seen 9.6 years ago
On Monday 13 March 2006 21:08, mark salsburg wrote:

> I am having trouble getting the function heatmap() to work on the following
> gene expression
>
> > dim(SAMPLES_log)
> > [1] 12626 20
>
> sample1 sample2...................sample20
> gen1
> gen2
> gen3
> ....
> gen12626
>
> I have converted SAMPLES_log to a numeric matrix using:
>
> as.matrix(SAMPLES_log)
>
> when I use the following command:
>
> heatmap(SAMPLES_log)
>
> Error: cannot allocate vector of size 622668 Kb
> In addition: Warning messages:
> 1: Reached total allocation of 1022Mb: see help(memory.size)
> 2: Reached total allocation of 1022Mb: see help(memory.size)
>
> Is there some library in BioConductor that will allow me to output a
> heatmap. I want to compare the expression of the first 10 samples with the
> last 10 samples.
>
> I have tried running that command in a Linux environment, also with no
> success
>
> thank you,

Mark, along with the good stuff Sean Davis mentioned, maybe you could think about upgrading your computer hardware in the near future. You can get hardware that supports 64-bit memory addressing and put in 4 GB RAM, all for about $3k. That's relatively little compared to what it costs to run 20 chips.

fwiw, I've compared a 32-bit system against a 64-bit system (both with 4 GB RAM), and can heartily recommend just going straight for a 64-bit system (hardware _and_ operating system); just fewer headaches. The number of chips you run will probably only increase during the next several years and, as you've discovered, lack of system resources can make you lose quite a lot of valuable time.

Best of luck,

jon butchar