heatmap for high number of genes


avehna ▴ 240

Hi All:

I would like to display all the differentially expressed genes that I got for 4 different treatments (relative to the control) in a heatmap, with the genes ordered by their expression values. My questions are:

1- Is there a way to make a heatmap for around 10000 genes (the union of all the differentially expressed gene sets) in Bioconductor without it taking so long? (My computer freezes.)
2- How can I order the gene expression profiles from high to low expression values? (I guess in this case I should use one of the treatments as the reference.) I'd like to get a beautiful heatmap going from red to blue, for example.

Looking forward to hearing from you soon!

Yours,
Avhena

Written 14.2 years ago by avehna; updated 14.2 years ago by Steve Lianoglou.

Gilbert Feng ▴ 300

Hello, BioC folks,

I notice that org.Hs.egMAPCOUNTS reports 8245 for org.Hs.egGO2EG. Are these 8245 unique genes, or do all GO terms together contain 8245 human genes (some of which could be counted many times)? What I actually want to know is how many unique human genes are in GO and in each of its three ontologies, BP, MF and CC. Is there a function to retrieve this information easily, or do I have to write several lines of code to do that?

Thanks a lot!
Gilbert

Posted 14.2 years ago by Gilbert Feng.

Hi Gilbert,

To answer questions like this you might want to look at the manual pages for count.mappedRkeys() and count.mappedLkeys(). These count the number of mapped keys in a mapping. An example to give you an idea:

library(org.Hs.eg.db)
count.mappedRkeys(org.Hs.egGO2EG)
# [1] 8245
# That is the same number as you saw in the MAPCOUNTS. But what exactly did we just count?
# Let's look at one key just to see:
mappedRkeys(org.Hs.egGO2EG[1])
# [1] "GO:0000002"
# So we just found that 8245 GO terms are mapped to at least one gene!
# How can we see how many genes are mapped to GO terms?
count.mappedLkeys(org.Hs.egGO2EG)
# [1] 17673
# And just to verify that the Lkeys are Entrez Gene IDs:
mappedLkeys(org.Hs.egGO2EG[1])

You may notice that the Lkeys are the Entrez Gene IDs in both the "reversed" maps and the "forward" maps. This is because these are "undirected" methods, as described in the manual page. So when using such methods, pay attention to what you are counting: this can be confusing with a "reversed" mapping such as GO2EG, where you might expect the Lkeys and Rkeys to switch places. For these methods they do not.

Hope this clarifies things a bit,
Marc
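The difference Marc describes between counting left keys (genes) and right keys (GO terms) can be illustrated without any annotation package. The sketch below uses a small, made-up gene-to-GO table (all IDs are hypothetical) and plain base R; it only mimics what count.mappedLkeys() and count.mappedRkeys() report, not how AnnotationDbi implements them.

```r
# Hypothetical gene-to-GO links: one row per (gene, GO term) pair.
# The IDs are invented for illustration only.
links <- data.frame(
  gene_id = c("1", "1", "2", "3", "3", "4"),
  go_id   = c("GO:A", "GO:B", "GO:A", "GO:B", "GO:C", "GO:A"),
  stringsAsFactors = FALSE
)

# Distinct genes with at least one GO term (analogue of the Lkey count)
n_genes <- length(unique(links$gene_id))

# Distinct GO terms with at least one gene (analogue of the Rkey count)
n_terms <- length(unique(links$go_id))

# Total gene-GO links: a gene is counted once per term it maps to
n_links <- nrow(links)

c(genes = n_genes, terms = n_terms, links = n_links)
```

Here the three numbers differ (4 genes, 3 terms, 6 links), which is why it matters which side of the map is being counted.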

Reply by Marc Carlson, 14.2 years ago.

Hi Gilbert,

It's not a good idea to start a new thread by picking up a random post and pressing the "Reply" button. Your post will then show up in the middle of an existing thread and people will most likely ignore it.

Gilbert Feng wrote:
> I notice that org.Hs.egMAPCOUNTS reports that org.Hs.egGO2EG is 8245.

As explained in the man page for org.Hs.egMAPCOUNTS (see ?org.Hs.egMAPCOUNTS), this is just the number of "keys" that are mapped. Since the org.Hs.egGO2EG map is actually the reverse of the org.Hs.egGO map, its keys are on the right side of the Gene-to-GO mapping; in other words, the keys are GO IDs, not genes. So the human genes in org.Hs.eg.db are mapped to 8245 distinct GO terms:

R> count.mappedRkeys(org.Hs.egGO2EG)
[1] 8245
R> count.mappedRkeys(org.Hs.egGO)
[1] 8245

Note that those 2 maps only hold the GO terms that are linked to at least 1 gene.

> Are these 8245 genes unique, or do all of GO terms contain 8245 human genes
> (could be counted many times)?

Neither: the 8245 are GO terms, not genes.

> Actually, I wonder how many unique human genes in GO and its subdirectories, BP, MF and CC.

Number of human genes in org.Hs.eg.db that are mapped to at least 1 GO term:

R> count.mappedkeys(org.Hs.egGO)
[1] 17673

For the BP, MF and CC ontologies, if you are familiar with GO it should be easy to find the GO IDs of the top-level node of each ontology: GO:0008150 for BP, GO:0003674 for MF and GO:0005575 for CC. Then you can count the number of human genes that are mapped to at least 1 GO term in the BP ontology with:

R> count.mappedLkeys(org.Hs.egGO2ALLEGS["GO:0008150"])
[1] 14221

Cheers,
H.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
Seattle, WA
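Hervé's per-ontology count only includes genes mapped to at least one term in that ontology. A base-R sketch with a hypothetical annotation table that already records each term's ontology makes the idea concrete. (In org.Hs.eg.db, the ALLEGS maps also propagate annotations up the GO graph; the toy table does not attempt that.) It also shows that the three per-ontology counts need not add up to the overall number of annotated genes, because one gene can be annotated in several ontologies.

```r
# Hypothetical annotations: gene, GO term, and the ontology of that term.
# All IDs are invented for illustration only.
annot <- data.frame(
  gene_id  = c("1", "1", "2", "3", "3", "4", "4"),
  go_id    = c("GO:A", "GO:X", "GO:A", "GO:Y", "GO:B", "GO:X", "GO:Z"),
  ontology = c("BP", "MF", "BP", "CC", "BP", "MF", "MF"),
  stringsAsFactors = FALSE
)

# Number of distinct genes with at least one term in each ontology
genes_per_ontology <- tapply(annot$gene_id, annot$ontology,
                             function(g) length(unique(g)))

# Number of distinct genes annotated in any ontology
genes_any <- length(unique(annot$gene_id))

genes_per_ontology
genes_any
```

With the real package, the same per-ontology loop could presumably wrap Hervé's call, e.g. sapply(c(BP = "GO:0008150", MF = "GO:0003674", CC = "GO:0005575"), function(id) count.mappedLkeys(org.Hs.egGO2ALLEGS[id])); the numbers will depend on the annotation release installed.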

Reply by Hervé Pagès, 14.2 years ago.

Steve Lianoglou ★ 13k

Hi Avehna,

The heatmap is taking so long because it's calculating the pairwise distances between the genes (rows) of your matrix in order to cluster them.

It sounds like you don't *want* the heatmap to cluster the genes, because you want them displayed in a very specific, predetermined order (high to low ... something?), so calculating this row-wise clustering is exactly what you don't want to do.

You didn't mention which heatmap function you're using, but let's assume it's gplots::heatmap.2. You can set its "Rowv" parameter to FALSE, and it won't (or at least shouldn't) cluster and reorder the rows of your matrix.

Just make sure that you reorder the rows of your input matrix (let's call it X) the way you want, then:

R> library(gplots)
R> heatmap.2(X, Rowv=FALSE, ...)

That should go much faster.

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
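To put a rough number on Steve's explanation: with its default settings heatmap.2 computes dist() on the rows before calling hclust(), and dist() stores one double (8 bytes) for every pair of rows. For 10000 genes that object alone is about 400 MB, before hclust allocates its own working memory. The helper names below are hypothetical; the arithmetic is just n(n-1)/2 pairs times 8 bytes.

```r
# Number of pairwise distances dist() keeps for n rows (lower triangle only)
n_pairs <- function(n) n * (n - 1) / 2

# Approximate size in bytes, at 8 bytes per double
dist_bytes <- function(n) n_pairs(n) * 8

n_genes <- 10000
n_pairs(n_genes)            # 49,995,000 pairs
dist_bytes(n_genes) / 1e6   # roughly 400 MB

# Small check that the pair count matches what dist() actually returns
d_small <- dist(matrix(rnorm(50 * 4), nrow = 50))
length(d_small) == n_pairs(50)
```

The cost grows with the square of the number of rows, which is why skipping the row clustering (Rowv=FALSE) makes such a difference at 10000 genes, while clustering the 4 treatment columns stays cheap.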

Answer by Steve Lianoglou, 14.2 years ago.

Hi Steve:

Thank you so much for your message. Yes, I'm using "heatmap.2" to display my genes. Do you know of any function in R to rearrange a matrix based on the high-to-low ordering of just one column (treatment)?

Best Regards,
Avhena.

Reply by avehna, 14.2 years ago.

Hi,

On Tue, Mar 2, 2010 at 5:07 PM, avehna wrote:
> Do you know about any function in R to rearrange a matrix based on the
> "high to low"-ordering of just one column (treatment)?

You should look at the "order" function. Type ?order at the R prompt. If your second column is your treatment column, then something like:

R> o <- order(X[,2], decreasing=TRUE)
R> X <- X[o,]

Now the rows of X will be sorted in decreasing order of column 2.

-steve
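A small self-contained version of Steve's order() recipe, with made-up gene and treatment names (hypothetical), showing two details that matter for a heatmap: selecting the sort column by name rather than by position, and using drop = FALSE so the result stays a matrix whose row names (the gene labels) travel with their rows. A second key passed to order() breaks ties.

```r
# Hypothetical expression matrix: 4 genes x 2 treatments
X <- matrix(c(1, 5, 3, 2,
              10, 40, 20, 30),
            ncol = 2,
            dimnames = list(paste0("gene", 1:4), c("trtA", "trtB")))

# Row order: highest trtB first; ties (none here) broken by trtA
o <- order(X[, "trtB"], X[, "trtA"], decreasing = TRUE)

# drop = FALSE keeps a matrix even if X had a single column
X_sorted <- X[o, , drop = FALSE]

rownames(X_sorted)
```

Because the gene names stay attached to the reordered rows, the labels heatmap.2 draws remain correct after sorting.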

Reply by Steve Lianoglou, 14.2 years ago.

An example:

library(gplots)
library(geneplotter)   # provides greenred.colors() and dChip.colors()

# random stand-in data: 10000 genes x 4 treatments
myd <- matrix(rnorm(40000), nrow=10000)

# sort low to high on column 2
l2h <- myd[order(myd[,2]),]

# sort high to low on column 2
h2l <- myd[order(myd[,2], decreasing=TRUE),]

# Rowv=FALSE, Colv=FALSE: no clustering, rows and columns are drawn in the given order
heatmap.2(l2h, Rowv=FALSE, Colv=FALSE, dendrogram="none",
          scale="none", col=greenred.colors(2000),
          trace="none", key=FALSE)

dev.new()   # open a second graphics device for the next heatmap
heatmap.2(h2l, Rowv=FALSE, Colv=FALSE, dendrogram="none",
          scale="none", col=dChip.colors(2000),
          trace="none", key=FALSE)
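Avhena asked for a red-to-blue heatmap. One way to build such a palette with base R alone is grDevices::colorRampPalette(); pairing it with breaks that are symmetric around zero keeps zero at the white midpoint, which suits log-ratios against a control. heatmap.2 accepts such a vector through its breaks argument, which must have one more element than col. The matrix below is a small random stand-in, the number of colours is an arbitrary choice, and the plotting step runs only if gplots is installed. (gplots also ships a ready-made bluered() palette.)

```r
set.seed(1)
# Small random stand-in for a genes x treatments matrix of log-ratios
myd <- matrix(rnorm(200 * 4), nrow = 200)
h2l <- myd[order(myd[, 2], decreasing = TRUE), ]

n_col <- 100
# Blue (low) -> white (zero) -> red (high)
cols <- colorRampPalette(c("blue", "white", "red"))(n_col)

# Breaks symmetric around zero: one more break than colours
lim <- max(abs(h2l))
brks <- seq(-lim, lim, length.out = n_col + 1)

if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(h2l, Rowv = FALSE, Colv = FALSE, dendrogram = "none",
                    scale = "none", col = cols, breaks = brks,
                    trace = "none", key = TRUE)
}
```

Without symmetric breaks, the colour scale is stretched over the observed range, so white would not necessarily correspond to "no change" relative to the control.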

Reply by michael watson (IAH-C), 14.2 years ago.

Thanks a lot, guys... Your messages have been very helpful. I'm going to do that!

Yours,
Avhena.

Reply by avehna, 14.2 years ago.

Hi guys again,

I would now like to do a *hierarchical clustering* with hclust (for 10000 genes) based on the gene expression *correlation* matrix, and I'm getting an "out of memory" error. Do you have any idea how I could get my clusters in a reasonable time without getting this error? (Maybe there is another Bioconductor package that I could use more efficiently.)

Thank you so much!
Avhena.
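Before looking for another package, it may help to see where the memory goes. For 10000 genes, cor(t(X)) is a full 10000 x 10000 matrix of doubles (10000^2 x 8 bytes = 800 MB), and 1 - cor(...) and as.dist() each create further large copies before hclust even starts. The sketch below shows the usual correlation-distance workflow on a small random matrix; the matrix size, the "average" linkage and k = 5 are arbitrary choices for illustration, not a recommendation.

```r
# Memory for the full gene-gene correlation matrix (8 bytes per double)
n_genes <- 10000
cor_bytes <- n_genes^2 * 8   # 8e8 bytes, i.e. 800 MB

# Correlation-distance hierarchical clustering on a small random example
set.seed(42)
X <- matrix(rnorm(500 * 4), nrow = 500)   # 500 genes x 4 treatments

# cor() correlates columns, so transpose to get gene-gene correlations
d <- as.dist(1 - cor(t(X)))

hc <- hclust(d, method = "average")
clusters <- cutree(hc, k = 5)   # one cluster label per gene
table(clusters)
```

Note that with only 4 treatments, each gene-gene correlation is computed from 4 values, so the correlation distances themselves will be quite noisy.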

Reply by avehna, 14.2 years ago.
