heatmap.2 and makeContrasts


Supriya Munshaw ▴ 40

@supriya-munshaw-4253

Last seen 11.4 years ago

Hi all,

I have two questions about using R and Bioconductor.

Question 1: I'm using heatmap.2 to make a heatmap of my top differentially expressed genes, with a column dendrogram that clusters the samples. Is there a way to modify this dendrogram? In the attached heatmap, the color coding marks the 2 regions I clustered by. If you look closely, there is no reason the dendrogram can't be flipped so that the green sections align: the first blue section from the left can be swapped with the second green section from the left. That would keep the same information but give a better visual representation of the clustering. Does anyone know how I can do this?

Question 2: My phenotype data file looks like this:

    Patient   Disease State   Tissue
    A         D               T1
    A         D               T2
    B         D               T1
    B         D               T2
    C         N               T1
    C         N               T2
    D         N               T1
    D         N               T2

The first comparison I want to make is between disease and non-disease in all tissues. I can do that in 2 ways.

Option 1:

    desMat <- model.matrix(~0 + DiseaseState)
    colnames(desMat) <- levels(DiseaseState)
    contMat <- makeContrasts(D - N, levels = colnames(desMat))
    # I'm assuming this puts all disease samples in one group and all
    # non-disease samples in another, without regard to patient, treating
    # each sample independently, which is fine.

Option 2:

    Combine <- factor(paste(DiseaseState, Tissue, sep = "."))
    # So now my states are D.T1, D.T2, N.T1, N.T2
    desMat <- model.matrix(~0 + Combine)
    colnames(desMat) <- levels(Combine)
    contMat <- makeContrasts((D.T1 + D.T2)/2 - (N.T1 + N.T2)/2, levels = colnames(desMat))

Shouldn't options 1 and 2 give me the same answer? In my case they do not, and I'm not sure I understand why.

I would really appreciate any help. Thank you!

[Attachment scrubbed by the mailing-list archive: heatmap.png, image/png, 46718 bytes. URL: https://stat.ethz.ch/pipermail/bioconductor/attachments/20110302/f27a8517/attachment.png]

Clustering • 1.3k views

updated 14.9 years ago by Wolfgang Huber ★ 13k • written 14.9 years ago by Supriya Munshaw ▴ 40


James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 15 hours ago

United States

Hi Supriya,

On 3/2/2011 10:44 AM, Supriya Munshaw wrote:
> Question 1: I'm using heatmap.2 to make a heatmap for my top
> differentially expressed genes. [...] there is no reason that the
> dendrogram can't be flipped so that the green sections align [...]
> Does anyone know how I can do this?

I don't think it is easily done. You might be able to hack at the hclust() code or output to give what you want, but it won't be via a simple argument to hclust().

> Question 2: [...]
> Shouldn't option 1 and 2 give me the same answer? In my case, it does
> not and I'm not sure I understand why.

No, it should not. You are asking two subtly different questions. In option 1 you ignore any differences between the tissues and ask whether there is a difference between disease states. In option 2 you account for the tissue differences and then ask whether there is a difference between the disease states.

This comes from how the denominator of the t-statistic is constructed. In simple terms, the denominator is an average of the variability within the groups being compared. In option 1, you compute the variability within the diseased group and within the normal group separately and then average them. In option 2, you compute the variability within the D.T1, D.T2, N.T1 and N.T2 groups separately and then average.

So if the tissues are quite different in expression, but expression is consistent within each disease state/tissue combination, you will tend to get significance in option 2 but not in option 1. As an example:

    D.T1 = c(4.5,4.3,4.7,4.2)
    D.T2 = c(6.4,5.8,6.0,5.8)
    N.T1 = c(6.5,6.3,6.1,6.6)
    N.T2 = c(7.3,7.2,7.0,7.5)

Here the variability within each of the four groups is very small. If you instead pool the two tissues within each disease state, the within-group variability increases quite a bit, and the difference may well no longer be significant.

Best,

Jim
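A minimal base-R sketch of the point about the denominator, using Jim's example values for a single gene. It assumes ordinary least squares via lm() rather than limma (so there is no empirical Bayes moderation), and the helper name contrast_t is made up for this illustration.

```r
# Jim's example values, one vector per disease-state/tissue group
D.T1 <- c(4.5, 4.3, 4.7, 4.2)
D.T2 <- c(6.4, 5.8, 6.0, 5.8)
N.T1 <- c(6.5, 6.3, 6.1, 6.6)
N.T2 <- c(7.3, 7.2, 7.0, 7.5)

y <- c(D.T1, D.T2, N.T1, N.T2)
DiseaseState <- factor(rep(c("D", "N"), each = 8))
Tissue <- factor(rep(rep(c("T1", "T2"), each = 4), times = 2))
Combine <- factor(paste(DiseaseState, Tissue, sep = "."))

# Option 1: two group means (D, N), tissue ignored
fit1 <- lm(y ~ 0 + DiseaseState)
# Option 2: four group means (D.T1, D.T2, N.T1, N.T2)
fit2 <- lm(y ~ 0 + Combine)

# Hypothetical helper: estimate, standard error and t-statistic for a
# contrast given as a weight vector w over the model coefficients
contrast_t <- function(fit, w) {
  est <- sum(w * coef(fit))
  se <- sqrt(drop(t(w) %*% vcov(fit) %*% w))
  c(estimate = est, se = se, t = est / se)
}

opt1 <- contrast_t(fit1, c(1, -1))                 # D - N
opt2 <- contrast_t(fit2, c(0.5, 0.5, -0.5, -0.5))  # (D.T1+D.T2)/2 - (N.T1+N.T2)/2

print(rbind(option1 = opt1, option2 = opt2))
```

Because this toy design is balanced, both options give the same estimated difference; only the standard error, and therefore the t-statistic, changes. With unbalanced groups the estimates themselves can differ as well.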

14.9 years ago • James W. MacDonald 68k


James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 15 hours ago

United States

Hi Supriya,

Please don't take things off-list. We hope that people can use the list archives to answer questions, and taking questions off-list subverts that function.

On 3/3/2011 11:55 AM, Supriya Munshaw wrote:
> Hi Jim, Thank you so much for your response. It was very helpful!
> Adding a little bit to question 2,
>
> If I have 2 diseased patients (A and B) and 2 non-diseased patients
> (C and D), I should get the same result for setting up my matrix as
> Disease-NonDisease and (A+B)-(C+D) if there are no inter-group
> differences, which is an important assumption I make when I group
> them. But if I don't get the same result, it means that the
> within-group differences exist and cannot be ignored. In this case,
> can I find differences between disease and non disease by setting up
> the contrast as (A-B)-(C-D)? Does this make sense?

Let's set aside the fact that you can't do statistics without replication (so your example won't work), and assume you have replicates for A-D. If so, then what you are asking about is usually called an interaction, and it is designed to detect exactly the situation you describe. There is more than one example of this type of analysis in the limma User's Guide, so you should look there for more information. But long story short: yes, that makes sense.

> I'm new to microarray statistical analysis, so sorry for the dumb
> questions. But thank you for your responses!

There is no crime in ignorance. But there is danger, so it would be in your interest to (at the very least) read about linear modeling, especially ANOVA, so that you have some theoretical understanding of what you are doing.

Best,

Jim
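To make the interaction idea concrete, here is a hedged base-R sketch for one gene. It assumes the four replicated groups are the disease-state/tissue groups from the earlier example (standing in for A, B, C and D) and uses lm() rather than limma. In limma the same weights could be written with makeContrasts, for example (D.T1 - D.T2) - (N.T1 - N.T2) against the Combine design.

```r
# Example values from earlier in the thread, four replicates per group
y <- c(4.5, 4.3, 4.7, 4.2,   # D.T1
       6.4, 5.8, 6.0, 5.8,   # D.T2
       6.5, 6.3, 6.1, 6.6,   # N.T1
       7.3, 7.2, 7.0, 7.5)   # N.T2
DiseaseState <- factor(rep(c("D", "N"), each = 8))
Tissue <- factor(rep(rep(c("T1", "T2"), each = 4), times = 2))
Combine <- factor(paste(DiseaseState, Tissue, sep = "."))

# Cell-means model: one coefficient per group, in the order
# D.T1, D.T2, N.T1, N.T2
fit_cells <- lm(y ~ 0 + Combine)

# Interaction written as a difference of differences:
# (D.T1 - D.T2) - (N.T1 - N.T2)
w <- c(1, -1, -1, 1)
est <- sum(w * coef(fit_cells))
se <- sqrt(drop(t(w) %*% vcov(fit_cells) %*% w))
t_contrast <- est / se

# The same quantity is the interaction coefficient of the factorial
# model (default treatment contrasts, baseline levels D and T1)
fit_fact <- lm(y ~ DiseaseState * Tissue)
int_row <- summary(fit_fact)$coefficients["DiseaseStateN:TissueT2", ]

print(c(estimate = est, se = se, t = t_contrast))
print(int_row)
```

The two fits are reparametrizations of the same model, so the contrast and the interaction coefficient agree. A value far from zero would suggest that the tissue difference is not the same in diseased and normal samples.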

14.9 years ago • James W. MacDonald 68k


Wolfgang Huber ★ 13k

@wolfgang-huber-3550

Last seen 3 months ago

EMBL European Molecular Biology Laborat…

Dear Supriya,

I assume you are referring to the heatmap.2 function in the gplots package. Have a look at its man page, and in particular at its argument 'Colv'.

Best wishes

Wolfgang

On Mar/2/11 4:44 PM, Supriya Munshaw wrote:
> Question 1: I'm using heatmap.2 to make a heatmap for my top
> differentially expressed genes. I also create a dendrogram for my
> columns that clusters by sample. However, is there a way to modify
> these dendrograms? [...]
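A sketch of one way to use 'Colv', under stated assumptions: the matrix x and the per-sample colors in region are made up for illustration, and the weights passed to reorder() are an arbitrary choice meant to pull "green" samples toward one side wherever the tree allows it. reorder() only rotates branches at the nodes, so the clustering itself is unchanged.

```r
set.seed(1)
# Made-up expression matrix: 20 genes x 8 samples (illustration only)
x <- matrix(rnorm(160), nrow = 20,
            dimnames = list(paste0("gene", 1:20), paste0("s", 1:8)))
# Hypothetical per-sample region colors, in column order
region <- rep(c("blue", "green"), times = 4)

# Cluster the samples (columns) and convert to a dendrogram
hc <- hclust(dist(t(x)))
dend <- as.dendrogram(hc)

# Weights are indexed by original column position; at each node the
# branch with the smaller mean weight is placed first (left)
wts <- ifelse(region == "green", 2, 1)
dend2 <- reorder(dend, wts, agglo.FUN = mean)

# Pass the rotated dendrogram to heatmap.2 via Colv (needs gplots)
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(x, Colv = dend2, ColSideColors = region,
                    trace = "none")
}
```

How well the colors line up depends on the tree: samples can only be brought together if swapping the two branches at some node allows it.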

14.9 years ago • Wolfgang Huber ★ 13k
