Limma nestedF


noel0925@sbcglobal.net

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20061010/08576707/attachment.pl


Updated 17.6 years ago by James W. MacDonald • written 17.6 years ago by noel0925@sbcglobal.net

James W. MacDonald

Hi Noelle,

noel0925 at sbcglobal.net wrote:

> Hi All,
>
> I am wondering if someone can explain when it is appropriate to use the nestedF method of the decideTests function in Limma?
>
> From the manual I am aware that this adjusts down genes then across contrasts and that this strategy is recommended for complex experiments with many contrasts (like mine) since it may be "desirable to select genes firstly on the basis of their moderated F-statistic, and subsequently decide which of the individual contrasts are significant for the selected gene."
>
> I am interested in identifying genes that are differentially expressed in many contrasts, in particular across genotypes that are fairly similar, across treatments, and in various genotype:treatment combinations (i.e., interaction effects). I expect that the genotype effect will alter the same genes for some of the samples. I also expect that the treatment effect will alter some of the same genes across all the genotypes.
>
> Is this an appropriate situation in which to use a nestedF test (I will also correct for multiple testing using "BH")? Is it correct to form a contrast matrix for all my contrasts of interest (including interaction effects) and test them all simultaneously using decideTests(x, method="nestedF", method.adjust="BH", p=0.05)?

I am not sure the nestedF method is appropriate for this situation, because you have interaction terms. When there is an interaction term in the model, the usual thing to do is to check for significance of the interaction term, and if it isn't significant, then you would drop it from the model and check for significance of the main effects terms. The nestedF method won't do this - it will treat all the contrasts as if equal.

Another issue that arises with microarray analyses is the usefulness of main effects in general. As Gordon mentions in the Limma User's Guide, the main effects are usually not that interesting for microarray data, and are often nonsensical.

As an example, say you have wild type and knockout mice that you treated with a control vehicle and a drug. Usually one would be interested in finding those genes that are affected differently in the KO and WT mice when treated with the drug (i.e., the interaction). It is usually not interesting to ask which genes are differentially expressed in treated vs untreated mice regardless of KO/WT status (treatment main effect), or which genes are differentially expressed in WT vs KO mice regardless of treatment status (mouse main effect).

Using the nestedF approach with various interactions and main effects looks (to me) like a 'shotgun' approach to the analysis. I think you would be much better served to approach the analysis in a stepwise manner, testing each particular hypothesis separately.

Best,

Jim

> I have done so and, as can be expected, this returned a far larger number of DE genes (compared to decideTests(method = "separate", method.adjust="BH", p=0.05)). In fact, the number of genes called significant by this approach for some of my contrasts is quite a lot, ~5,000/24,000 (only ~100 for others), and perhaps more than I want to use in, for example, other analyses such as gene set enrichment or GO analyses, or heat maps.
>
> If I wish to only consider a smaller number of genes, is it more correct in a statistical sense to use a more stringent p-value cutoff after performing decideTests(x, method= "nestedF", method.adjust="BH", p=0.0001) or to consider the contrasts separately and use a larger p-value cutoff? Clearly, some of these tests are not independent, so I am inclined to go with the nestedF approach and a more stringent cutoff.
>
> I have read through other postings, and found this one especially helpful:
> https://stat.ethz.ch/pipermail/bioconductor/2006-March/012182.html
> but am still uncertain about my approach. Any comments would be appreciated.
>
> Thanks in advance,
> Noelle

--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
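For readers who want to see what "testing each particular hypothesis separately" might look like, here is a minimal sketch of the knockout example above in limma. It runs on simulated data and assumes a group-means design with four hypothetical groups (WT.Veh, WT.Drug, KO.Veh, KO.Drug). The object names (`expr`, `group`, `fit2`) and all numbers are made up for illustration and do not come from the thread. Note that the calls quoted above use `method.adjust` and `p`; as far as I know, the argument names in current limma are `adjust.method` and `p.value`, which is what the sketch uses.

```r
library(limma)

set.seed(1)

# Simulated log-expression matrix: 200 genes x 12 arrays (3 replicates per group).
# All values here are invented purely for illustration.
group <- factor(rep(c("WT.Veh", "WT.Drug", "KO.Veh", "KO.Drug"), each = 3),
                levels = c("WT.Veh", "WT.Drug", "KO.Veh", "KO.Drug"))
expr <- matrix(rnorm(200 * 12), nrow = 200,
               dimnames = list(paste0("gene", 1:200), NULL))

# Give the first 10 genes a drug response in WT only, i.e. a true interaction.
expr[1:10, group == "WT.Drug"] <- expr[1:10, group == "WT.Drug"] + 3

# Group-means parametrization: one coefficient per genotype/treatment group.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# One contrast per hypothesis of interest.
cont.matrix <- makeContrasts(
  Drug.in.WT  = WT.Drug - WT.Veh,
  Drug.in.KO  = KO.Drug - KO.Veh,
  Interaction = (KO.Drug - KO.Veh) - (WT.Drug - WT.Veh),
  levels = design
)

fit  <- lmFit(expr, design)
fit2 <- eBayes(contrasts.fit(fit, cont.matrix))

# Examine one hypothesis on its own rather than all contrasts at once.
interaction.genes <- topTable(fit2, coef = "Interaction", number = Inf,
                              adjust.method = "BH")
head(interaction.genes)

# Per-contrast calls, each contrast adjusted for multiplicity on its own.
results <- decideTests(fit2, method = "separate",
                       adjust.method = "BH", p.value = 0.05)
summary(results)
```

Keeping one named contrast per question makes it straightforward to report, say, only the interaction genes, instead of sifting through calls for every contrast simultaneously.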

James W. MacDonald

Hi Noelle,

First, please don't take things off-list. The list archives are a good resource for people to search, so taking questions/answers off-list reduces the information content.

noel0925 at sbcglobal.net wrote:

> Hi Jim,
>
> Thank you very much for your response. I am only a novice at these types of analyses and am trying to get a handle on it all.
>
> So in this particular experiment there are six genotypes (a wild-type and 5 different mutants due to point mutations in the same gene). One of the goals of this experiment is to be able to rank the mutants according to their severity or the difference in their expression profile compared to the wild-type in treatment 1 and treatment 2. It is also of interest to see how the various mutants respond to the two treatments, and finally it might also be interesting to consider the interaction term (genotype*treatment). Thus, in this study the two main effects are of interest, as are the interaction effects.

Well, you don't have any classical main effects here. If you study section 8.7 in the Limma User's Guide, you will see what I mean. In fact the comparisons you show below are exactly what Gordon argues one should be making. Since there are no classical main effects (i.e., treatment without regard to WT/MT status, and WT vs MT without regard to treatment), you don't really have to worry about interactions superseding main effects.

> Here is my contrast matrix:
>
> cont.matrix <- makeContrasts(
>   WT.Treat.Effect    = WT.T2 - WT.T1,
>
>   Mt1.Geno.Effect.T1 = Mt1.T1 - WT.T1,
>   Mt2.Geno.Effect.T1 = Mt2.T1 - WT.T1,
>   Mt3.Geno.Effect.T1 = Mt3.T1 - WT.T1,
>   Mt4.Geno.Effect.T1 = Mt4.T1 - WT.T1,
>   Mt5.Geno.Effect.T1 = Mt5.T1 - WT.T1,
>
>   Mt1.Geno.Effect.T2 = Mt1.T2 - WT.T2,
>   Mt2.Geno.Effect.T2 = Mt2.T2 - WT.T2,
>   Mt3.Geno.Effect.T2 = Mt3.T2 - WT.T2,
>   Mt4.Geno.Effect.T2 = Mt4.T2 - WT.T2,
>   Mt5.Geno.Effect.T2 = Mt5.T2 - WT.T2,
>
>   Mt1.Treat.Effect   = Mt1.T2 - Mt1.T1,
>   Mt2.Treat.Effect   = Mt2.T2 - Mt2.T1,
>   Mt3.Treat.Effect   = Mt3.T2 - Mt3.T1,
>   Mt4.Treat.Effect   = Mt4.T2 - Mt4.T1,
>   Mt5.Treat.Effect   = Mt5.T2 - Mt5.T1,
>
>   Mt1.Int.Effect     = (Mt1.T2 - Mt1.T1) - (WT.T2 - WT.T1),
>   Mt2.Int.Effect     = (Mt2.T2 - Mt2.T1) - (WT.T2 - WT.T1),
>   Mt3.Int.Effect     = (Mt3.T2 - Mt3.T1) - (WT.T2 - WT.T1),
>   Mt4.Int.Effect     = (Mt4.T2 - Mt4.T1) - (WT.T2 - WT.T1),
>   Mt5.Int.Effect     = (Mt5.T2 - Mt5.T1) - (WT.T2 - WT.T1),
>   levels = design)
>
> Does the fact that main effects are of interest in any way alter how you would tackle this problem? I am eager to hear your opinion!

This experiment reminds me of a certain type of client that I see. When we talk about their experiment and ask if a certain comparison is of interest, they nod their head vigorously and say 'yes!'. Unfortunately, they say the same thing regardless of the comparison, so I end up making every possible comparison I can think of, and I send the results off wondering how they are ever going to get anything useful from the experiment (not because the comps are bad per se, but because of the deluge of data I just buried them with).

This is the downside IMO of microarray experiments. They tend to be very expensive, so people often want to wring every last bit of information possible out of a given experiment, regardless of how interesting a bit of information may be.

I suppose one could analyze the data once, output all the significant terms, and then look at them at their leisure, but I would prefer a more hypothesis/goal driven approach such as your first statement. If you want to rank the mutants according to their severity, how would one do that? Is severity a genotypic or phenotypic quantity? If genotypic, is it measured by the number of differentially expressed genes when compared to normal, or can you rank based on the particular genes that are differentially expressed? Thinking about what you really want from the data and how you will measure that quantity IMO is a better way to go. Anyway, enough ranting ;-D

> As regards handling interaction terms, I am aware of the concept of testing for interaction terms and dropping them if they are not significant, but how is this done in Limma or for microarray analyses in general? Is there a package in BioC that does this? Also, if as you say the interaction terms are usually the only "interesting" contrasts in most microarray experiments like the knockout example you gave, then what do analysts usually do to handle this? It seems the examples I read in the literature never mention testing for the significance of interaction terms. Is there a reference for this you can point me to? I am eager to learn.

If you were to do something like this (as mentioned, it's not necessary), you would have to do it by hand. I don't think there is any functionality in limma for this, probably because a classical main effect is usually not sensible in the context of a microarray analysis.

> I, for example, looked at the Weaver mutant data, a 2x2 factorial experiment for two-color data found on page 75 of the Limma User's Manual. While my data is single channel, it is likewise a factorial design (6 x 2) and hence similar from that standpoint, as well as in that both studies consider main effects and interaction terms.

The factorial design in section 8.7 is probably a better example to look at.

> Also, you suggested using a stepwise approach and testing each particular hypothesis separately. Do you mean within Limma using decideTests with the method "separate", or does this entail something else? I am still uncertain how to implement this.
>
> Finally, if it is not too much to ask, how does "separate" differ from "global"? The Limma manual says that global will treat the entire matrix of t-statistics as a single vector of unrelated tests; I assume this means independent tests. It seems that "separate" does the same since it treats each coefficient separately. Obviously they are not the same, but I have missed out on recognizing the difference.

The difference lies in the way multiplicity is handled. If you use the 'global' option, then you will have 21 x number of genes p-values that will then be adjusted for multiple comparisons. If you use the 'separate' option, then you will adjust for multiplicity for each contrast as if the other contrasts were not made, which is more powerful but may lead to more false positives.

Best,

Jim

> Thank you again,
> Noelle
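The 'separate' versus 'global' distinction described above can be illustrated with base R's `p.adjust`. This is only a sketch of the idea on simulated p-values, not limma's own code. The matrix `p` (1000 hypothetical genes by 21 contrasts) and the amount of signal placed in the first column are assumptions made for the illustration.

```r
set.seed(42)

n.genes     <- 1000
n.contrasts <- 21   # number of contrasts in the matrix quoted in this thread

# Simulated raw p-values: mostly null (uniform), with strong signal
# placed in the first contrast only. Purely illustrative numbers.
p <- matrix(runif(n.genes * n.contrasts), nrow = n.genes)
p[1:100, 1] <- runif(100, 0, 1e-4)

# "separate": adjust each contrast (column) as if it were the only one tested.
adj.separate <- apply(p, 2, p.adjust, method = "BH")

# "global": pool all genes x contrasts p-values into one vector, then adjust.
adj.global <- matrix(p.adjust(as.vector(p), method = "BH"), nrow = n.genes)

# Number of calls per contrast at a 5% FDR cutoff under each scheme.
calls <- rbind(separate = colSums(adj.separate < 0.05),
               global   = colSums(adj.global   < 0.05))
calls[, 1:3]
```

Under 'separate', the adjusted values in a column depend only on that column's p-values. Under 'global', a single adjustment is applied across all 21,000 p-values, so the calls for one contrast also depend on how much signal the other contrasts carry.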


Hi Jim,

Thanks so much for the helpful pointers and explanation. It is much clearer to me now.

Thanks,
Noelle

Written 17.6 years ago by noel0925@sbcglobal.net
