9.0 years ago, sabrina.shao wrote:
Hello, everyone:

I have a conceptual question about lmFit. I am not sure which is the right way to set up the design matrix and contrasts for the following experiment.

Say I have 3 genetically different strains, A, B and C, where A is the control. I also have two treatments, T1 and T2. For each strain I have 4 arrays per treatment, so 24 arrays in total. I want to find the significantly differentially expressed genes for the following comparisons:

1) for control strain A: T1 vs T2
2) under T1, B vs. A (control)
3) under T1, C vs. A
4) for B, T1 vs T2
5) for C, T1 vs T2
6) interaction term of A and B, T1 and T2
7) interaction term of A and C, T1 and T2

There are two ways I could use lmFit.

One: the design matrix includes all 3 strains and 2 conditions, with one column per strain/treatment combination:

          A_T1, A_T2, B_T1, B_T2, C_T1, C_T2
sample1:  1,    0,    0,    0,    0,    0
sample2:

Then I make a contrast matrix and run the code below:

    fitGene  <- lmFit(gene, design=design, weights=arrayWt)
    fitGene2 <- contrasts.fit(fitGene, cont.matrix)
    fitGene2 <- eBayes(fitGene2, proportion=p)

Two: instead of fitting all samples in one lmFit call, I use two design matrices, one that involves only A and B with T1 and T2, and a second that involves only A and C with T1 and T2. I make a contrast matrix for each and fit them separately. Later on I can compare the two results if I want to.

My question is: which one is the right one? The first method has more degrees of freedom and gives much lower p-values, but it is testing the same thing as the second one, so am I creating an artifact?

Thanks for your help!

Sabrina
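The seven comparisons listed in the question can all be written as contrasts of the six group means in a single fit. The following is a minimal sketch, not the poster's actual code: the expression matrix `gene` is simulated, the sample ordering and the names `strain`, `treatment` and `group` are assumptions, the direction of each contrast (T1 minus T2, mutant minus control) is a guess at what was intended, and the array weights and `proportion` argument from the original call are left out.

```r
library(limma)

# Assumed layout: 3 strains x 2 treatments x 4 replicate arrays = 24 arrays.
strain    <- rep(c("A", "B", "C"), each = 8)
treatment <- rep(rep(c("T1", "T2"), each = 4), times = 3)
group     <- factor(paste(strain, treatment, sep = "_"),
                    levels = c("A_T1", "A_T2", "B_T1", "B_T2", "C_T1", "C_T2"))

# Cell-means design: one column per strain/treatment combination.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# The seven comparisons, expressed as contrasts of the group means.
cont.matrix <- makeContrasts(
  A_T1vsT2 = A_T1 - A_T2,
  BvsA_T1  = B_T1 - A_T1,
  CvsA_T1  = C_T1 - A_T1,
  B_T1vsT2 = B_T1 - B_T2,
  C_T1vsT2 = C_T1 - C_T2,
  Int_AB   = (B_T1 - B_T2) - (A_T1 - A_T2),
  Int_AC   = (C_T1 - C_T2) - (A_T1 - A_T2),
  levels = design
)

# Simulated log-expression values, for illustration only.
set.seed(1)
gene <- matrix(rnorm(1000 * 24), nrow = 1000,
               dimnames = list(paste0("g", 1:1000), paste0("s", 1:24)))

fitGene  <- lmFit(gene, design = design)
fitGene2 <- contrasts.fit(fitGene, cont.matrix)
fitGene2 <- eBayes(fitGene2)
```

Each column of `cont.matrix` then becomes one coefficient in `fitGene2`, so a call such as `topTable(fitGene2, coef = "Int_AB")` would list genes for the A/B interaction.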
9.0 years ago, Jenny Drnevich (United States) wrote:
Hi Sabrina,

First, a little list etiquette. If you don't get a response to a post within a day, it's not considered polite to just repost the same question verbatim the next day under a different Subject.

Second, your question isn't specific to the modeling in lmFit. Instead, it's a general statistical question about why it's better to fit one ANOVA model than to run a series of t-tests. I suggest you consult a basic statistics textbook or a local statistician to find the answer.

Cheers,
Jenny

Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
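To make the "one model versus separate fits" point concrete for a single gene, here is a small base R sketch on simulated data (the data, the object names and the use of `lm` rather than lmFit are all assumptions for illustration). With a cell-means model the estimated difference between two groups does not depend on whether the other groups are in the fit; what changes is how many arrays contribute to the residual variance.

```r
# Simulated data for one gene: 6 groups x 4 replicate arrays (illustration only).
set.seed(42)
dat <- data.frame(
  y     = rnorm(24),
  group = factor(rep(c("A_T1", "A_T2", "B_T1", "B_T2", "C_T1", "C_T2"), each = 4))
)

# Approach one: a single cell-means model on all 24 arrays.
full <- lm(y ~ 0 + group, data = dat)

# Approach two (first half): the same model on the 16 arrays from strains A and B.
datAB <- droplevels(dat[dat$group %in% c("A_T1", "A_T2", "B_T1", "B_T2"), ])
sub   <- lm(y ~ 0 + group, data = datAB)

# The estimated T1 - T2 difference in strain A is the same in both fits ...
est_full <- unname(coef(full)["groupA_T1"] - coef(full)["groupA_T2"])
est_sub  <- unname(coef(sub)["groupA_T1"]  - coef(sub)["groupA_T2"])
all.equal(est_full, est_sub)

# ... but the residual variance is pooled over more arrays in the full model.
df.residual(full)  # 24 arrays - 6 group means
df.residual(sub)   # 16 arrays - 4 group means
```

So a smaller p-value from the combined fit reflects a different variance estimate and more residual degrees of freedom, not a different fold change. In limma, eBayes additionally moderates the per-gene variances, which this sketch does not reproduce.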
Dear Sabrina,

Experienced members of the group will have better things to say, but here is my $0.25.

As a statistician, I would prefer Design 1. The reason is that data should never be ignored.

Also, the more data there is, the more limma can take advantage of it in the empirical Bayes estimation of the S.D. The lower p-values are because of this. (Taking less data might result in inflated SDs, which would in turn give higher p-values.)

Comparing differential expression and fold change is like comparing apples and oranges. Differential expression has nothing to do with low fold change. As a statistician, I would always trust differential expression over fold change. If you think that fold change is important for you, then you should select the differentially expressed genes ONLY if their log fold-change is above, say, 2.

You can do this in limma using topTable and/or decideTests.

Please correct me if I am wrong.

Thx
S.

On Thu, Jan 21, 2010 at 1:32 PM, sabrina s wrote:
> Hi, Jenny:
> Thanks for the quick reply, and thanks for pointing out the posting etiquette. I thought maybe my subject was not good enough to be noticed, and that is why I posted again. This is my first post, so long way to go!
> Regarding your second point: I don't think my question is a general one about why ANOVA is better than a series of t-tests. I actually did both, but realized that the result from one single model (using all samples) gave me much lower p-values, but when I looked at the expression values, the fold change was nothing, like 0.5. That is why I wonder if the inflated DOF gave me much lower p-values. Any thoughts on that?
>
> Thanks!
>
> Sabrina

written 9.0 years ago by Sunny Srivastava

Dear Sunny:

Thanks for your input. Personally I prefer to combine p-value and FC, because you cannot validate all the genes detected, but picking some with higher FC will probably be feasible.

Sabrina
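Combining a p-value cutoff with a fold-change cutoff, as discussed in the replies above, can be sketched with the `p.value` and `lfc` arguments of topTable and decideTests. This is only an illustration on simulated two-group data: the matrix `expr`, the contrast name `T1vsT2` and the cutoffs (adjusted p-value 0.05, absolute log2 fold change 1) are assumptions, not values from the thread.

```r
library(limma)

# Simulated log2 expression for 500 genes on 8 arrays (illustration only).
set.seed(2)
group  <- factor(rep(c("T1", "T2"), each = 4))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

expr <- matrix(rnorm(500 * 8), nrow = 500,
               dimnames = list(paste0("g", 1:500), paste0("s", 1:8)))
# Shift the first 20 genes upward in the T1 arrays so that some genes differ.
expr[1:20, 1:4] <- expr[1:20, 1:4] + 3

fit <- lmFit(expr, design)
fit <- contrasts.fit(fit, makeContrasts(T1vsT2 = T1 - T2, levels = design))
fit <- eBayes(fit)

# Genes passing both the adjusted p-value cutoff and the |log2 FC| cutoff.
hits <- topTable(fit, coef = "T1vsT2", number = Inf, p.value = 0.05, lfc = 1)

# The same two cutoffs as a -1 / 0 / 1 call per gene.
calls <- decideTests(fit, p.value = 0.05, lfc = 1)
summary(calls)
```

The limma help page for topTable notes that setting both cutoffs at once is not generally recommended and points to `treat`, which tests directly against a fold-change threshold, so that may be worth considering as an alternative.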
This strategy is bound to be less efficient, though. See a recent article on this subject:
http://www.biomedcentral.com/1471-2105/10/402

-Christos

Christos Hatzis, Ph.D.
Nuvera Biosciences, Inc.

written 9.0 years ago by Christos Hatzis

Answer: question about lmFit model

9.0 years ago, Sunny Srivastava wrote:

Dear Christos,

Thanks for pointing out this research paper! This is interesting! I am wondering whether non-specific filtering based on variance is a good way to reduce the number of genes (probes) in this case. Let's say we exclude the genes above a particular variance cutoff (e.g., above the 90th percentile).

Thx
S.
I don't think that removing the high-variance genes will improve the power to detect differentially expressed genes. Removing the low-variance genes might, though, as such genes are less likely to exhibit the large between-group sums of squares that are associated with differential expression.

-Christos

From: Sunny Srivastava [mailto:research.baba@gmail.com]
Sent: Monday, January 25, 2010 11:18 PM
To: christos.hatzis@nuverabio.com
Cc: sabrina s; bioconductor@stat.math.ethz.ch
Subject: Re: [BioC] question about lmFit model

Dear Christos,
Thanks for pointing out this research paper! This is interesting! I am wondering whether non-specific filtering based on variances is a good way to reduce the number of genes (probes) in this case. Let's say we exclude the genes above a particular variance cutoff (e.g., above the 90th percentile).

Thx
S.

On Mon, Jan 25, 2010 at 3:30 PM, Christos Hatzis <christos.hatzis@nuverabio.com> wrote:

This strategy is bound to be less efficient, though. See a recent article on this subject.
http://www.biomedcentral.com/1471-2105/10/402

-Christos

Christos Hatzis, Ph.D.
Nuvera Biosciences, Inc.
400 West Cummings Park, Suite 5350
Woburn, MA 01801
781-938-3844

-----Original Message-----
From: bioconductor-bounces@stat.math.ethz.ch [mailto:bioconductor-bounces@stat.math.ethz.ch] On Behalf Of sabrina s
Sent: Monday, January 25, 2010 3:17 PM
To: Sunny Srivastava
Cc: bioconductor@stat.math.ethz.ch
Subject: Re: [BioC] question about lmFit model

Dear Sunny:
Thanks for your input. Personally, I prefer to combine p-value and FC, because you cannot validate all the genes detected, but picking some with higher FC will probably be feasible.

Sabrina

On Mon, Jan 25, 2010 at 12:05 AM, Sunny Srivastava <research.baba@gmail.com> wrote:

> Dear Sabrina,
> Experienced members of the group will have better things to say, but here is my $0.25.
> As a statistician, I would prefer Design 1. The reason is that data should never be ignored.
>
> Also, the more data there are, the more Limma can take advantage of this information in the Empirical Bayesian Estimation of S.D. Lower p-values are because of this fact. (Taking less data might result in inflated SDs which can also result in lower p-values.)
>
> Comparing differential expression and fold change is like comparing apples and oranges. Differential expression has nothing to do with low fold change. As a statistician, I would always trust differential expression more than fold change.
> If you think that fold change is important for you, then you should select the differentially expressed genes ONLY if their log fold-change is above, say, 2.
>
> You can do this in limma using topTable and/or decideTests.
>
> Pls correct me if I am wrong.
>
> Thx
> S.
>
> On Thu, Jan 21, 2010 at 1:32 PM, sabrina s <sabrina.shao@gmail.com> wrote:
>
>> Hi, Jenny:
>> Thanks for the quick reply. And thanks for pointing out about posting. I thought maybe my subject was not good enough to be noticed, and that is why I posted again. This is my first post, so long way to go!
>> Regarding your second point: I don't think my question is a general one about why ANOVA is better than a series of t-tests. I actually did both, but realized that the result from one single model (using all samples) gave me much lower p-values, yet when I looked at the expression values, the fold change was nothing, like 0.5. That is why I wonder if the inflated DOF gave me much lower p-values. Any thoughts on that?
>>
>> Thanks!
>>
>> Sabrina

written 9.0 years ago by Christos Hatzis110
Answer: question about lmFit model
0
9.0 years ago by
sabrina.shao220
sabrina.shao220 wrote:
Hi, Christos:
Thanks for pointing this out!

Sabrina

On Mon, Jan 25, 2010 at 3:30 PM, Christos Hatzis <christos.hatzis@nuverabio.com> wrote:

> This strategy is bound to be less efficient, though.
> See a recent article on this subject.
> http://www.biomedcentral.com/1471-2105/10/402
>
> -Christos
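The degrees-of-freedom question raised in this thread can be illustrated with a minimal sketch in base R. It is not the limma workflow itself: it uses `lm` on one simulated gene, and the data, the 0.5 effect size, and all variable names are assumptions made up for illustration. It fits the cell-means model once to all 24 arrays (Design 1) and once to the 16 arrays from strains A and B only (Design 2), then compares the "B vs A under T1" contrast from each fit.

```r
# Simulated log-expression values for ONE hypothetical gene:
# 3 strains x 2 treatments x 4 arrays = 24 arrays.
set.seed(1)
d <- data.frame(
  group = factor(rep(c("A_T1", "A_T2", "B_T1", "B_T2", "C_T1", "C_T2"), each = 4)),
  y     = rnorm(24, mean = 8, sd = 0.3)
)
# Hypothetical strain effect of 0.5 (log scale) for B under T1.
d$y[d$group == "B_T1"] <- d$y[d$group == "B_T1"] + 0.5

# Design 1: one cell-means model fitted to all 24 arrays.
fit_all <- lm(y ~ 0 + group, data = d)

# Design 2: the same kind of model fitted to strains A and B only (16 arrays).
d_ab   <- droplevels(d[d$group %in% c("A_T1", "A_T2", "B_T1", "B_T2"), ])
fit_ab <- lm(y ~ 0 + group, data = d_ab)

# Contrast "B vs A under T1": a difference of two group means in either fit.
est_all <- unname(coef(fit_all)["groupB_T1"] - coef(fit_all)["groupA_T1"])
est_ab  <- unname(coef(fit_ab)["groupB_T1"]  - coef(fit_ab)["groupA_T1"])

# Residual degrees of freedom = number of arrays - number of group means.
df_all <- df.residual(fit_all)  # 24 - 6
df_ab  <- df.residual(fit_ab)   # 16 - 4

# Residual standard deviations differ because they pool different arrays.
sd_all <- summary(fit_all)$sigma
sd_ab  <- summary(fit_ab)$sigma
```

In this sketch both fits give the same estimated log fold change, because the contrast is just a difference of two group means. What changes is the residual variance estimate and its degrees of freedom (18 versus 12), so a smaller p-value from the combined model reflects a variance estimate based on more arrays rather than a different effect size. limma's `lmFit` fits this kind of linear model gene by gene, with `eBayes` then moderating the variances across genes.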
