Question: question about topTable ranking of limma
Roger Liu wrote, 9.9 years ago:
Dear List, I have several biological replicate Affy arrays (a simple design: one group, 4 arrays), and I tried to use eBayes to find the differentially expressed genes. topTable ranked the genes by the B statistic, which mixes over-expressed and under-expressed genes. My question is how I should separate the over- and under-expressed genes in the topTable results. My idea is to calculate the mean expression value/intensity over all genes (extracted from the topTable results by setting the number of genes to all of them) and compare each ranked gene with that mean: if the expression value is greater than the mean, I take the gene as over-expressed; otherwise it is under-expressed. Since I don't know the underlying implementation of topTable or eBayes, I want to make sure my method is right. Or perhaps you have some better ideas. Thanks.
Answer: question about topTable ranking of limma
Heidi Dvinge wrote, 9.9 years ago:
Hello, you can just sort the topTable result by the t-statistics since these will be either positive or negative, or call it directly with sort.by="t" and then filter for significant p-values. HTH \Heidi
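The suggestion above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are simulated log-ratios centred near zero (not the poster's arrays), the object names `lr`, `tab`, `up` and `down` are made up for the example, and the 0.05 cut-off on adjusted p-values is an arbitrary choice. The sign of the `t` column is what separates the two directions.

```r
library(limma)

set.seed(1)
# Hypothetical data: 100 genes x 4 replicate arrays of log-ratios (centred near 0).
lr <- matrix(rnorm(400, sd = 0.3), nrow = 100,
             dimnames = list(paste0("gene", 1:100), paste0("array", 1:4)))
lr[1:5, ]  <- lr[1:5, ]  + 2   # five genes shifted up
lr[6:10, ] <- lr[6:10, ] - 2   # five genes shifted down

design <- matrix(1, nrow = 4, ncol = 1)   # one-group (intercept-only) design
fit <- eBayes(lmFit(lr, design))

# Rank by the moderated t-statistic instead of B, keeping every gene.
tab <- topTable(fit, coef = 1, number = Inf, sort.by = "t")

# Filter on adjusted p-value, then split by the sign of t.
sig  <- tab[tab$adj.P.Val < 0.05, ]
up   <- sig[sig$t > 0, ]   # mean log-ratio above zero
down <- sig[sig$t < 0, ]   # mean log-ratio below zero
```

Splitting by sign only makes sense when zero is a meaningful reference value, as it is for log-ratios.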
Hi Heidi, Thank you for your response. Maybe I didn't make my question very clear. This analysis is for only one group of 4 biological replicates such as: group array1
Reply written 9.9 years ago by Roger Liu
Sorry for the incomplete message; I clicked send accidentally.

This analysis is for only one group of 4 biological replicates, such as:

        group
array1  a
array2  a
array3  a
array4  a

I am trying to identify the genes that are differentially expressed in group a, but there is no other reference group for comparison. As a result, all of the t-statistics are positive.

Any thoughts? Thanks.
Reply written 9.9 years ago by Roger Liu
Hi,

With only one group you cannot speak of "differentially expressed" genes or of testing, as that assumes that you have at least two different groups or conditions. The test that you have performed probably just compares gene expression to zero (a moderated one-sample t-test), and for that you would expect all genes to be significant.

What you (I am guessing) probably mean by "differentially expressed" is that you are interested in finding genes that vary highly between your 4 replicates. To find those, the best you can do is to rank the genes by their variances/standard deviations. But you can't get a p-value for this, because (unless all values are identical) any gene will have a variance that is significantly higher than 0.

Best Wishes
Claus
Reply written 9.9 years ago by Mayer, Claus-Dieter
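Both points in the reply above can be illustrated with a small base-R sketch. The data are simulated log2 intensities, and all object names are hypothetical. The t-statistic here is the ordinary one-sample statistic; limma would moderate the variance, but the numerator is the same per-gene mean, so the sign behaves the same way.

```r
set.seed(2)
# Hypothetical single-channel data: 50 genes x 4 replicates of log2 intensities.
expr <- matrix(rnorm(200, mean = 8, sd = 0.5), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("array", 1:4)))

# Ordinary one-sample t-statistic against zero, gene by gene.
n      <- ncol(expr)
gene_m <- rowMeans(expr)
gene_s <- apply(expr, 1, sd)
t_zero <- gene_m / (gene_s / sqrt(n))

# Log2 intensities are positive here, so every t-statistic is positive.
all(t_zero > 0)

# Alternative ranking: variability between the replicates, largest first.
by_sd <- sort(gene_s, decreasing = TRUE)
head(by_sd)
```

The ranking by standard deviation is descriptive only; as the reply notes, it does not come with a p-value.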
Claus,

Thank you for your response. However, on some points I don't agree with you. By differentially expressed genes for just one group, I mean the genes whose average expression levels across the biological replicates (here 4 replicates) are over/under the grand mean expression value. I think it is similar to identifying differentially expressed genes in an experiment of 4 replicates with two-color arrays (Cy5 Mut, Cy3 Ref), where you get a single log ratio for each gene on each of the 4 biological replicates. In my design, I just measure the absolute expression value (single-channel intensity). Therefore, when I fit the limma model, it actually evaluates the average expression level (intensity) for each gene across the replicates.

Of course I could just rank the average intensities from high to low and compare them with the mean to get an idea of the differentially expressed genes. But I believe limma can do a better job, since I want not only a ranking but also a significance level. If the variability among the replicates is high, the expression level for a gene may not be reliable even if its average is high. I am just trying to figure out a way to separate the over- and under-expressed values.

If I am wrong, please let me know. Thank you.
Reply written 9.9 years ago by Roger Liu
Answer: question about topTable ranking of limma
Jenny Drnevich (United States) wrote, 9.9 years ago:
Hi,

It's not necessarily an unfair question to ask "which genes have high expression (i.e., many mRNAs) and which genes have low expression in this treatment?" However, you cannot get a quantitative answer to this using microarrays, because the expression values of different genes are NOT directly comparable. Different probe sequences have different binding efficiencies (among other biases), such that a given number of mRNA copies of one gene may not lead to the same measured fluorescence value as the same number of mRNA copies of another gene.

You are also confused as to what value is measured on a single-color array versus a two-color array. A single-color array measures the fluorescence value for that probe in that sample, whereas a two-color array log ratio is the ratio of fluorescence values for that probe between samples. In your example, the log ratios measure the ratio of mutant to reference FOR THAT PARTICULAR SPOT, not the ratio of the mutant value of that spot to the average of the mutant values of all other spots.

I think you could do some sort of qualitative assessment of expression level, because genes with a log2 expression value lower than 5 almost certainly have fewer mRNA copies than genes with log2 expression values over, say, 10. However, you cannot do any sort of statistical test, because the fluorescence values are not directly comparable between genes. And finally, in the context of microarray experiments, "differential expression" almost universally means differences in levels of ONE gene between TWO groups.

HTH,
Jenny

Jenny Drnevich, Ph.D., Functional Genomics Bioinformatics Specialist, W.M. Keck Center for Comparative and Functional Genomics, Roy J. Carver Biotechnology Center, University of Illinois, Urbana-Champaign
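A qualitative assessment of the kind described above might look like the sketch below. The six genes and their values are invented for illustration, and the cut-offs 5 and 10 are simply the example values from the post, not calibrated thresholds. The output is a coarse label per gene, not a test result.

```r
# Hypothetical average log2 expression values for six genes.
ave_expr <- c(geneA = 3.2, geneB = 4.9, geneC = 7.5,
              geneD = 9.8, geneE = 10.4, geneF = 12.1)

# Coarse, qualitative labels only: at most 5 is "low", above 10 is "high",
# and everything in between is left as "intermediate".
level <- cut(ave_expr, breaks = c(-Inf, 5, 10, Inf),
             labels = c("low", "intermediate", "high"))

# Gene names grouped by label.
split(names(ave_expr), level)
```

Only the extremes carry much meaning here; the intermediate band is where probe-specific effects make comparisons between genes least trustworthy.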
Hi Jenny,

I agree that the expression values of different genes are NOT directly comparable. That's why I said that a single gene's expression value is compared with the grand mean of all the genes' expression values.

I know that a single-channel array measures the fluorescence value/intensity of a probe, and a two-color array measures the ratio for that probe. What I wanted to point out is that when we fit a limma model in these two cases (single channel; two channel with ref and mut), we use the same model matrix, (1,1,1,1). Therefore, we are in fact fitting the same model. I just don't know how to interpret the topTable results to separate the over- and under-expressed genes.

Thank you.
ZRL
ADD REPLYlink written 9.9 years ago by Roger Liu260
Answer: question about topTable ranking of limma
9.9 years ago by
Jenny Drnevich1.9k
United States
Jenny Drnevich1.9k wrote:
Hi ZRL,

While you are using the same model matrix, the data going into the model are not the same, so in a sense you are not fitting the same model. The two-color array data are centered on zero, whereas the single-color array data are all positive. A difference between two groups is already incorporated into the two-color array data, whereas there is no difference in the single-color data.

Also, if you agree that expression values between different genes are not directly comparable, why do you think a single gene's expression value is comparable to the grand mean of all arrays?!? A single gene's value could be lower than the grand mean, yet that gene COULD have more mRNA copies than most other genes with values near the grand mean!

If you are ABSOLUTELY, POSITIVELY set on doing your analysis in a way that many of us on the list consider faulty, then subtract the grand mean from all of your expression values before fitting the model. This will give you values akin to two-color arrays that incorporate a difference in expression levels.

Jenny

At 02:29 PM 9/15/2009, zrl wrote:
> Hi Jenny,
>
> I agree that the expression values between different genes are NOT directly comparable. That's why I said, a single gene's expression value is compared with the grand mean of all the genes expression value.
>
> I know for single channel it measures the fluorescence value/intensity of that probe, and two color array measure the ratio of that probe. What I wanted to point out is that when we try to fit a limma model, in these two cases (single channel, two channel-ref and mut), we all use the same model matrix. (1,1,1,1). Therefore, in fact we are fitting the same model. Just don't know how to interpret topTable results to separate the over/under expressed genes.
>
> Thank you.
>
> ZRL

Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
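The grand-mean centering suggested in the reply above can be illustrated with a minimal sketch. This is plain Python, not limma: the gene names and log2 intensities are made up, and the statistic is an ordinary one-sample t-statistic against zero rather than the moderated t that eBayes computes. It only shows why the t-statistics are all positive on raw single-color data and why they take both signs once the grand mean is subtracted.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical log2 intensities: 3 genes x 4 replicate arrays (made-up numbers).
expr = {
    "geneA": [11.8, 12.1, 12.0, 12.3],
    "geneB": [8.1, 7.9, 8.0, 8.2],
    "geneC": [4.9, 5.2, 5.0, 5.1],
}


def one_sample_t(values):
    """Ordinary one-sample t-statistic against zero (NOT limma's moderated t)."""
    return mean(values) / (stdev(values) / sqrt(len(values)))


# Grand mean over every value on every array.
grand_mean = mean(v for vals in expr.values() for v in vals)

# Raw single-color data: every mean is far above zero, so every t is positive.
raw_t = {g: one_sample_t(vals) for g, vals in expr.items()}

# After centering, the sign of t says whether a gene sits above or below the grand mean.
centered = {g: [v - grand_mean for v in vals] for g, vals in expr.items()}
centered_t = {g: one_sample_t(vals) for g, vals in centered.items()}

for g in expr:
    print(g, round(raw_t[g], 1), round(centered_t[g], 1))
```

The sign split after centering is purely mechanical. It does not answer the objection raised in the thread that intensities of different genes are not directly comparable.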
ADD COMMENTlink written 9.9 years ago by Jenny Drnevich1.9k
Jenny,

Thank you for your detailed explanation. The term "differentially expressed genes" that I used was probably confusing. In my experimental setting, I would like a ranked gene list based only on the biological replicates I have, i.e. with no reference arrays. So I have to use the grand-mean-centered method. I can then say that some genes' expression values are higher than the mean and others are lower, which is just a way to rank the genes.

Thank you for your input.

-ZRL
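As a rough illustration of the ranking discussed in this thread (sorting by a t-statistic and splitting by its sign, as suggested earlier with sort.by="t"), here is a small Python sketch. It is not limma code: the centered values are hypothetical, and the statistic is an ordinary one-sample t-statistic, not a moderated one. It also shows the point made earlier about variability, namely that a gene with a high mean but noisy replicates can rank below a gene with a lower but consistent mean.

```python
from math import sqrt
from statistics import mean, stdev


def t_stat(values):
    """Ordinary one-sample t-statistic against zero (not a moderated t)."""
    return mean(values) / (stdev(values) / sqrt(len(values)))


# Hypothetical grand-mean-centered log2 values for 4 replicates (made-up numbers).
centered = {
    "geneA": [3.6, 3.7, 3.5, 3.8],      # above the mean, consistent
    "geneB": [5.0, 0.5, 6.5, 4.0],      # higher average, but noisy
    "geneC": [-3.3, -3.2, -3.4, -3.3],  # below the mean, consistent
    "geneD": [-0.4, 0.3, -0.2, 0.1],    # close to the mean
}

t_values = {g: t_stat(vals) for g, vals in centered.items()}

# Rank from most positive to most negative t, then split by sign.
ranked = sorted(t_values, key=t_values.get, reverse=True)
over = [g for g in ranked if t_values[g] > 0]
under = [g for g in reversed(ranked) if t_values[g] < 0]

# Ranking by the plain average ignores replicate variability.
by_mean = sorted(centered, key=lambda g: mean(centered[g]), reverse=True)

print("over :", over)
print("under:", under)
print("by mean:", by_mean)
```

In this toy data geneB has the highest average, yet geneA ranks first by t because its replicates agree closely. A significance filter on p-values would still be needed on top of the ranking and is not shown here.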
ADD REPLYlink written 9.9 years ago by Roger Liu260