Question

Limma B-statistics


Brian Lane ▴ 20

@brian-lane-976


Hi,

I need some help with the interpretation of B statistics generated by eBayes in the limma package.

I want to compare gene expression in three groups of Affy samples. The probe level data was generated from .cel files (ReadAffy()), an exprSet object was generated using mas5 (scaled to 100) and a linear model fitted to the data using a design based on the three groups (6, 5, and 5 samples in each group, respectively). I have then made 3 contrasts to cover all possible comparisons within the data set, and generated empirical Bayes statistics using eBayes. I've then used classifyTestsF to classify each gene according to the contrasts.

The results of all this are 23 significantly differentially expressed genes. The moderated t-values for all these 23 genes have p<0.01. However, all the B-values are <0 (average -3!). In fact, a volcano plot of log-odds and fold-change in the three contrasts show that all the B-values are negative.

My understanding is that B<0 implies the gene is more likely to not be differentially expressed than to be differentially expressed. If this is the case, should I take the "significant genes" seriously? If not, is there any reason why the B-values should all be negative or does this simply reflect the fact that there is little evidence of differential expression in the data set as a whole?

Regards,
Brian Lane
Dept of Haematology
Liverpool University
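For readers following the question, the B-statistic reported by eBayes is the log-odds that a gene is differentially expressed, so it can be turned into an implied probability with the logistic function. The sketch below uses base R only; the helper name b_to_prob is made up for illustration and is not part of limma.

```r
# B is a log-odds, so the implied probability is exp(B) / (1 + exp(B)).
# plogis() is the logistic function from base R's stats package.
b_to_prob <- function(B) plogis(B)   # hypothetical helper, not a limma function

B <- c(-3, 0, 3)        # example B-values: negative, zero, positive
prob <- b_to_prob(B)    # implied probability of differential expression
print(round(prob, 3))
```

On this scale B = 0 is an even chance, and the average B of -3 mentioned in the question corresponds to a probability of a little under 5%. Note that B also depends on the prior proportion of differentially expressed genes that eBayes assumes, so it need not agree with a cutoff applied to unadjusted p-values.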

affy limma • 2.9k views

updated 21.2 years ago by Gordon Smyth 53k • written 21.2 years ago by Brian Lane ▴ 20

score 0 · Answer 1 · 2004-10-23

> Date: Fri, 22 Oct 2004 11:06:15 +0100
> From: Brian Lane <bsl8096@liverpool.ac.uk>
> Subject: [BioC] Limma B-statistics
> To: bioconductor@stat.math.ethz.ch
>
> Hi,
> I need some help with the interpretation of B statistics generated by
> eBayes in the limma package.
>
> I want to compare gene expression in three groups of Affy samples. The
> probe level data was generated from .cel files (ReadAffy()), an exprSet
> object was generated using mas5 (scaled to 100) and a linear model fitted
> to the data using a design based on the three groups (6, 5, and 5 samples
> in each group, respectively). I have then made 3 contrasts to cover all
> possible comparisons within the data set, and generated empirical Bayes
> statistics using eBayes. I've then used classifyTestsF to classify each
> gene according to the contrasts.
>
> The results of all this are 23 significantly differentially expressed
> genes. The moderated t-values for all these 23 genes have p<0.01. However,
> all the B-values are <0 (average -3!). In fact, a volcano plot of log-odds
> and fold-change in the three contrasts show that all the B-values are
> negative.
>
> My understanding is that B<0 implies the gene is more likely to not be
> differentially expressed than to be differentially expressed. If this is
> the case, should I take the "significant genes" seriously? If not, is there
> any reason why the B-values should all be negative or does this simply
> reflect the fact that there is little evidence of differential expression
> in the data set as a whole?

Yes, this is supposed to indicate little evidence of differential expression. I think the problem is likely to be that you have used classifyTestsF() without any adjustment for multiple testing. Please note that classifyTestsF() does not adjust for multiple testing across probe sets. You are supposed to compute a low p-value cutoff yourself (lower than 0.01!), one that reflects the number of probe sets, and give that to classifyTestsF(). See the Ecoli case study in the User's Guide for an example.

I have found that this aspect of classifyTestsF() is often misunderstood, so I recommend that you switch to decideTests() in the newer version of limma instead of classifyTestsF().

You might find the section "Statistics for Differential Expression" in the User's Guide helpful.

Gordon

> Regards,
> Brian Lane
> Dept of Haematology
> Liverpool University
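To illustrate the multiple-testing point, here is a minimal base R sketch, not taken from the original posts. It simulates p-values for a data set in which no gene is differentially expressed and compares a raw p<0.01 cutoff with two cutoffs that reflect the number of tests. The number of probe sets (12000) is an assumption chosen for illustration, since the thread does not state it.

```r
set.seed(1)
n_probes <- 12000            # hypothetical number of probe sets (assumption)
p <- runif(n_probes)         # p-values when no gene is truly differentially expressed

# Unadjusted cutoff: about 1% of all probe sets pass by chance alone.
n_raw <- sum(p < 0.01)

# Benjamini-Hochberg adjustment via base R's p.adjust().
p_bh <- p.adjust(p, method = "BH")
n_bh <- sum(p_bh < 0.01)

# Bonferroni-style raw cutoff that reflects the number of probe sets.
cutoff_bonf <- 0.01 / n_probes
n_bonf <- sum(p < cutoff_bonf)

cat("raw:", n_raw, " BH:", n_bh, " Bonferroni-style:", n_bonf, "\n")
```

With an unadjusted cutoff of 0.01, a chip with this many probe sets is expected to yield on the order of a hundred "significant" genes even when nothing is happening, which is why a list of 23 genes at p<0.01 is weak evidence on its own. In current versions of limma, decideTests() takes an adjust.method argument for this purpose.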

score 0 · Answer 2 · 2004-10-25

I have had the same problem as Brian Lane described, but have not applied the functions classifyTestsF() or decideTests().

Here are my commands. They are almost exactly as in the limma tutorial.

festuca.norm0 is an object of class marrayNorm. The experiment is a dye-swap experiment with 2 arrays (one with each labelling). There are two different types of samples on the array, and the goal is to find the differentially expressed genes. There are 5 replicated spots of each gene on each array (556 genes in total). Only spots with printed genes are included in the analysis. gene is a logical vector indicating whether a spot is a gene or not.

    f.cor <- duplicateCorrelation(maM(festuca.norm0)[gene,], design=c(1,-1), ndups=5)
    fit <- lmFit(festuca.norm0[gene,], design=c(1,-1), ndups=5, correlation=f.cor$cor)
    eb <- eBayes(fit)
    toptable(number = 25, genelist = gnames, fit = fit, eb = eb, adjust = "fdr")
    plot(fit$coef, eb$lods, xlab="Log2 Fold Change", ylab="Log Odds", pch=16, cex=0.2)

I'm a beginner with R, Bioconductor (and microarrays), so I hope any answers will give simple explanations/comments.

Ingunn Berget
Agricultural University of Norway
Department of Animal and Aquacultural Sciences

score 0 · Answer 3 · 2004-10-25

>Date: Mon, 25 Oct 2004 10:59:43 +0200
>From: "Ingunn Berget" <ingunn.berget@iha.nlh.no>
>Subject: [BioC] Limma B-statistics
>To: <bioconductor@stat.math.ethz.ch>
>
>I have had the same problem as described below, but have not applied the
>functions classifyTestsF() or decideTests()
>
>Here are my commands. They are almost exactly as in the limma tutorial
>
>festuca.norm0 is an object of class marrayNorm.
>The experiment is a dye-swap experiment with 2 arrays (one with each
>labelling). There are two different types of samples on the array, and the
>goal is to find the differentially expressed genes.
>There are 5 replicated spots of each gene on each array (556 genes in
>total). Only spots with printed genes are included in the analysis.
>gene is a logical vector for if a spot is a gene or not.
>
>f.cor <- duplicateCorrelation(maM(festuca.norm0)[gene,], design=c(1,-1), ndups=5)
>fit <- lmFit(festuca.norm0[gene,], design=c(1,-1), ndups=5, correlation=f.cor$cor)
>eb <- eBayes(fit)
>toptable(number = 25, genelist = gnames, fit = fit, eb = eb, adjust = "fdr")
>plot(fit$coef, eb$lods, xlab="Log2 Fold Change", ylab="Log Odds", pch=16, cex=0.2)
>
>I'm a beginner with R, Bioconductor (and microarrays), so I hope any
>answers will give simple explanations/comments

You need to explain exactly why you think that there is a problem before we can help you. As far as we can tell, the commands you give here have worked correctly with no errors. If the problem is simply that you're not finding any significant differential expression, that is not in itself an indication of a software error!

Gordon

>Ingunn Berget
>Agricultural University of Norway
>Department of Animal and Aquacultural Sciences