Question: How is LIMMA actually calculating the average expression value
1
6.5 years ago by
Gordon Smyth39k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth39k wrote:
probe limma gcrma • 3.2k views
modified 6.5 years ago by Garcia Orellana,Miriam150 • written 6.5 years ago by Gordon Smyth39k
Answer: How is LIMMA actually calculating the average expression value
0
6.5 years ago by
Garcia Orellana,Miriam150 wrote:
Dear Dr. Smyth. Thanks for your reply and let me apologize for probably miscalling the terms and/or not being clear enough when posting my question, microarrays data analysis is just a part of my study but definitely is giving me hard time to interpret my factorial with orthogonal contrast analysis. Before posting my previous question, I read the LIMMA user guide section 12.1 which states the the concept you are mentioning for AveEXpr in top table and you are right, the concept is clear with respect to the top table. However the average expression I mentioned is the one I got when using all the next code: rawData <- ReadAffy() esetgcrma <- gcrma(rawData) eset1 <- exprs(esetgcrma) gene.rm<-(strsplit2(rownames(eset1),"-")[,1]=="AFFX") eset2 <- eset1[!gene.rm,] GCRMA values per each array for probe: ? Bt.17364.1.A1_at 4367_SFA_HLA.CEL 4368_CTL_HLA.CEL 4381_SFA_LLA.CEL 4384_EFA_HLA.CEL 4387_EFA_HLA.CEL 4388_EFA_LLA.CEL 11.340 8.709 10.048 10.167 9.589 12.664 4394_CTL_HLA.CEL 4395_CTL_LLA.CEL 4396_SFA_HLA.CEL 4398_EFA_LLA.CEL 4399_SFA_LLA.CEL 4400_CTL_HLA.CEL 10.829 11.232 11.044 10.238 9.575 8.164 4402_EFA_HLA.CEL 4404_CTL_LLA.CEL 4409_EFA_LLA.CEL 4410_SFA_HLA.CEL 4413_CTL_LLA.CEL 4429_SFA_LLA.CEL 10.828 9.296 11.651 11.522 10.007 9.460 ############################################################### phenoDiet <- read.table("designDD_MR.txt",header=T,sep="\t") write.exprs(esetgcrma, file="affy_ALL.gcrma.xls") TS <- paste(phenoDiet$DD, phenoDiet$MR, sep=".") TS TS <- factor(TS, levels=c("Ctl.LLA", "Ctl.HLA","SFA.LLA","SFA.HLA","EFA.LLA", "EFA.HLA")) design <- model.matrix(~0+TS) colnames(design) <- levels(TS) fit <- lmFit(eset2, design, method="robust", maxit=1000000) efit <- eBayes(fit) ? ? ?? #Contrast results MatContrast=makeContrasts(FAT=(SFA.LLA + SFA.HLA + EFA.LLA + EFA.HLA)/4 - (Ctl.LLA + Ctl.HLA)/2, ? ? ? ? ? ? ? ? ? ? ? ? ? FA=(EFA.LLA + EFA.HLA)/2 - (SFA.LLA + SFA.HLA)/2, ? ? ? ? ? ? ? ? ? ? ? ? ? MR=(EFA.HLA+SFA.HLA+Ctl.HLA)/3 - (EFA.LLA+SFA.LLA+Ctl.LLA)/3, ? ? ? ? ? ? ? ? ? ? ? ? ? FATbyMR=((EFA.HLA+SFA.HLA)/2 - Ctl.HLA) - ((EFA.LLA+SFA.LLA)/2-Ctl.LLA),? ? ? ? ? ? ? ? ? ? ? ? ? ? FAbyMR=(EFA.HLA-SFA.HLA)-(EFA.LLA - SFA.LLA), ? ? ? ? ? ? ? ? ? ? ? ? ? levels=design) fitMat<-contrasts.fit(fit,MatContrast) fit2=eBayes(fitMat) top=24016 #commands for tables to save the table of results: This commands resulted in the top tables containing: **** I am listing an example for a specific probe***** TopContrastALL=topTable(fit2,coef=NULL,number=top) ? write.table(TopContrastALL,"All_logFC_contrasts.xls",sep="\t") ID FAT FA MR FATbyMR FAbyMR AveExpr F P.Value adj.P.Val Bt.17364.1.A1_at 1.3331 0.4183 -0.3757 1.3361 -3.0755 10.3534 14.5472 0.0001 0.0009 TopContrast1=topTable(fit2,coef=1,number=top) write.table(TopContrast1,"FAT_contrast.xls",sep="\t") ID logFC AveExpr t P.Value adj.P.Val B 5483 Bt.17364.1.A1_at 1.333 10.353 5.369 0.0002 0.0026 -2.337 TopContrast2=topTable(fit2,coef=2,number=top) write.table(TopContrast2,"FA_contrast.xls",sep="\t") ID logFC AveExpr t P.Value adj.P.Val B 5483 Bt.17364.1.A1_at 0.418 10.353 1.530 0.15 0.59 -9.04 TopContrast3=topTable(fit2,coef=3,number=top) write.table(TopContrast3,"MR_contrast.xls",sep="\t") TopContastr4=topTable(fit2,coef=4,number=top) write.table(TopContastr4,"FATbyMR_contrast.xls",sep="\t") TopContrast5=topTable(fit2,coef=5,number=top) write.table(TopContrast5,"FAbyMR_contrast.xls",sep="\t") #Tables for the values of F statistics summ.allcontrast=topTable(efit, coef=NULL,adjust.method="BH",number=top) ? ? # table of differentially expressed probesets write.table(summ.allcontrast,"Fstat_AveExpr_allTRT.xls",sep="\t") ID Ctl.LLA Ctl.HLA SFA.LLA SFA.HLA EFA.LLA EFA.HLA AveExpr F P.Value adj.P.Val Bt.17364.1.A1_at 10.007 8.741 9.694 11.302 11.651 10.182 10.353 1384.0 1.01E-16 1.30E-16 summ.FAT=topTable(efit, coef=1,adjust.method="BH",number=top) write.table(summ.FAT,"Fstat_FATseparate.xls",sep="\t") ID logFC AveExpr t P.Value adj.P.Val B 5483 Bt.17364.1.A1_at 10.007 10.353 34.372 1.24E-13 1.62E-13 18.270 summ.FA=topTable(efit, coef=2,adjust.method="BH",number=top) write.table(summ.FA,"Fstat_FAseparate.xls",sep="\t") ID logFC AveExpr t P.Value adj.P.Val B 5483 Bt.17364.1.A1_at 8.741 10.353 29.622 7.6E-13 9.4E-13 1.6E+01 summ.MR=topTable(efit, coef=3,adjust.method="BH",number=top) write.tablesumm.MR,"Fstat_MRseparate.xls",sep="\t") summ.FATbyMR=topTable(efit, coef=4,adjust.method="BH",number=top) write.table(summ.FAT,"Fstat_FATbyMRseparate.xls",sep="\t") summ.FAbyMR=topTable(efit, coef=5,adjust.method="BH",number=top) write.table(summ.FAbyMR,"Fstat_FAbyMRseparate.xls",sep="\t") ALSO,?Thanks to your recommendation, I review my code and rerun the data using 2 options of topeTable one with the fit2 command and other with the efit as posted with the example I am not sure how the average expression value as presented with the command summ.allcontrast=topTable(efit,coef=NULL,adjust.metho d="BH",number=top).? Calculating the AveExpr for each group of 3 arrays with the GCRMA log2 values, it give me quite similar results as that on the summary table for some of the six treatments but for others average expression valus simply do not match. In addition when running specific contrast as it is the FAT contrast when using efit or fit2, both approaches give me the same AveExpr value but log FC, T value, B, and P values. The use of fit2 give the FC values as if I were manually calculating with the AvExpr obtained in the later table above. So I think the right one table to use if that generate with the fit2, but so I am wonder what is the Ftable with efit calculating as log FC??? Thanks so much for all,? Regards, Miriam ******************************** Miriam Garcia, MS, PhD Department of Animal Sciences University of Florida ________________________________________ From: Gordon K Smyth [smyth@wehi.EDU.AU] Sent: Monday, June 10, 2013 8:36 PM To: Garcia Orellana,Miriam Cc: Bioconductor mailing list Subject: How is LIMMA actually calculating the average expression value Dear Miriam, > Date: Sun, 9 Jun 2013 10:28:41 +0000 > From: "Garcia Orellana,Miriam" <mgarciao at="" ufl.edu=""> > To: Bioconductor mailing list ?[bioconductor at r-project.org]? > <bioconductor at="" r-project.org=""> > Subject: [BioC] How is LIMMA actually calculating the average > expression value? > > Dear all: > I have found quite the same question being asked at least twice but I > still not have clear answer about how the ebayes method in LIMMA > calculate the average expression value for a given experimental group. The limma package does not and never has computed the expression level for individual experimental groups. The AveExpr columin is the average of all arrays (ie for all groups not for one group). That is clearly documented, or so it seems to me. The limma User's Guide gives in Section 13.1 a description of all quantities output by topTable. It says "The AveExpr column gives the average log2-expression level for that gene across all the arrays and channels in the experiment." That seems to be me to be completely unambiguous. What is unclear about it? The help page ?topTable says that AveExpr is "average log2-expression for the probe over all arrays and channels, same as Amean in the MarrayLM object" The help page ?"MArrayLM-class" says that Amean is a "numeric vector containing the average log-intensity for each probe over all the arrays in the original linear model fit. Note this vector does not change when a contrast is applied to the fit using contrasts.fit." Again this seem to me to be unambiguous. I've said the same thing in response to questions on this list several times. What is unclear? > I have used GCRMA to normalize my affimetrix values and then obtained > the log2 expression values as (values below do not necessarily > correspond to the same probe): > > 4395_CTL_LLA.CEL :7.89 > 4404_CTL_LLA.CEL: 8.21 > 4413_CTL_LLA.CEL: 8.07 > I have calculated by excel: > Simple mean = 8.055 > Geometric mean = 8.054 > Whereas the top table for average expression of these 3 values gave me: 8.055 There is no such thing as a "top table for average expression". Top tables are always for comparisons between groups. I have no idea what you are trying to do. Could you please read the documentation, and have a look at the posting guide? If you post again, please give the whole code leading to this output, and give expression for all arrays in your experiment, not just three. Best wishes Gordon > This values are quite the same regardless of calculation method, However > when more variability is among values the calculated average expression > differs differs quite largely: > 4368_CTL_HLA.CEL: 8.26 > 4394_CTL_HLA.CEL: 7.17 > 4400_CTL_HLA.CEL: 8.70 > Simple mean = 8.042 > Geometric mean = 8.015 > Whereas the top table for average expression of these 3 values gave me: 8.263. In this case this average expression value seems to be the median but on the next set of samples not. > 4368_CTL_HLA.CEL: 10.758 > 4394_CTL_HLA.CEL: 10.907 > 4400_CTL_HLA.CEL: 7.634 > Simple mean = 8.92 > Geometric mean = 9.766 > Whereas the top table for average expression of these 3 values gave me: 9.862, which in this case is not at all close to the median. > > I will appreciate any help on this matter. It will also be appreciated, > any additional though on whether the adjusted average expression > (whichever the method is) is well enough to correct for expression > variability within a given treatment, so I do not need to be worry for > any potential outlier expression value or should I be concerned about? > > Regards, > Miriam > > ******************************** > Miriam Garcia, MS, PhD > Department of Animal Sciences > University of Florida > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:6}}
Dear Garcia, I'm afraid that I still don't understand what is not clear. The average normalized expression value for the example probe (Bt.17364.1.A1_at) is 10.353. That is a simple average of the 18 values. Every table you give shows the same value for AveExpr: 10.353. It is the same in every table, just as the documentation says it will be. This seems to me to be simplicity itself. What is the problem? Best wishes Gordon On Tue, 11 Jun 2013, Garcia Orellana,Miriam wrote: > Dear Dr. Smyth. > Thanks for your reply and let me apologize for probably miscalling the > terms and/or not being clear enough when posting my question, > microarrays data analysis is just a part of my study but definitely is > giving me hard time to interpret my factorial with orthogonal contrast > analysis. > Before posting my previous question, I read the LIMMA user guide section > 12.1 which states the the concept you are mentioning for AveEXpr in top > table and you are right, the concept is clear with respect to the top > table. > However the average expression I mentioned is the one I got when using > all the next code: rawData <- ReadAffy() esetgcrma <- gcrma(rawData) eset1 <- exprs(esetgcrma) gene.rm<-(strsplit2(rownames(eset1),"-")[,1]=="AFFX") eset2 <- eset1[!gene.rm,] GCRMA values per each array for probe: ? Bt.17364.1.A1_at 4367_SFA_HLA.CEL 4368_CTL_HLA.CEL 4381_SFA_LLA.CEL 4384_EFA_HLA.CEL 4387_EFA_HLA.CEL 4388_EFA_LLA.CEL 11.340 8.709 10.048 10.167 9.589 12.664 4394_CTL_HLA.CEL 4395_CTL_LLA.CEL 4396_SFA_HLA.CEL 4398_EFA_LLA.CEL 4399_SFA_LLA.CEL 4400_CTL_HLA.CEL 10.829 11.232 11.044 10.238 9.575 8.164 4402_EFA_HLA.CEL 4404_CTL_LLA.CEL 4409_EFA_LLA.CEL 4410_SFA_HLA.CEL 4413_CTL_LLA.CEL 4429_SFA_LLA.CEL 10.828 9.296 11.651 11.522 10.007 9.460 ############################################################### phenoDiet <- read.table("designDD_MR.txt",header=T,sep="\t") write.exprs(esetgcrma, file="affy_ALL.gcrma.xls") TS <- paste(phenoDiet$DD, phenoDiet$MR, sep=".") TS TS <- factor(TS, levels=c("Ctl.LLA", "Ctl.HLA","SFA.LLA","SFA.HLA","EFA.LLA", "EFA.HLA")) design <- model.matrix(~0+TS) colnames(design) <- levels(TS) fit <- lmFit(eset2, design, method="robust", maxit=1000000) efit <- eBayes(fit) ? ? ?? #Contrast results MatContrast=makeContrasts(FAT=(SFA.LLA + SFA.HLA + EFA.LLA + EFA.HLA)/4 - (Ctl.LLA + Ctl.HLA)/2, ? ? ? ? ? ? ? ? ? ? ? ? ? FA=(EFA.LLA + EFA.HLA)/2 - (SFA.LLA + SFA.HLA)/2, ? ? ? ? ? ? ? ? ? ? ? ? ? MR=(EFA.HLA+SFA.HLA+Ctl.HLA)/3 - (EFA.LLA+SFA.LLA+Ctl.LLA)/3, ? ? ? ? ? ? ? ? ? ? ? ? ? FATbyMR=((EFA.HLA+SFA.HLA)/2 - Ctl.HLA) - ((EFA.LLA+SFA.LLA)/2-Ctl.LLA),? ? ? ? ? ? ? ? ? ? ? ? ? ? FAbyMR=(EFA.HLA-SFA.HLA)-(EFA.LLA - SFA.LLA), ? ? ? ? ? ? ? ? ? ? ? ? ? levels=design) fitMat<-contrasts.fit(fit,MatContrast) fit2=eBayes(fitMat) top=24016 #commands for tables to save the table of results: This commands resulted in the top tables containing: **** I am listing an example for a specific probe***** TopContrastALL=topTable(fit2,coef=NULL,number=top) ? write.table(TopContrastALL,"All_logFC_contrasts.xls",sep="\t") ID FAT FA MR FATbyMR FAbyMR AveExpr F P.Value adj.P.Val Bt.17364.1.A1_at 1.3331 0.4183 -0.3757 1.3361 -3.0755 10.3534 14.5472 0.0001 0.0009 TopContrast1=topTable(fit2,coef=1,number=top) write.table(TopContrast1,"FAT_contrast.xls",sep="\t") ID logFC AveExpr t P.Value adj.P.Val B 5483 Bt.17364.1.A1_at 1.333 10.353 5.369 0.0002 0.0026 -2.337 TopContrast2=topTable(fit2,coef=2,number=top) write.table(TopContrast2,"FA_contrast.xls",sep="\t") ID logFC AveExpr t P.Value adj.P.Val B 5483 Bt.17364.1.A1_at 0.418 10.353 1.530 0.15 0.59 -9.04 TopContrast3=topTable(fit2,coef=3,number=top) write.table(TopContrast3,"MR_contrast.xls",sep="\t") TopContastr4=topTable(fit2,coef=4,number=top) write.table(TopContastr4,"FATbyMR_contrast.xls",sep="\t") TopContrast5=topTable(fit2,coef=5,number=top) write.table(TopContrast5,"FAbyMR_contrast.xls",sep="\t") #Tables for the values of F statistics summ.allcontrast=topTable(efit, coef=NULL,adjust.method="BH",number=top) ? ? # table of differentially expressed probesets write.table(summ.allcontrast,"Fstat_AveExpr_allTRT.xls",sep="\t") ID Ctl.LLA Ctl.HLA SFA.LLA SFA.HLA EFA.LLA EFA.HLA AveExpr F P.Value adj.P.Val Bt.17364.1.A1_at 10.007 8.741 9.694 11.302 11.651 10.182 10.353 1384.0 1.01E-16 1.30E-16 summ.FAT=topTable(efit, coef=1,adjust.method="BH",number=top) write.table(summ.FAT,"Fstat_FATseparate.xls",sep="\t") ID logFC AveExpr t P.Value adj.P.Val B 5483 Bt.17364.1.A1_at 10.007 10.353 34.372 1.24E-13 1.62E-13 18.270 summ.FA=topTable(efit, coef=2,adjust.method="BH",number=top) write.table(summ.FA,"Fstat_FAseparate.xls",sep="\t") ID logFC AveExpr t P.Value adj.P.Val B 5483 Bt.17364.1.A1_at 8.741 10.353 29.622 7.6E-13 9.4E-13 1.6E+01 summ.MR=topTable(efit, coef=3,adjust.method="BH",number=top) write.tablesumm.MR,"Fstat_MRseparate.xls",sep="\t") summ.FATbyMR=topTable(efit, coef=4,adjust.method="BH",number=top) write.table(summ.FAT,"Fstat_FATbyMRseparate.xls",sep="\t") summ.FAbyMR=topTable(efit, coef=5,adjust.method="BH",number=top) write.table(summ.FAbyMR,"Fstat_FAbyMRseparate.xls",sep="\t") ALSO,?Thanks to your recommendation, I review my code and rerun the data using 2 options of topeTable one with the fit2 command and other with the efit as posted with the example I am not sure how the average expression value as presented with the command summ.allcontrast=topTable(efit,coef=NULL,adjust.method="BH",number=top ).? Calculating the AveExpr for each group of 3 arrays with the GCRMA log2 values, it give me quite similar results as that on the summary table for some of the six treatments but for others average expression valus simply do not match. In addition when running specific contrast as it is the FAT contrast when using efit or fit2, both approaches give me the same AveExpr value but log FC, T value, B, and P values. The use of fit2 give the FC values as if I were manually calculating with the AvExpr obtained in the later table above. So I think the right one table to use if that generate with the fit2, but so I am wonder what is the Ftable with efit calculating as log FC??? Thanks so much for all,? Regards, Miriam ******************************** Miriam Garcia, MS, PhD Department of Animal Sciences University of Florida ________________________________________ From: Gordon K Smyth [smyth@wehi.EDU.AU] Sent: Monday, June 10, 2013 8:36 PM To: Garcia Orellana,Miriam Cc: Bioconductor mailing list Subject: How is LIMMA actually calculating the average expression value Dear Miriam, > Date: Sun, 9 Jun 2013 10:28:41 +0000 > From: "Garcia Orellana,Miriam" <mgarciao at="" ufl.edu=""> > To: Bioconductor mailing list ?[bioconductor at r-project.org]? > <bioconductor at="" r-project.org=""> > Subject: [BioC] How is LIMMA actually calculating the average > expression value? > > Dear all: > I have found quite the same question being asked at least twice but I > still not have clear answer about how the ebayes method in LIMMA > calculate the average expression value for a given experimental group. The limma package does not and never has computed the expression level for individual experimental groups. The AveExpr columin is the average of all arrays (ie for all groups not for one group). That is clearly documented, or so it seems to me. The limma User's Guide gives in Section 13.1 a description of all quantities output by topTable. It says "The AveExpr column gives the average log2-expression level for that gene across all the arrays and channels in the experiment." That seems to be me to be completely unambiguous. What is unclear about it? The help page ?topTable says that AveExpr is "average log2-expression for the probe over all arrays and channels, same as Amean in the MarrayLM object" The help page ?"MArrayLM-class" says that Amean is a "numeric vector containing the average log-intensity for each probe over all the arrays in the original linear model fit. Note this vector does not change when a contrast is applied to the fit using contrasts.fit." Again this seem to me to be unambiguous. I've said the same thing in response to questions on this list several times. What is unclear? > I have used GCRMA to normalize my affimetrix values and then obtained > the log2 expression values as (values below do not necessarily > correspond to the same probe): > > 4395_CTL_LLA.CEL :7.89 > 4404_CTL_LLA.CEL: 8.21 > 4413_CTL_LLA.CEL: 8.07 > I have calculated by excel: > Simple mean = 8.055 > Geometric mean = 8.054 > Whereas the top table for average expression of these 3 values gave me: 8.055 There is no such thing as a "top table for average expression". Top tables are always for comparisons between groups. I have no idea what you are trying to do. Could you please read the documentation, and have a look at the posting guide? If you post again, please give the whole code leading to this output, and give expression for all arrays in your experiment, not just three. Best wishes Gordon > This values are quite the same regardless of calculation method, However > when more variability is among values the calculated average expression > differs differs quite largely: > 4368_CTL_HLA.CEL: 8.26 > 4394_CTL_HLA.CEL: 7.17 > 4400_CTL_HLA.CEL: 8.70 > Simple mean = 8.042 > Geometric mean = 8.015 > Whereas the top table for average expression of these 3 values gave me: 8.263. In this case this average expression value seems to be the median but on the next set of samples not. > 4368_CTL_HLA.CEL: 10.758 > 4394_CTL_HLA.CEL: 10.907 > 4400_CTL_HLA.CEL: 7.634 > Simple mean = 8.92 > Geometric mean = 9.766 > Whereas the top table for average expression of these 3 values gave me: 9.862, which in this case is not at all close to the median. > > I will appreciate any help on this matter. It will also be appreciated, > any additional though on whether the adjusted average expression > (whichever the method is) is well enough to correct for expression > variability within a given treatment, so I do not need to be worry for > any potential outlier expression value or should I be concerned about? > > Regards, > Miriam > > ******************************** > Miriam Garcia, MS, PhD > Department of Animal Sciences > University of Florida ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:5}}
Dear Dr. Smyth. You are completely right the average expression is exact the same (10.353) either using the fit2 or efit command but my new concern came when I found out that the logFC differs ie. 1.333 for FAT contrast with fit2 but 10.007 for same contrast when using efit command, or I am havign a wrong interpretation?? When I calculated by hand the logFC with the AveExpr given by the table generated with the command summ.FAT=topTable(efit, coef=1,adjust.method="BH",number=top) I was able to replicate the logFC of 1.333 instead of 10.007 for the FAT contrast., Interesting to me is that these AveExpr per treatments is given when using the efit command. From this table is that came my first original question, I mean, the average of these expression values that came from the average of 3 arrays match for those that have high similarity values among the three arrays but when they are not as similar, the values do not match when comparing to a simple arithmetic mean of the GCRMA log2 values. Thanks very much for helping me with this, I deeply appreciate it. Miriam ******************************** Miriam Garcia, MS, PhD Department of Animal Sciences University of Florida ________________________________________ From: Gordon K Smyth [smyth@wehi.EDU.AU] Sent: Tuesday, June 11, 2013 4:51 AM To: Garcia Orellana,Miriam Cc: Bioconductor mailing list Subject: RE: How is LIMMA actually calculating the average expression value Dear Garcia, I'm afraid that I still don't understand what is not clear. The average normalized expression value for the example probe (Bt.17364.1.A1_at) is 10.353. That is a simple average of the 18 values. Every table you give shows the same value for AveExpr: 10.353. It is the same in every table, just as the documentation says it will be. This seems to me to be simplicity itself. What is the problem? Best wishes Gordon On Tue, 11 Jun 2013, Garcia Orellana,Miriam wrote: > Dear Dr. Smyth. > Thanks for your reply and let me apologize for probably miscalling the > terms and/or not being clear enough when posting my question, > microarrays data analysis is just a part of my study but definitely is > giving me hard time to interpret my factorial with orthogonal contrast > analysis. > Before posting my previous question, I read the LIMMA user guide section > 12.1 which states the the concept you are mentioning for AveEXpr in top > table and you are right, the concept is clear with respect to the top > table. > However the average expression I mentioned is the one I got when using > all the next code: rawData <- ReadAffy() esetgcrma <- gcrma(rawData) eset1 <- exprs(esetgcrma) gene.rm<-(strsplit2(rownames(eset1),"-")[,1]=="AFFX") eset2 <- eset1[!gene.rm,] GCRMA values per each array for probe: Bt.17364.1.A1_at 4367_SFA_HLA.CEL 4368_CTL_HLA.CEL 4381_SFA_LLA.CEL 4384_EFA_HLA.CEL 4387_EFA_HLA.CEL 4388_EFA_LLA.CEL 11.340 8.709 10.048 10.167 9.589 12.664 4394_CTL_HLA.CEL 4395_CTL_LLA.CEL 4396_SFA_HLA.CEL 4398_EFA_LLA.CEL 4399_SFA_LLA.CEL 4400_CTL_HLA.CEL 10.829 11.232 11.044 10.238 9.575 8.164 4402_EFA_HLA.CEL 4404_CTL_LLA.CEL 4409_EFA_LLA.CEL 4410_SFA_HLA.CEL 4413_CTL_LLA.CEL 4429_SFA_LLA.CEL 10.828 9.296 11.651 11.522 10.007 9.460 ############################################################### phenoDiet <- read.table("designDD_MR.txt",header=T,sep="\t") write.exprs(esetgcrma, file="affy_ALL.gcrma.xls") TS <- paste(phenoDiet$DD, phenoDiet$MR, sep=".") TS TS <- factor(TS, levels=c("Ctl.LLA", "Ctl.HLA","SFA.LLA","SFA.HLA","EFA.LLA", "EFA.HLA")) design <- model.matrix(~0+TS) colnames(design) <- levels(TS) fit <- lmFit(eset2, design, method="robust", maxit=1000000) efit <- eBayes(fit) #Contrast results MatContrast=makeContrasts(FAT=(SFA.LLA + SFA.HLA + EFA.LLA + EFA.HLA)/4 - (Ctl.LLA + Ctl.HLA)/2, FA=(EFA.LLA + EFA.HLA)/2 - (SFA.LLA + SFA.HLA)/2, MR=(EFA.HLA+SFA.HLA+Ctl.HLA)/3 - (EFA.LLA+SFA.LLA+Ctl.LLA)/3, FATbyMR=((EFA.HLA+SFA.HLA)/2 - Ctl.HLA) - ((EFA.LLA+SFA.LLA)/2-Ctl.LLA), FAbyMR=(EFA.HLA-SFA.HLA)-(EFA.LLA - SFA.LLA), levels=design) fitMat<-contrasts.fit(fit,MatContrast) fit2=eBayes(fitMat) top=24016 #commands for tables to save the table of results: This commands resulted in the top tables containing: **** I am listing an example for a specific probe***** TopContrastALL=topTable(fit2,coef=NULL,number=top) write.table(TopContrastALL,"All_logFC_contrasts.xls",sep="\t") ID FAT FA MR FATbyMR FAbyMR AveExpr F P.Value adj.P.Val Bt.17364.1.A1_at 1.3331 0.4183 -0.3757 1.3361 -3.0755 10.3534 14.5472 0.0001 0.0009 TopContrast1=topTable(fit2,coef=1,number=top) write.table(TopContrast1,"FAT_contrast.xls",sep="\t") ID logFC AveExpr t P.Value adj.P.Val B 5483 Bt.17364.1.A1_at 1.333 10.353 5.369 0.0002 0.0026 -2.337 TopContrast2=topTable(fit2,coef=2,number=top) write.table(TopContrast2,"FA_contrast.xls",sep="\t") ID logFC AveExpr t P.Value adj.P.Val B 5483 Bt.17364.1.A1_at 0.418 10.353 1.530 0.15 0.59 -9.04 TopContrast3=topTable(fit2,coef=3,number=top) write.table(TopContrast3,"MR_contrast.xls",sep="\t") TopContastr4=topTable(fit2,coef=4,number=top) write.table(TopContastr4,"FATbyMR_contrast.xls",sep="\t") TopContrast5=topTable(fit2,coef=5,number=top) write.table(TopContrast5,"FAbyMR_contrast.xls",sep="\t") #Tables for the values of F statistics summ.allcontrast=topTable(efit, coef=NULL,adjust.method="BH",number=top) # table of differentially expressed probesets write.table(summ.allcontrast,"Fstat_AveExpr_allTRT.xls",sep="\t") ID Ctl.LLA Ctl.HLA SFA.LLA SFA.HLA EFA.LLA EFA.HLA AveExpr F P.Value adj.P.Val Bt.17364.1.A1_at 10.007 8.741 9.694 11.302 11.651 10.182 10.353 1384.0 1.01E-16 1.30E-16 summ.FAT=topTable(efit, coef=1,adjust.method="BH",number=top) write.table(summ.FAT,"Fstat_FATseparate.xls",sep="\t") ID logFC AveExpr t P.Value adj.P.Val B 5483 Bt.17364.1.A1_at 10.007 10.353 34.372 1.24E-13 1.62E-13 18.270 summ.FA=topTable(efit, coef=2,adjust.method="BH",number=top) write.table(summ.FA,"Fstat_FAseparate.xls",sep="\t") ID logFC AveExpr t P.Value adj.P.Val B 5483 Bt.17364.1.A1_at 8.741 10.353 29.622 7.6E-13 9.4E-13 1.6E+01 summ.MR=topTable(efit, coef=3,adjust.method="BH",number=top) write.tablesumm.MR,"Fstat_MRseparate.xls",sep="\t") summ.FATbyMR=topTable(efit, coef=4,adjust.method="BH",number=top) write.table(summ.FAT,"Fstat_FATbyMRseparate.xls",sep="\t") summ.FAbyMR=topTable(efit, coef=5,adjust.method="BH",number=top) write.table(summ.FAbyMR,"Fstat_FAbyMRseparate.xls",sep="\t") ALSO, Thanks to your recommendation, I review my code and rerun the data using 2 options of topeTable one with the fit2 command and other with the efit as posted with the example I am not sure how the average expression value as presented with the command summ.allcontrast=topTable(efit,coef=NULL,adjust.method="BH",number=top ). Calculating the AveExpr for each group of 3 arrays with the GCRMA log2 values, it give me quite similar results as that on the summary table for some of the six treatments but for others average expression valus simply do not match. In addition when running specific contrast as it is the FAT contrast when using efit or fit2, both approaches give me the same AveExpr value but log FC, T value, B, and P values. The use of fit2 give the FC values as if I were manually calculating with the AvExpr obtained in the later table above. So I think the right one table to use if that generate with the fit2, but so I am wonder what is the Ftable with efit calculating as log FC??? Thanks so much for all, Regards, Miriam ******************************** Miriam Garcia, MS, PhD Department of Animal Sciences University of Florida ________________________________________ From: Gordon K Smyth [smyth@wehi.EDU.AU] Sent: Monday, June 10, 2013 8:36 PM To: Garcia Orellana,Miriam Cc: Bioconductor mailing list Subject: How is LIMMA actually calculating the average expression value Dear Miriam, > Date: Sun, 9 Jun 2013 10:28:41 +0000 > From: "Garcia Orellana,Miriam" <mgarciao at="" ufl.edu=""> > To: Bioconductor mailing list ?[bioconductor at r-project.org]? > <bioconductor at="" r-project.org=""> > Subject: [BioC] How is LIMMA actually calculating the average > expression value? > > Dear all: > I have found quite the same question being asked at least twice but I > still not have clear answer about how the ebayes method in LIMMA > calculate the average expression value for a given experimental group. The limma package does not and never has computed the expression level for individual experimental groups. The AveExpr columin is the average of all arrays (ie for all groups not for one group). That is clearly documented, or so it seems to me. The limma User's Guide gives in Section 13.1 a description of all quantities output by topTable. It says "The AveExpr column gives the average log2-expression level for that gene across all the arrays and channels in the experiment." That seems to be me to be completely unambiguous. What is unclear about it? The help page ?topTable says that AveExpr is "average log2-expression for the probe over all arrays and channels, same as Amean in the MarrayLM object" The help page ?"MArrayLM-class" says that Amean is a "numeric vector containing the average log-intensity for each probe over all the arrays in the original linear model fit. Note this vector does not change when a contrast is applied to the fit using contrasts.fit." Again this seem to me to be unambiguous. I've said the same thing in response to questions on this list several times. What is unclear? > I have used GCRMA to normalize my affimetrix values and then obtained > the log2 expression values as (values below do not necessarily > correspond to the same probe): > > 4395_CTL_LLA.CEL :7.89 > 4404_CTL_LLA.CEL: 8.21 > 4413_CTL_LLA.CEL: 8.07 > I have calculated by excel: > Simple mean = 8.055 > Geometric mean = 8.054 > Whereas the top table for average expression of these 3 values gave me: 8.055 There is no such thing as a "top table for average expression". Top tables are always for comparisons between groups. I have no idea what you are trying to do. Could you please read the documentation, and have a look at the posting guide? If you post again, please give the whole code leading to this output, and give expression for all arrays in your experiment, not just three. Best wishes Gordon > This values are quite the same regardless of calculation method, However > when more variability is among values the calculated average expression > differs differs quite largely: > 4368_CTL_HLA.CEL: 8.26 > 4394_CTL_HLA.CEL: 7.17 > 4400_CTL_HLA.CEL: 8.70 > Simple mean = 8.042 > Geometric mean = 8.015 > Whereas the top table for average expression of these 3 values gave me: 8.263. In this case this average expression value seems to be the median but on the next set of samples not. > 4368_CTL_HLA.CEL: 10.758 > 4394_CTL_HLA.CEL: 10.907 > 4400_CTL_HLA.CEL: 7.634 > Simple mean = 8.92 > Geometric mean = 9.766 > Whereas the top table for average expression of these 3 values gave me: 9.862, which in this case is not at all close to the median. > > I will appreciate any help on this matter. It will also be appreciated, > any additional though on whether the adjusted average expression > (whichever the method is) is well enough to correct for expression > variability within a given treatment, so I do not need to be worry for > any potential outlier expression value or should I be concerned about? > > Regards, > Miriam > > ******************************** > Miriam Garcia, MS, PhD > Department of Animal Sciences > University of Florida ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:6}}
Dear Miriam > On Tue, 11 Jun 2013, Garcia Orellana,Miriam wrote: > Dear Dr. Smyth. > You are completely right the average expression is exact the same > (10.353) either using the fit2 or efit command but my new concern came > when I found out that the logFC differs ie. 1.333 for FAT contrast with > fit2 but 10.007 for same contrast when using efit command, or I am > havign a wrong interpretation?? Your interpretation is wrong. The same comparison always gives the same logFC. > When I calculated by hand the logFC with the AveExpr given by the table > generated with the command summ.FAT=topTable(efit, > coef=1,adjust.method="BH",number=top) I was able to replicate the logFC > of 1.333 instead of 10.007 for the FAT contrast., Interesting to me is > that these AveExpr per treatments is given when using the efit command. There is no command called "efit". Perhaps you are refering to the use of contrast.fit? In any case, there is no command which computes "AveExpr per treatments", as I told you in my previous email. > From this table is that came my first original question, I mean, the > average of these expression values that came from the average of 3 > arrays match for those that have high similarity values among the three > arrays but when they are not as similar, the values do not match when > comparing to a simple arithmetic mean of the GCRMA log2 values. You are making a claim that is incorrect. AveExpr is just the average of the log-expression values. > Thanks very much for helping me with this, I deeply appreciate it. I have done my best, but I still have little idea what your actual problem might be. Best wishes Gordon > Miriam > ******************************** > Miriam Garcia, MS, PhD > Department of Animal Sciences > University of Florida ________________________________________ From: Gordon K Smyth [smyth@wehi.EDU.AU] Sent: Tuesday, June 11, 2013 4:51 AM To: Garcia Orellana,Miriam Cc: Bioconductor mailing list Subject: RE: How is LIMMA actually calculating the average expression value Dear Garcia, I'm afraid that I still don't understand what is not clear. The average normalized expression value for the example probe (Bt.17364.1.A1_at) is 10.353. That is a simple average of the 18 values. Every table you give shows the same value for AveExpr: 10.353. It is the same in every table, just as the documentation says it will be. This seems to me to be simplicity itself. What is the problem? Best wishes Gordon On Tue, 11 Jun 2013, Garcia Orellana,Miriam wrote: > Dear Dr. Smyth. > Thanks for your reply and let me apologize for probably miscalling the > terms and/or not being clear enough when posting my question, > microarrays data analysis is just a part of my study but definitely is > giving me hard time to interpret my factorial with orthogonal contrast > analysis. > Before posting my previous question, I read the LIMMA user guide section > 12.1 which states the the concept you are mentioning for AveEXpr in top > table and you are right, the concept is clear with respect to the top > table. > However the average expression I mentioned is the one I got when using > all the next code: rawData <- ReadAffy() esetgcrma <- gcrma(rawData) eset1 <- exprs(esetgcrma) gene.rm<-(strsplit2(rownames(eset1),"-")[,1]=="AFFX") eset2 <- eset1[!gene.rm,] GCRMA values per each array for probe: Bt.17364.1.A1_at 4367_SFA_HLA.CEL 4368_CTL_HLA.CEL 4381_SFA_LLA.CEL 4384_EFA_HLA.CEL 4387_EFA_HLA.CEL 4388_EFA_LLA.CEL 11.340 8.709 10.048 10.167 9.589 12.664 4394_CTL_HLA.CEL 4395_CTL_LLA.CEL 4396_SFA_HLA.CEL 4398_EFA_LLA.CEL 4399_SFA_LLA.CEL 4400_CTL_HLA.CEL 10.829 11.232 11.044 10.238 9.575 8.164 4402_EFA_HLA.CEL 4404_CTL_LLA.CEL 4409_EFA_LLA.CEL 4410_SFA_HLA.CEL 4413_CTL_LLA.CEL 4429_SFA_LLA.CEL 10.828 9.296 11.651 11.522 10.007 9.460 ############################################################### phenoDiet <- read.table("designDD_MR.txt",header=T,sep="\t") write.exprs(esetgcrma, file="affy_ALL.gcrma.xls") TS <- paste(phenoDiet$DD, phenoDiet$MR, sep=".") TS TS <- factor(TS, levels=c("Ctl.LLA", "Ctl.HLA","SFA.LLA","SFA.HLA","EFA.LLA", "EFA.HLA")) design <- model.matrix(~0+TS) colnames(design) <- levels(TS) fit <- lmFit(eset2, design, method="robust", maxit=1000000) efit <- eBayes(fit) #Contrast results MatContrast=makeContrasts(FAT=(SFA.LLA + SFA.HLA + EFA.LLA + EFA.HLA)/4 - (Ctl.LLA + Ctl.HLA)/2, FA=(EFA.LLA + EFA.HLA)/2 - (SFA.LLA + SFA.HLA)/2, MR=(EFA.HLA+SFA.HLA+Ctl.HLA)/3 - (EFA.LLA+SFA.LLA+Ctl.LLA)/3, FATbyMR=((EFA.HLA+SFA.HLA)/2 - Ctl.HLA) - ((EFA.LLA+SFA.LLA)/2-Ctl.LLA), FAbyMR=(EFA.HLA-SFA.HLA)-(EFA.LLA - SFA.LLA), levels=design) fitMat<-contrasts.fit(fit,MatContrast) fit2=eBayes(fitMat) top=24016 #commands for tables to save the table of results: This commands resulted in the top tables containing: **** I am listing an example for a specific probe***** TopContrastALL=topTable(fit2,coef=NULL,number=top) write.table(TopContrastALL,"All_logFC_contrasts.xls",sep="\t") ID FAT FA MR FATbyMR FAbyMR AveExpr F P.Value adj.P.Val Bt.17364.1.A1_at 1.3331 0.4183 -0.3757 1.3361 -3.0755 10.3534 14.5472 0.0001 0.0009 TopContrast1=topTable(fit2,coef=1,number=top) write.table(TopContrast1,"FAT_contrast.xls",sep="\t") ID logFC AveExpr t P.Value adj.P.Val B 5483 Bt.17364.1.A1_at 1.333 10.353 5.369 0.0002 0.0026 -2.337 TopContrast2=topTable(fit2,coef=2,number=top) write.table(TopContrast2,"FA_contrast.xls",sep="\t") ID logFC AveExpr t P.Value adj.P.Val B 5483 Bt.17364.1.A1_at 0.418 10.353 1.530 0.15 0.59 -9.04 TopContrast3=topTable(fit2,coef=3,number=top) write.table(TopContrast3,"MR_contrast.xls",sep="\t") TopContastr4=topTable(fit2,coef=4,number=top) write.table(TopContastr4,"FATbyMR_contrast.xls",sep="\t") TopContrast5=topTable(fit2,coef=5,number=top) write.table(TopContrast5,"FAbyMR_contrast.xls",sep="\t") #Tables for the values of F statistics summ.allcontrast=topTable(efit, coef=NULL,adjust.method="BH",number=top) # table of differentially expressed probesets write.table(summ.allcontrast,"Fstat_AveExpr_allTRT.xls",sep="\t") ID Ctl.LLA Ctl.HLA SFA.LLA SFA.HLA EFA.LLA EFA.HLA AveExpr F P.Value adj.P.Val Bt.17364.1.A1_at 10.007 8.741 9.694 11.302 11.651 10.182 10.353 1384.0 1.01E-16 1.30E-16 summ.FAT=topTable(efit, coef=1,adjust.method="BH",number=top) write.table(summ.FAT,"Fstat_FATseparate.xls",sep="\t") ID logFC AveExpr t P.Value adj.P.Val B 5483 Bt.17364.1.A1_at 10.007 10.353 34.372 1.24E-13 1.62E-13 18.270 summ.FA=topTable(efit, coef=2,adjust.method="BH",number=top) write.table(summ.FA,"Fstat_FAseparate.xls",sep="\t") ID logFC AveExpr t P.Value adj.P.Val B 5483 Bt.17364.1.A1_at 8.741 10.353 29.622 7.6E-13 9.4E-13 1.6E+01 summ.MR=topTable(efit, coef=3,adjust.method="BH",number=top) write.tablesumm.MR,"Fstat_MRseparate.xls",sep="\t") summ.FATbyMR=topTable(efit, coef=4,adjust.method="BH",number=top) write.table(summ.FAT,"Fstat_FATbyMRseparate.xls",sep="\t") summ.FAbyMR=topTable(efit, coef=5,adjust.method="BH",number=top) write.table(summ.FAbyMR,"Fstat_FAbyMRseparate.xls",sep="\t") ALSO, Thanks to your recommendation, I review my code and rerun the data using 2 options of topeTable one with the fit2 command and other with the efit as posted with the example I am not sure how the average expression value as presented with the command summ.allcontrast=topTable(efit,coef=NULL,adjust.method="BH",number=top ). Calculating the AveExpr for each group of 3 arrays with the GCRMA log2 values, it give me quite similar results as that on the summary table for some of the six treatments but for others average expression valus simply do not match. In addition when running specific contrast as it is the FAT contrast when using efit or fit2, both approaches give me the same AveExpr value but log FC, T value, B, and P values. The use of fit2 give the FC values as if I were manually calculating with the AvExpr obtained in the later table above. So I think the right one table to use if that generate with the fit2, but so I am wonder what is the Ftable with efit calculating as log FC??? Thanks so much for all, Regards, Miriam ******************************** Miriam Garcia, MS, PhD Department of Animal Sciences University of Florida ________________________________________ From: Gordon K Smyth [smyth@wehi.EDU.AU] Sent: Monday, June 10, 2013 8:36 PM To: Garcia Orellana,Miriam Cc: Bioconductor mailing list Subject: How is LIMMA actually calculating the average expression value Dear Miriam, > Date: Sun, 9 Jun 2013 10:28:41 +0000 > From: "Garcia Orellana,Miriam" <mgarciao at="" ufl.edu=""> > To: Bioconductor mailing list ?[bioconductor at r-project.org]? > <bioconductor at="" r-project.org=""> > Subject: [BioC] How is LIMMA actually calculating the average > expression value? > > Dear all: > I have found quite the same question being asked at least twice but I > still not have clear answer about how the ebayes method in LIMMA > calculate the average expression value for a given experimental group. The limma package does not and never has computed the expression level for individual experimental groups. The AveExpr columin is the average of all arrays (ie for all groups not for one group). That is clearly documented, or so it seems to me. The limma User's Guide gives in Section 13.1 a description of all quantities output by topTable. It says "The AveExpr column gives the average log2-expression level for that gene across all the arrays and channels in the experiment." That seems to be me to be completely unambiguous. What is unclear about it? The help page ?topTable says that AveExpr is "average log2-expression for the probe over all arrays and channels, same as Amean in the MarrayLM object" The help page ?"MArrayLM-class" says that Amean is a "numeric vector containing the average log-intensity for each probe over all the arrays in the original linear model fit. Note this vector does not change when a contrast is applied to the fit using contrasts.fit." Again this seem to me to be unambiguous. I've said the same thing in response to questions on this list several times. What is unclear? > I have used GCRMA to normalize my affimetrix values and then obtained > the log2 expression values as (values below do not necessarily > correspond to the same probe): > > 4395_CTL_LLA.CEL :7.89 > 4404_CTL_LLA.CEL: 8.21 > 4413_CTL_LLA.CEL: 8.07 > I have calculated by excel: > Simple mean = 8.055 > Geometric mean = 8.054 > Whereas the top table for average expression of these 3 values gave me: 8.055 There is no such thing as a "top table for average expression". Top tables are always for comparisons between groups. I have no idea what you are trying to do. Could you please read the documentation, and have a look at the posting guide? If you post again, please give the whole code leading to this output, and give expression for all arrays in your experiment, not just three. Best wishes Gordon > This values are quite the same regardless of calculation method, However > when more variability is among values the calculated average expression > differs differs quite largely: > 4368_CTL_HLA.CEL: 8.26 > 4394_CTL_HLA.CEL: 7.17 > 4400_CTL_HLA.CEL: 8.70 > Simple mean = 8.042 > Geometric mean = 8.015 > Whereas the top table for average expression of these 3 values gave me: 8.263. In this case this average expression value seems to be the median but on the next set of samples not. > 4368_CTL_HLA.CEL: 10.758 > 4394_CTL_HLA.CEL: 10.907 > 4400_CTL_HLA.CEL: 7.634 > Simple mean = 8.92 > Geometric mean = 9.766 > Whereas the top table for average expression of these 3 values gave me: 9.862, which in this case is not at all close to the median. > > I will appreciate any help on this matter. It will also be appreciated, > any additional though on whether the adjusted average expression > (whichever the method is) is well enough to correct for expression > variability within a given treatment, so I do not need to be worry for > any potential outlier expression value or should I be concerned about? > > Regards, > Miriam > > ******************************** > Miriam Garcia, MS, PhD > Department of Animal Sciences > University of Florida ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:9}}