Question

Limma weights, P values and differential expression

Gordon Smyth 50k

@gordon-smyth

WEHI, Melbourne, Australia

Dear Alfredo,

At 08:00 PM 25/08/2006, bioconductor-request at stat.math.ethz.ch wrote:
> Date: Thu, 24 Aug 2006 07:29:07 -0700 (PDT)
> From: Alfredo Juncal <alfjuncal at yahoo.com>
> Subject: [BioC] Limma weights, P values and differential expression
> To: bioconductor at stat.math.ethz.ch
>
> Dear Limma users,
>
> Although I have read the manual and searched the mailing list, I
> still do not fully understand how weights are used when applying
> the linear model.

In limma, weights are always treated as relative weights in model fits. This approach is customary in regression analysis and is explained in most statistics textbooks. A consequence of this approach is that the absolute size of the weights is immaterial: you will get the same differential expression result if you set all the weights to 100, or all the weights to 0.01, or all the weights to 1.

At the normalization stage, a loess curve is fitted for each array. Here the relative sizes of the weights for different probes are used. Probes with lower weights on that array have less influence on the fitted curve.

At the differential expression stage, a linear model is fitted for each probe. Here it is the relative sizes of the weights for different arrays for that probe which are relevant. Arrays with lower weights for that probe contribute less to the fit.

But note, if all the weights for a given probe are the same size, it doesn't matter what that size is. If a particular probe gets weight 0.001 on every array, then the differential expression analysis will be exactly the same as if that probe got weight 1000 on every array. The weights are only used to differentially weight the different arrays. Weights are not used to downweight one probe relative to another in the differential expression analysis, at least not unless the weights are actually zero.

> In our case we have four cDNA microarrays (2 biological replicates
> with 2 dye-swaps for each) in which we compare a mutant with a wt
> sample. The microarrays have each gene printed once. We use
> different spot quality filters and we apply a 0.01 weight to the
> spots which do not meet these criteria or which are controls. We
> then apply lmFit with weights and use BH to adjust for multiple
> testing. These downweighted spots do not affect the Loess print-tip
> normalization, but how are they treated in the linear model?
>
> For example, some of the genes with 3 replicate spots with weight
> 0.01 and 1 with weight 1 have an adjusted P value < 0.05. Is Limma
> "taking into consideration" that 3 of the spots are suboptimal when
> calculating the Adj.P.Vals? Could this gene be considered fairly
> reliable because, although 3 of the spots are suboptimal, the fact
> that they have a similar ratio to the one which is optimal "adds
> strength" to the differential expression of that gene?

Yes, and that seems reasonable to me. Also note that weights c(1,0.01,0.01,0.01) will give the same result as c(100,1,1,1). It is only the relative size which is important.

> If we set the 0.01 weights to 0.00 after the normalization, Adj.P
> Values are not calculated (NA is displayed, as expected) for those
> genes with all four weights being 0. However, P values are
> calculated for those genes with only one weight being 1. How come
> there is a Pval in the latter if there is only 1 optimal observation?

Because the Bayesian approach supplies a standard deviation estimate even when the data for that gene do not.

> The M values are almost identical doing it this way or leaving the
> weights at 0.01, but the Adj.P.Vals are somewhat different (Example:
> 3000 genes with P<0.05 doing it either way but around 1000 genes are
> < 0.05 only by doing it one way or the other). However, no
> differences with genes over log2ratio 1 or -1 and very slight ones
> (12 genes) with genes log2ratio < 0.58 or -0.58.
>
> In summary, after obtaining the Limma results list, would you set an
> extra filter to select only those genes which have N spots meeting
> the quality criteria? Is it necessary? Which value should N be?
>
> Thank you very much,

I don't see why you would, and from the results you report it seems that limma is doing reasonable things with your data already. If you really do not want a particular gene to appear in your gene list, even when the data for that gene seem perfectly reasonable, then you should remove that gene from the analysis or give its spots zero weight, rather than adding an extra ad hoc step after the analysis.

Best wishes
Gordon

> A. Juncal
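
To illustrate the point about relative weights, here is a minimal sketch in plain Python of an ordinary weighted least squares fit for a single probe. It is not limma code and it leaves out the Bayesian moderation step entirely. The function name `weighted_fit`, the dye-swap design vector and the M-values are all hypothetical, chosen only to show that rescaling the weights leaves the coefficient, standard error and t-statistic unchanged, and that a probe with a single non-zero weight has no residual degrees of freedom of its own.

```python
import math


def weighted_fit(x, y, w):
    """Weighted least squares for the one-coefficient model y = x * beta + error.

    Returns (beta, standard_error, t_statistic). The last two are None when
    there are no residual degrees of freedom.
    """
    # Observations with zero weight are dropped, as if they were missing.
    obs = [(xi, yi, wi) for xi, yi, wi in zip(x, y, w) if wi > 0]
    sxx = sum(wi * xi * xi for xi, yi, wi in obs)
    sxy = sum(wi * xi * yi for xi, yi, wi in obs)
    beta = sxy / sxx
    df = len(obs) - 1  # one coefficient is estimated
    if df == 0:
        # A single observation cannot estimate its own variance.
        return beta, None, None
    rss = sum(wi * (yi - xi * beta) ** 2 for xi, yi, wi in obs)
    s2 = rss / df                # residual variance, on the scale of the weights
    se = math.sqrt(s2 / sxx)     # the weight scale cancels here
    return beta, se, beta / se


# Hypothetical data: four arrays, two of them dye-swapped.
design = [1, -1, 1, -1]
m_values = [1.2, -0.9, 1.1, -1.3]

# Same relative weights on two different absolute scales.
fit_a = weighted_fit(design, m_values, [1, 0.01, 0.01, 0.01])
fit_b = weighted_fit(design, m_values, [100, 1, 1, 1])
print(fit_a)
print(fit_b)

# Three spots set to zero weight: only one observation remains.
fit_zero = weighted_fit(design, m_values, [1, 0, 0, 0])
print(fit_zero)
```

In this sketch the one-observation case returns no standard error at all. As described above, limma can still report a P-value in that situation because the Bayesian approach supplies a standard deviation estimate that the gene's own data cannot.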

Normalization Bayesian Regression probe limma • 1.9k views

17.7 years ago Gordon Smyth 50k