Question: BH vs BY in p.adjust
0
13.2 years ago by
lemerle@embl.de50 wrote:
Quoting Wolfgang Huber <huber at="" ebi.ac.uk="">: > Hi Caroline, > > what does > > hist(fit2$p.value[,1], breaks=100, col="orange") > > look like? hi wolfgang, well.... not at all uniform: > x <- hist(fit2$p.value[,1], breaks=30, > col="orange",main="distribution of raw p-values",labels=T) > cbind(x$mids,x$counts) [,1] [,2] [1,] 0.025 1891 [2,] 0.075 270 [3,] 0.125 209 [4,] 0.175 123 [5,] 0.225 100 [6,] 0.275 101 [7,] 0.325 79 [8,] 0.375 79 [9,] 0.425 85 [10,] 0.475 57 ..... but from here on, the distribution is uniform (around 50 in every bin until p-val=1). so there are a lot of differential probesets in this contrast. but between 519 and 1032 as estimated from BY and BH adjustments with 1% FDR, there's quite a difference.... or can i estimate it directly from this histogram .....substracting the baseline gives me 2439 probesets, almost 70% of the whole set: > baseline <- mean(x$counts[11:20]) > sum(x$counts-baseline) [1] 2439 how safe is this ? Assuming that the p-values are uniform distributed under > the null hypothesis (which they should be if they are worth their > name), then the shape of this distribution can give you an idea of > the proportion of differential vs non-differential probesets. by the way, in cases that it's not uniformly distributed, from the range values of the over-represented bins on the histogram, can we not get an idea of the effect size associated with the differential probesets responsible for this non-uniformity ? or the other way around, if i happened to know that there were differential probesets but all of only moderate effect size, i might expect a bulge at moderate p-values, while lower ones could well instead be uniformly distributed, right? but then if that were the case, could it also be that if all differential probesets had similar p-values, say 0.2, they could more easily be discovered than the same number associated to a lower but wider ranger of p-values, only because they would add significance to each other? this doesn't quite sound right if it's true that the adjustment procedure preserves the rank that the genes have from the p-value. thanks for the help and comments caroline > > Best wishes > Wolfgang > > -- > ------------------------------------------------------------------ > Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber > >> dear list, >> >> i am analysing a set of affymetrix chips using limma to get lists of >> differentially expressed probesets. >> concerning BH and BY, the two methods available with p.adjust that adjust >> p-values controlling for the false discovery rate, i was surprised to find a >> large discrepancy in the number of differentially expressed genes in >> the case >> of one contrast of interest: >> >>> fit <- lmFit(eset.f,design) >>> fit2 <- contrasts.fit(fit,c.matn) >>> fit2 <- eBayes(fit2) >> >> # adjustment for multiple testing of the set of p-values derived >> from the first >> contrast: >> >>> p.value <- fit2$p.value[,1] >>> bh <- p.adjust(p.value, method = "BH") >>> by <- p.adjust(p.value, method = "BY") >> >> # number of differentially expressed genes below the cutoff value of >> 0.01 for >> the 2 methods >> >>> length(which(bh<0.01)) >> >> [1] 1032 >> >>> length(which(by<0.01)) >> >> [1] 519 >> >> # size of the original set: 3549 probesets (with 15 observations for each) >> >>> dim(exprs(eset.f)) >> >> [1] 3549 15 >> >> i don't think there is a bug anywhere (but i'd gladly send you my data to >> reproduce this), i'm more inclined to think that i know too little to make >> sense of this at this stage.... >> >> could this difference be due to the fact that these numbers correspond to a >> large fraction of the whole set tested and that the two methods behave >> differently with respect to this? >> >> if estimates of false positives at this cutoff were reliable with >> both methods, >> wouldn't this imply that BH gives a far better coverage of the set of truly >> differentially expressed genes than BY (TP/(TP+FN)), for the same >> control over >> false discovery (here, 1%)? >> >> i've seen mentioned but only vaguely that the magnitude of change is >> an issue. >> that the methods work best when it is low.... if you think that may be it, >> could you point me to something i might read to understand this better? >> >> a related question i have is about the interpretation of FDR values >> close to 1, >> in such cases where there are many changes, ie when the fraction of >> non-differentially expressed genes in the whole set is far below 1... >> >> i've come to understand that methods controlling fdr substitute the >> p-value of >> each gene for a value that corresponds to the minimum cut-off to >> apply in order >> for that gene to be called positive, with the added info that such a set >> obtained by applying this minimum cutoff can be expected to contain that >> percentage of false positives. >> but as the fdr rate ranges from 0 to 1, the chosen cutoff can fall above the >> true percentage of non-differentially expressed sets and i don't see >> how such >> an estimate can still be reliable up to 1 if the true number of >> false positives >> is far lower.... unless it represents only an upper bound of how many false >> positives to expect... is this the case ? >> >> thanks for any comment, i'm confused about this.... >> >> caroline >> > > -- Caroline Lemerle PhD student, lab of Luis Serrano Structural Biology and Biocomputing Dept tel 00-49-6221-387-8335 -- coverage limma safe • 1.1k views ADD COMMENTlink modified 13.2 years ago by Wolfgang Huber13k • written 13.2 years ago by lemerle@embl.de50 Answer: BH vs BY in p.adjust 0 13.2 years ago by EMBL European Molecular Biology Laboratory Wolfgang Huber13k wrote: Hi Caroline, > hi wolfgang, > well.... not at all uniform: That is good - the distribution that you see is expected to be a mixture of uniform (for the non differentially expressed genes) and something which is concentrated near p=0 (for the differentially expressed genes). The power of your test (e.g. the sample size) determines how well the differentially expressed genes indeed get p-values close to 0. >> x <- hist(fit2$p.value[,1], breaks=30, col="orange",main="distribution >> of raw p-values",labels=T) > >> cbind(x$mids,x$counts) > [,1] [,2] > [1,] 0.025 1891 > [2,] 0.075 270 > [3,] 0.125 209 > [4,] 0.175 123 > [5,] 0.225 100 > [6,] 0.275 101 > [7,] 0.325 79 > [8,] 0.375 79 > [9,] 0.425 85 > [10,] 0.475 57 ..... > > but from here on, the distribution is uniform (around 50 in every bin until > p-val=1). so there are a lot of differential probesets in this contrast. > but > between 519 and 1032 as estimated from BY and BH adjustments with 1% FDR, > there's quite a difference.... or can i estimate it directly from this > histogram .....substracting the baseline gives me 2439 probesets, almost > 70% of > the whole set: > >> baseline <- mean(x$counts[11:20]) >> sum(x$counts-baseline) > [1] 2439 > > how safe is this ? This is a good estimate of the number of differentially expressed genes if - your p-values are indeed uniformly distributed for those genes that fall under the null hypothesis - your test has an OK power to find the alternatives and of course it is more difficult to decide which ones they are. > by the way, in cases that it's not uniformly distributed, from the range > values > of the over-represented bins on the histogram, can we not get an idea of > the > effect size associated with the differential probesets responsible for this > non-uniformity ? > or the other way around, if i happened to know that there were differential > probesets but all of only moderate effect size, i might expect a bulge at > moderate p-values, while lower ones could well instead be uniformly > distributed, right? In principle yes, but that would mean that your test is underpowered. Also, the p-value is (generally) the result of two things: effect size and sample size. > but then if that were the case, could it also be that if all differential > probesets had similar p-values, say 0.2, they could more easily be > discovered > than the same number associated to a lower but wider ranger of p-values, > only > because they would add significance to each other? This seems like a very artificial scenario, and unlikely due to stochastic effects. > this doesn't quite sound right if it's true that the adjustment procedure > preserves the rank that the genes have from the p-value. > Best wishes ------------------------------------------------------------------ Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber
>> by the way, in cases that it's not uniformly distributed, from the >> range values >> of the over-represented bins on the histogram, can we not get an idea of the >> effect size associated with the differential probesets responsible for this >> non-uniformity ? >> or the other way around, if i happened to know that there were differential >> probesets but all of only moderate effect size, i might expect a bulge at >> moderate p-values, while lower ones could well instead be uniformly >> distributed, right? > > In principle yes, but that would mean that your test is underpowered. > Also, the p-value is (generally) the result of two things: effect > size and sample size. > >> but then if that were the case, could it also be that if all differential >> probesets had similar p-values, say 0.2, they could more easily be >> discovered >> than the same number associated to a lower but wider ranger of >> p-values, only >> because they would add significance to each other? > > This seems like a very artificial scenario, and unlikely due to > stochastic effects. > >> this doesn't quite sound right if it's true that the adjustment procedure >> preserves the rank that the genes have from the p-value. > > a very small p-value is no guarantee for being differentially > expressed, it could still be by chance (from the uniform > distribution), it is just very unlikely, > hi wolfgang, ok. what i was naively alluding to with hypothetical bulgy p-value distributions was more related to your present comment that it'd be difficult to identify which are the differential probesets in the over-represented range of p-values. as i understand it, it's actually impossible: the only probeset- specific information used for calculating the adjusted distribution is the p-values and since, by definition, they don't differ in that respect, this forces ranks to be preserved. so a real differential probeset with the same p-value as a non-differential one cannot, after adjustment, get a smaller value than the latter, ie cannot be picked up without the other being picked up too. this is easily seen by plotting raw p-values against adjusted values: the curve is strictly monotonous (apologies that this may be very obvious to you and many others) but about the bulge, i think i am again making a mistake in thinking these might arise in a distribution of p-values from selected datasets... i tried to have a look at that empirically, by generating datasets with a fraction of expression values of a subset of 'probesets' drawn from a normal distribution with a slightly higher mean, to see for what magnitude of change i started detecting off-sets from the uniform distribution and in what range of p-values. and what i see is that the only trend emerging is always that the smallest range is the most populated one with counts decreasing until the distribution is again uniform. does anybody confirm that this is in theory not expected? what sorts of shapes can one expect from a p-value distribution when it isn't uniform? sorry that this is no longer directly related to bioc but thanks in advance for comments, caroline Quoting Wolfgang Huber <huber at="" ebi.ac.uk="">: > Hi Caroline, > >> hi wolfgang, >> well.... not at all uniform: > > That is good - the distribution that you see is expected to be a > mixture of uniform (for the non differentially expressed genes) and > something which is concentrated near p=0 (for the differentially > expressed genes). The power of your test (e.g. the sample size) > determines how well the differentially expressed genes indeed get > p-values close to 0. > >>> x <- hist(fit2$p.value[,1], breaks=30, >>> col="orange",main="distribution of raw p-values",labels=T) >> >>> cbind(x$mids,x$counts) >> [,1] [,2] >> [1,] 0.025 1891 >> [2,] 0.075 270 >> [3,] 0.125 209 >> [4,] 0.175 123 >> [5,] 0.225 100 >> [6,] 0.275 101 >> [7,] 0.325 79 >> [8,] 0.375 79 >> [9,] 0.425 85 >> [10,] 0.475 57 ..... >> >> but from here on, the distribution is uniform (around 50 in every bin until >> p-val=1). so there are a lot of differential probesets in this contrast. but >> between 519 and 1032 as estimated from BY and BH adjustments with 1% FDR, >> there's quite a difference.... or can i estimate it directly from this >> histogram .....substracting the baseline gives me 2439 probesets, >> almost 70% of >> the whole set: >> >>> baseline <- mean(x$counts[11:20]) >>> sum(x\$counts-baseline) >> [1] 2439 >> >> how safe is this ? > > This is a good estimate of the number of differentially expressed genes if > > - your p-values are indeed uniformly distributed for those genes that > fall under the null hypothesis > - your test has an OK power to find the alternatives > > and of course it is more difficult to decide which ones they are. > > Cheers > Wolfgang > >>