Question

BH vs BY in p.adjust

0

Entering edit mode

lemerle@embl.de ▴ 50

@lemerleemblde-1669

Last seen 9.6 years ago

Quoting Wolfgang Huber <huber at="" ebi.ac.uk="">: > Hi Caroline, > > what does > > hist(fit2$p.value[,1], breaks=100, col="orange") > > look like? hi wolfgang, well.... not at all uniform: > x <- hist(fit2$p.value[,1], breaks=30, > col="orange",main="distribution of raw p-values",labels=T) > cbind(x$mids,x$counts) [,1] [,2] [1,] 0.025 1891 [2,] 0.075 270 [3,] 0.125 209 [4,] 0.175 123 [5,] 0.225 100 [6,] 0.275 101 [7,] 0.325 79 [8,] 0.375 79 [9,] 0.425 85 [10,] 0.475 57 ..... but from here on, the distribution is uniform (around 50 in every bin until p-val=1). so there are a lot of differential probesets in this contrast. but between 519 and 1032 as estimated from BY and BH adjustments with 1% FDR, there's quite a difference.... or can i estimate it directly from this histogram .....substracting the baseline gives me 2439 probesets, almost 70% of the whole set: > baseline <- mean(x$counts[11:20]) > sum(x$counts-baseline) [1] 2439 how safe is this ? Assuming that the p-values are uniform distributed under > the null hypothesis (which they should be if they are worth their > name), then the shape of this distribution can give you an idea of > the proportion of differential vs non-differential probesets. by the way, in cases that it's not uniformly distributed, from the range values of the over-represented bins on the histogram, can we not get an idea of the effect size associated with the differential probesets responsible for this non-uniformity ? or the other way around, if i happened to know that there were differential probesets but all of only moderate effect size, i might expect a bulge at moderate p-values, while lower ones could well instead be uniformly distributed, right? but then if that were the case, could it also be that if all differential probesets had similar p-values, say 0.2, they could more easily be discovered than the same number associated to a lower but wider ranger of p-values, only because they would add significance to each other? this doesn't quite sound right if it's true that the adjustment procedure preserves the rank that the genes have from the p-value. thanks for the help and comments caroline > > Best wishes > Wolfgang > > -- > ------------------------------------------------------------------ > Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber > >> dear list, >> >> i am analysing a set of affymetrix chips using limma to get lists of >> differentially expressed probesets. >> concerning BH and BY, the two methods available with p.adjust that adjust >> p-values controlling for the false discovery rate, i was surprised to find a >> large discrepancy in the number of differentially expressed genes in >> the case >> of one contrast of interest: >> >>> fit <- lmFit(eset.f,design) >>> fit2 <- contrasts.fit(fit,c.matn) >>> fit2 <- eBayes(fit2) >> >> # adjustment for multiple testing of the set of p-values derived >> from the first >> contrast: >> >>> p.value <- fit2$p.value[,1] >>> bh <- p.adjust(p.value, method = "BH") >>> by <- p.adjust(p.value, method = "BY") >> >> # number of differentially expressed genes below the cutoff value of >> 0.01 for >> the 2 methods >> >>> length(which(bh<0.01)) >> >> [1] 1032 >> >>> length(which(by<0.01)) >> >> [1] 519 >> >> # size of the original set: 3549 probesets (with 15 observations for each) >> >>> dim(exprs(eset.f)) >> >> [1] 3549 15 >> >> i don't think there is a bug anywhere (but i'd gladly send you my data to >> reproduce this), i'm more inclined to think that i know too little to make >> sense of this at this stage.... >> >> could this difference be due to the fact that these numbers correspond to a >> large fraction of the whole set tested and that the two methods behave >> differently with respect to this? >> >> if estimates of false positives at this cutoff were reliable with >> both methods, >> wouldn't this imply that BH gives a far better coverage of the set of truly >> differentially expressed genes than BY (TP/(TP+FN)), for the same >> control over >> false discovery (here, 1%)? >> >> i've seen mentioned but only vaguely that the magnitude of change is >> an issue. >> that the methods work best when it is low.... if you think that may be it, >> could you point me to something i might read to understand this better? >> >> a related question i have is about the interpretation of FDR values >> close to 1, >> in such cases where there are many changes, ie when the fraction of >> non-differentially expressed genes in the whole set is far below 1... >> >> i've come to understand that methods controlling fdr substitute the >> p-value of >> each gene for a value that corresponds to the minimum cut-off to >> apply in order >> for that gene to be called positive, with the added info that such a set >> obtained by applying this minimum cutoff can be expected to contain that >> percentage of false positives. >> but as the fdr rate ranges from 0 to 1, the chosen cutoff can fall above the >> true percentage of non-differentially expressed sets and i don't see >> how such >> an estimate can still be reliable up to 1 if the true number of >> false positives >> is far lower.... unless it represents only an upper bound of how many false >> positives to expect... is this the case ? >> >> thanks for any comment, i'm confused about this.... >> >> caroline >> > > -- Caroline Lemerle PhD student, lab of Luis Serrano Structural Biology and Biocomputing Dept tel 00-49-6221-387-8335 --

Coverage limma safe Coverage limma safe • 1.6k views

ADD COMMENT • link updated 17.7 years ago by Wolfgang Huber ★ 13k • written 17.7 years ago by lemerle@embl.de ▴ 50

score 0 · Answer 1 · 2006-07-29

Hi Caroline, > hi wolfgang, > well.... not at all uniform: That is good - the distribution that you see is expected to be a mixture of uniform (for the non differentially expressed genes) and something which is concentrated near p=0 (for the differentially expressed genes). The power of your test (e.g. the sample size) determines how well the differentially expressed genes indeed get p-values close to 0. >> x <- hist(fit2$p.value[,1], breaks=30, col="orange",main="distribution >> of raw p-values",labels=T) > >> cbind(x$mids,x$counts) > [,1] [,2] > [1,] 0.025 1891 > [2,] 0.075 270 > [3,] 0.125 209 > [4,] 0.175 123 > [5,] 0.225 100 > [6,] 0.275 101 > [7,] 0.325 79 > [8,] 0.375 79 > [9,] 0.425 85 > [10,] 0.475 57 ..... > > but from here on, the distribution is uniform (around 50 in every bin until > p-val=1). so there are a lot of differential probesets in this contrast. > but > between 519 and 1032 as estimated from BY and BH adjustments with 1% FDR, > there's quite a difference.... or can i estimate it directly from this > histogram .....substracting the baseline gives me 2439 probesets, almost > 70% of > the whole set: > >> baseline <- mean(x$counts[11:20]) >> sum(x$counts-baseline) > [1] 2439 > > how safe is this ? This is a good estimate of the number of differentially expressed genes if - your p-values are indeed uniformly distributed for those genes that fall under the null hypothesis - your test has an OK power to find the alternatives and of course it is more difficult to decide which ones they are. > by the way, in cases that it's not uniformly distributed, from the range > values > of the over-represented bins on the histogram, can we not get an idea of > the > effect size associated with the differential probesets responsible for this > non-uniformity ? > or the other way around, if i happened to know that there were differential > probesets but all of only moderate effect size, i might expect a bulge at > moderate p-values, while lower ones could well instead be uniformly > distributed, right? In principle yes, but that would mean that your test is underpowered. Also, the p-value is (generally) the result of two things: effect size and sample size. > but then if that were the case, could it also be that if all differential > probesets had similar p-values, say 0.2, they could more easily be > discovered > than the same number associated to a lower but wider ranger of p-values, > only > because they would add significance to each other? This seems like a very artificial scenario, and unlikely due to stochastic effects. > this doesn't quite sound right if it's true that the adjustment procedure > preserves the rank that the genes have from the p-value. > Best wishes ------------------------------------------------------------------ Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber