Question

bumphunter output clarification: p values achieved with b=1000: further correction needed?

chelsey.ju • 0

Hello,

I ran the following bumphunter call on my data set:

dmrs <- bumphunter(gset, design = designMatrix, cutoff = 0.05, B=1000, type="M")

Do the p-values returned by this call need further correction, or was that already handled by B=1000?

Also, I did not set cutoffQ. If I set cutoffQ=0.2, is that equivalent to an FDR of 0.2?

Thanks!

methylation dmr analysis bumphunter • 3.2k views

ADD COMMENT • link updated 7.6 years ago by James W. MacDonald 66k • written 7.6 years ago by chelsey.ju • 0

score 3 · Answer 1 · 2017-01-06

When you permute (or draw bootstraps), you are estimating a null distribution and then comparing each observed bump to it. A p-value estimates the probability of getting your observed result (or something more extreme) under the null, and with permutations you compute it as (# permuted bumps at least as extreme as the observed bump) / (# permuted bumps). So that's just an unadjusted p-value. B=1000 only controls how finely the null distribution is estimated; it does not correct for multiple testing.
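To make the idea of an unadjusted permutation p-value concrete, here is a minimal sketch in base R. It is a generic illustration for a single statistic on simulated data, not bumphunter's internal code; the design, the statistic, and all variable names are assumptions made up for the example.

```r
# Toy permutation p-value for one statistic (not bumphunter's internal code).
set.seed(1)
group <- rep(c(0, 1), each = 10)   # hypothetical two-group design
y <- rnorm(20) + 1.5 * group       # simulated values with a true shift

# Observed statistic: absolute difference in group means
stat <- function(y, g) abs(mean(y[g == 1]) - mean(y[g == 0]))
observed <- stat(y, group)

# Null distribution: recompute the statistic after shuffling the group labels
B <- 1000
null_stats <- replicate(B, stat(y, sample(group)))

# Unadjusted p-value: share of permuted statistics at least as large as observed
p_unadjusted <- mean(null_stats >= observed)
p_unadjusted
```

Raising B makes this estimate less noisy, but it remains a single-test p-value however large B is.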

There are two p-values, for two measures of the bump. The p.value column is based on the average log fold change between your two groups (i.e., the average 'height' of the bump), considered together with the bump's length. The p.valueArea column is based on the area under the bump, which combines the height and the length into a single number.

There is also a multiplicity-adjusted value for each measure, in the fwer and fwerArea columns, which adjusts for multiple comparisons by asking a stricter question at each permutation round. For the 'regular' p-value, the observed bump #1 is compared to every permuted bump, and each permuted bump that is at least as extreme adds one to the numerator. For the fwer, you count permutations instead: at each permutation, if any of the permuted bumps is at least as extreme as the observed bump #1, you increment the numerator once, and the denominator is the number of permutations. This is repeated for all the bumps. This is typically much more conservative.
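The difference between the two calculations can be sketched on a toy example. This assumes, following the bumphunter documentation, that the p-value is the fraction of permuted candidate regions at least as extreme as the observed one, and that the fwer is the fraction of permutations containing at least one such region. The numbers and variable names are hypothetical, and only the area measure is used (so it mirrors p.valueArea and fwerArea in spirit).

```r
# Toy illustration of pooled p-value vs. FWER (hypothetical numbers throughout).
observed_area <- c(bump1 = 9.0, bump2 = 4.0, bump3 = 1.5)

# Areas of the bumps found in each of B = 4 permutations
null_area <- list(
  c(2.0, 5.0, 1.0),
  c(1.2, 0.8),
  c(4.5, 1.0, 3.0, 2.0),
  c(10.0)
)

pooled <- unlist(null_area)

# p-value: fraction of all permuted bumps at least as large as the observed one
p_value <- sapply(observed_area, function(a) mean(pooled >= a))

# FWER: fraction of permutations containing at least one such bump
fwer <- sapply(observed_area, function(a) {
  mean(sapply(null_area, function(perm) any(perm >= a)))
})

cbind(p_value, fwer)
```

In this toy data bump2 gets a p-value of 0.3 (3 of 10 permuted bumps) but an fwer of 0.75 (3 of 4 permutations), which shows how the same evidence looks much weaker once you ask whether any bump in a permutation could match it.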

And this is where the cutoff arguments come in (cutoff, pickCutoff, pickCutoffQ). These arguments define what is and isn't a bump. In your call you set cutoff = 0.05 yourself, so any run of probes whose estimate is above 0.05 or below -0.05 is a candidate bump. If you instead set pickCutoff = TRUE (leaving pickCutoffQ at its default of 0.99), bumphunter generates the 1000 permutations or bootstraps, uses the 99th percentile of the permuted per-probe estimates as the cutoff, and treats any observed region beyond that as a candidate bump, which is then compared to the permuted distribution.
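As a rough sketch of how a quantile-based cutoff turns per-probe estimates into candidate bumps, consider the base R example below. It assumes the cutoff is the chosen quantile of the absolute null (permuted) per-probe estimates, and it simplifies bump calling to consecutive runs above the cutoff, ignoring sign, genomic position, and clustering. All data are simulated and all names are hypothetical; this is not bumphunter's implementation.

```r
# Toy sketch of a quantile-based cutoff (hypothetical data, not bumphunter internals).
set.seed(42)
n_loci <- 200
B <- 1000

# Simulated null: per-locus estimates from B permutations (no real signal)
null_beta <- matrix(rnorm(n_loci * B, sd = 0.1), nrow = n_loci)

# Observed per-locus estimates, with one planted region at loci 50-59
obs_beta <- rnorm(n_loci, sd = 0.1)
obs_beta[50:59] <- obs_beta[50:59] + 1

# Cutoff: 99th percentile of the absolute null estimates
cutoff <- quantile(abs(null_beta), 0.99)

# Candidate bumps: consecutive runs of loci beyond the cutoff
runs <- rle(abs(obs_beta) > cutoff)
ends <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1
bumps <- data.frame(start = starts, end = ends, L = runs$lengths)[runs$values, ]
bumps
```

The planted region shows up as one long run, while isolated loci that cross the cutoff by chance show up as very short candidate bumps.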

You absolutely don't want to do something like pickCutoffQ = 0.2. It is a quantile used to define bumps, not an FDR threshold, so you want it pretty close to 1, like 0.99 or 0.95. If you used 0.2 you would almost surely have something like 13 bazillion bumps, most of which would be really small, and it would take forever to compute.
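A quick simulation shows why a low quantile is a problem. This sketch uses pure noise as both the null and the observed data (an assumption for illustration, with made-up names) and simply counts how many loci would pass the cutoff at each quantile.

```r
# Toy comparison of a low vs. high quantile cutoff (simulated null data only).
set.seed(7)
null_beta <- rnorm(1e5, sd = 0.1)   # pooled null per-locus estimates
obs_beta  <- rnorm(5000, sd = 0.1)  # observed estimates with no real signal

count_above <- function(q) {
  cutoff <- quantile(abs(null_beta), q)
  sum(abs(obs_beta) > cutoff)
}

n_low  <- count_above(0.2)   # roughly 80% of loci expected to pass
n_high <- count_above(0.99)  # roughly 1% of loci expected to pass
c(q0.2 = n_low, q0.99 = n_high)
```

With a quantile of 0.2, most of the loci exceed the cutoff even though there is no signal at all, so nearly everything gets merged into candidate bumps that then have to be evaluated against every permutation.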