Question: bumphunter output clarification: p values achieved with b=1000: further correction needed?
17 months ago by
chelsey.ju0 wrote:

Hello,

I have performed the following bumphunter script on my data set:

dmrs <- bumphunter(gset, design = designMatrix, cutoff = 0.05, B=1000, type="M")

I am wondering if the p-values from this need further correction, or was that already done with B=1000?

Furthermore, I did not include cutoffQ; if I set cutoffQ=0.2, is that equivalent to having an FDR of 0.2?

thanks!

modified 17 months ago by James W. MacDonald46k • written 17 months ago by chelsey.ju0
17 months ago by
James W. MacDonald46k wrote:

When you permute (or draw bootstraps) you are trying to estimate the null distribution for each bump, and you then compare your observed value to that null distribution. A p-value estimates the probability of getting your observed result (or a larger one) under the null distribution, and when you permute you compute that as (# permuted bumps ≥ observed bump) / (# permutations). So that's just an unadjusted p-value.
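As a rough sketch (in R, with made-up numbers — not bumphunter's actual internals), the unadjusted permutation p-value for a single bump is just the fraction of permuted statistics at least as extreme as the observed one:

```r
# Toy illustration of an unadjusted permutation p-value.
# 'observed' is the statistic for one bump; 'permuted' holds that bump's
# statistic under B = 1000 permutations (simulated here for illustration).
set.seed(1)
observed <- 2.5
permuted <- rnorm(1000, mean = 0, sd = 1)

# Fraction of permuted values at least as extreme as the observed one.
p_value <- mean(permuted >= observed)
p_value   # an unadjusted p-value; no multiplicity correction applied
```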

There are two p-values, for two measures of the bump. The p.value is based on the average log fold change between your two groups (i.e., the average 'height' of the bump), and the p.valueArea is based on the area under the bump, which takes into account both the height and the length of the bump.
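A toy sketch of the two measures for a single bump (made-up per-probe effect sizes; bumphunter computes these internally):

```r
# Toy bump: per-probe effect sizes (M-value differences) inside one region.
bump_heights <- c(0.20, 0.35, 0.30, 0.25)

# 'value' is the average height of the bump.
value <- mean(bump_heights)

# 'area' is the area under the bump: it grows with both the height
# and the number of probes (the length) of the region.
area <- sum(abs(bump_heights))

c(value = value, area = area)
```

A short, tall bump and a long, shallow bump can have similar areas, which is why the two measures (and their p-values) can rank bumps differently.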

There is a multiplicity adjusted p-value for both measures, the fwer column(s), which try to adjust for multiple comparisons by comparing the observed value for each bump to all the permuted bumps, at each permutation round. In other words, for the 'regular' p-value, at each permutation the observed value for bump #1 is compared to the permuted value for bump #1, and if the permuted value is larger, you increment the numerator of the statistic. For the fwer, at each permutation the observed value for bump #1 is compared to the permuted value for bump #1 as well as all the other bumps. If any of the permuted bumps are larger than the observed value for bump #1, then you increment the numerator of the statistic. This is repeated for all the bumps. This is obviously much more conservative.
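The comparison scheme just described can be sketched in R with made-up numbers (again, not bumphunter's actual code): the per-bump p-value compares each observed bump only to its own permuted values, while the fwer compares it to the most extreme permuted bump anywhere in each permutation round.

```r
# Toy illustration of the per-bump p-value vs. the fwer.
set.seed(2)
B <- 1000          # number of permutations
n_bumps <- 5       # number of bumps
observed <- c(3.0, 1.2, 0.8, 2.1, 0.5)

# perm[b, j]: permuted statistic for bump j in permutation b
perm <- matrix(rnorm(B * n_bumps), nrow = B, ncol = n_bumps)

# Unadjusted p-value: compare bump j only to its own permuted values.
p_value <- colMeans(sweep(perm, 2, observed, ">="))

# fwer: compare bump j to the most extreme permuted value across ALL
# bumps in each permutation round -- if any permuted bump beats it,
# increment the numerator.
perm_max <- apply(perm, 1, max)
fwer <- sapply(observed, function(obs) mean(perm_max >= obs))

# Under this scheme the fwer count can never be smaller than the
# p-value count, which is why it is more conservative.
rbind(p.value = p_value, fwer = fwer)
```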

And this is where all the cutoff arguments come in (cutoff, pickCutoff, pickCutoffQ). These arguments are used to say what is and isn't a bump. If you use the default (which is to use pickCutoffQ = 0.99), then bumphunter will generate the 1000 permutations or bootstraps and then use the 99th percentile of the permuted bumps as the cutoff, and any observed bump that is larger than that will then be considered a 'real' bump, and compared to the permuted distribution.
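The cutoff selection can be sketched the same way (toy numbers; bumphunter pools the permuted statistics internally): take the chosen quantile of the permuted bump statistics and keep only observed bumps that exceed it.

```r
# Toy sketch of pickCutoffQ: use the 99th percentile of the pooled
# permuted bump statistics as the threshold for calling a 'real' bump.
set.seed(3)
permuted_stats <- abs(rnorm(10000))   # pooled permuted statistics (toy)

cutoff <- quantile(permuted_stats, 0.99)

observed_bumps <- c(0.4, 1.1, 3.2, 2.9)
observed_bumps[observed_bumps > cutoff]   # only these are kept as bumps
```

Dropping the quantile from 0.99 to 0.2 moves the threshold down into the bulk of the null distribution, which is why nearly everything would then qualify as a bump.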

You absolutely don't want to do something like pickCutoffQ = 0.2. You want to be pretty close to 1, like 0.99 or 0.95. If you used 0.2 you would almost surely have like 13 bazillion bumps, most of which would be really small, and it would take forever to compute.

Hi James,

A weird thing happens, though: I am getting FWER < p-value, which you don't expect in the ideal case, since it means the p-value is being more stringent than the corrected p-value. Any idea why this could happen would be really helpful.


It's easy to imagine how that would happen, just by re-reading what I said above. The p-value is computed by comparing the permuted bumps to the observed bump at that position. The FWER is computed by comparing the observed bump versus all the permuted bumps at all other positions. If the bump in question has an observed area that is much larger than all others (but not that much larger than the permuted bumps at that position), what do you think would happen?

I think the following section of the help page for bumphunter is instructive, particularly the last line.

    Uncertainty is assessed via permutations or bootstraps. Each of
    the B permutations/bootstraps will produce an estimated null
    profile from which we can define null candidate regions. For
    each observed candidate region we determine how many null regions
    are more extreme (longer and higher average value). The
    p.value is the percent of candidate regions obtained from the
    permutations/bootstraps that are as extreme as the observed region.
    These p-values should be interpreted with care as the theoretical
    properties are not well understood.
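Following the help-page definitions quoted above — p.value pooled over all null regions, fwer counted at most once per permutation round — a toy example (made-up numbers) shows one way fwer can come out smaller than p.value: a single permutation that happens to generate many extreme null regions inflates the pooled p.value, but counts only once against the fwer.

```r
# Null regions produced by B = 4 permutations (toy numbers).
# Permutation 1 happens to produce many extreme null regions;
# permutations 2-4 each produce one small null region.
null_regions <- list(
  p1 = rep(5, 10),
  p2 = 1,
  p3 = 1,
  p4 = 1
)
observed <- 3

# p.value: fraction of ALL null regions (pooled) as extreme as observed.
all_null <- unlist(null_regions)
p_value <- mean(all_null >= observed)                 # 10 / 13

# fwer: fraction of permutations whose most extreme null region
# is as extreme as the observed one.
fwer <- mean(sapply(null_regions, max) >= observed)   # 1 / 4

c(p.value = p_value, fwer = fwer)   # here fwer < p.value
```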