bumphunter output clarification: do p-values obtained with B=1000 need further correction?
chelsey.ju • 0
@chelseyju-12100

Hello,

I ran the following bumphunter call on my data set:

dmrs <- bumphunter(gset, design = designMatrix, cutoff = 0.05, B=1000, type="M")

I am wondering whether the p-values returned by this call need further correction, or whether that is already handled by setting B=1000.

Furthermore, I did not include cutoffQ; if I set cutoffQ=0.2, is that equivalent to an FDR of 0.2?

Thanks!

methylation dmr analysis bumphunter • 2.9k views
@james-w-macdonald-5106
United States

When you permute (or draw bootstraps), you are trying to estimate the null distribution for each bump, and you then compare your observed value to that null distribution. A p-value estimates the probability of getting your observed result (or a larger one) under the null distribution, and when you permute you compute that as (number of permuted bumps larger than the observed bump) / (number of permutations). So that's just an unadjusted p-value.
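As a rough illustration of that ratio, here is a toy sketch in base R. It is not bumphunter's own code; the data, the difference-in-means statistic, and all variable names are made up for the example.

```r
set.seed(42)

# Hypothetical data: 10 control and 10 treated samples at a single location
group  <- rep(c(0, 1), each = 10)
values <- rnorm(20, mean = group * 1.5)

# Observed statistic: difference in group means
observed <- mean(values[group == 1]) - mean(values[group == 0])

# Null distribution: recompute the statistic after shuffling the group labels
B <- 1000
permuted <- replicate(B, {
  shuffled <- sample(group)
  mean(values[shuffled == 1]) - mean(values[shuffled == 0])
})

# Unadjusted permutation p-value: share of permuted statistics above the observed one
p_unadjusted <- sum(permuted > observed) / B
p_unadjusted
```

Nothing here accounts for how many bumps are tested, which is why the result is an unadjusted p-value.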

There are two p-values, one for each of two measures of the bump. The p.value is based on the average log fold change between your two groups of samples (i.e., the average 'height' of the bump), and the p.valueArea is based on the area under the bump, which takes into account both the height and the length of the bump.
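To make the difference between the two measures concrete, here is a small base R sketch. It assumes 'height' is the mean of the per-probe estimates within a bump and 'area' is the sum of their absolute values; the numbers and helper names are invented for illustration.

```r
# Hypothetical per-probe estimates for two bumps
bump_short_tall <- c(0.30, 0.35, 0.32)   # 3 probes, large effect
bump_long_low   <- rep(0.11, 10)         # 10 probes, small effect

# Illustrative helpers (assumed definitions, not bumphunter functions)
bump_height <- function(x) mean(x)       # average effect across the bump
bump_area   <- function(x) sum(abs(x))   # grows with both height and length

c(height = bump_height(bump_short_tall), area = bump_area(bump_short_tall))
c(height = bump_height(bump_long_low),   area = bump_area(bump_long_low))
```

The short bump wins on height while the long bump wins on area, so the two p-values can rank the same regions differently.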

There is a multiplicity adjusted p-value for both measures, the fwer column(s), which try to adjust for multiple comparisons by comparing the observed value for each bump to all the permuted bumps, at each permutation round. In other words, for the 'regular' p-value, at each permutation the observed value for bump #1 is compared to the permuted value for bump #1, and if the permuted value is larger, you increment the numerator of the statistic. For the fwer, at each permutation the observed value for bump #1 is compared to the permuted value for bump #1 as well as all the other bumps. If any of the permuted bumps are larger than the observed value for bump #1, then you increment the numerator of the statistic. This is repeated for all the bumps. This is obviously much more conservative.
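The two counting rules described above can be sketched in base R as follows. This is a simplified toy under the assumption that every permutation yields one value per bump; the matrix layout and names are hypothetical and do not reflect bumphunter's internals.

```r
set.seed(7)

B       <- 1000
n_bumps <- 5

# Hypothetical observed statistics, one per bump
observed <- c(3.0, 2.2, 1.8, 1.5, 1.2)

# Hypothetical permuted statistics: one row per permutation, one column per bump
permuted <- matrix(rnorm(B * n_bumps), nrow = B, ncol = n_bumps)

# 'Regular' p-value: compare each bump only with its own permuted values
p_value <- vapply(seq_len(n_bumps),
                  function(j) mean(permuted[, j] > observed[j]),
                  numeric(1))

# FWER: a permutation counts if ANY permuted bump exceeds the observed value
perm_max <- apply(permuted, 1, max)
fwer <- vapply(observed, function(obs) mean(perm_max > obs), numeric(1))

round(cbind(observed, p_value, fwer), 3)
```

Under this particular scheme the FWER can never be smaller than the p-value, because the maximum over all bumps includes the bump's own permuted value.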

And this is where all the cutoff arguments come in (cutoff, pickCutoff, pickCutoffQ). These arguments are used to say what is and isn't a bump. If you use the default (which is to use pickCutoffQ = 0.99), then bumphunter will generate the 1000 permutations or bootstraps and then use the 99th percentile of the permuted bumps as the cutoff, and any observed bump that is larger than that will then be considered a 'real' bump, and compared to the permuted distribution.

You absolutely don't want to do something like pickCutoffQ = 0.2. You want to be pretty close to 1, like 0.99 or 0.95. If you used 0.2 you would almost surely have like 13 bazillion bumps, most of which would be really small, and it would take forever to compute.
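To see why a low quantile is a bad idea, here is a toy base R sketch. It assumes the cutoff is taken as a quantile of the absolute null estimates, which is my simplification rather than a statement of exactly what bumphunter does; all values are simulated.

```r
set.seed(3)

# Hypothetical per-probe estimates from permuted data and from the observed data
null_coefs     <- rnorm(10000, sd = 0.05)
observed_coefs <- rnorm(10000, sd = 0.05)

# Candidate cutoffs at the 99th and 20th percentiles of the null
cutoff_99 <- quantile(abs(null_coefs), 0.99)
cutoff_20 <- quantile(abs(null_coefs), 0.20)

# Number of probes that would be eligible to seed a bump under each cutoff
n_above_99 <- sum(abs(observed_coefs) > cutoff_99)
n_above_20 <- sum(abs(observed_coefs) > cutoff_20)

c(cutoff_99 = unname(cutoff_99), cutoff_20 = unname(cutoff_20))
c(n_above_99 = n_above_99, n_above_20 = n_above_20)
```

With the 20th percentile, most probes clear the cutoff, so nearly everything becomes a candidate bump.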


Hi James,

A weird thing happens, though: I am getting FWER < p-value, which ideally you don't expect, because it means the p-value is more stringent than the corrected p-value. Any idea why this could happen would be really helpful.


It's easy to imagine how that would happen, just by re-reading what I said above. The p-value is computed by comparing the permuted bumps to the observed bump at that position. The FWER is computed by comparing the observed bump versus all the permuted bumps at all other positions. If the bump in question has an observed area that is much larger than all others (but not that much larger than the permuted bumps at that position), what do you think would happen?

I think the following section of the help page for bumphunter is instructive, particularly the last line.

    Uncertainty is assessed via permutations or bootstraps. Each of
    the B permutations/bootstraps will produce an estimated null
    profile from which we can define null candidate regions. For
    each observed candidate region we determine how many null regions
    are more extreme (longer and higher average value). The
    p.value is the percent of candidate regions obtained from the
    permutations/bootstraps that are as extreme as the observed region.
    These p-values should be interpreted with care as the theoretical
    properties are not well understood.
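Here is a toy base R sketch of how FWER < p-value can arise if you read the help text literally. The assumptions are mine: the p-value is the share of all null regions, pooled across permutations, that are at least as extreme as the observed region; the FWER is the share of permutations containing at least one such region; and 'extreme' is judged by area alone. The numbers are invented.

```r
# Hypothetical null region areas from B = 4 permutations.
# Permutations can yield different numbers of null regions.
null_areas <- list(
  c(5, 6, 7, 8),  # one permutation produces many large regions
  1,
  2,
  1
)
observed_area <- 4

# Pooled p-value: share of ALL null regions at least as extreme as the observed one
pooled  <- unlist(null_areas)
p_value <- mean(pooled >= observed_area)           # 4 of 7 null regions

# FWER: share of permutations with at least one region at least as extreme
fwer <- mean(vapply(null_areas,
                    function(a) any(a >= observed_area),
                    logical(1)))                   # 1 of 4 permutations

c(p_value = p_value, fwer = fwer)
```

Because the two quantities have different denominators (null regions versus permutations), a few permutations that each produce many extreme regions can push the p-value above the FWER.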

Thanks for the reply; it really helps me understand this better. However, I would expect an observed bump to have a much larger area than the other bumps in only a few cases, not in the general majority, yet that seems to be what is happening in my data for the bumps to come out as significant.