Question

Bumphunter Algorithm Returns Regions with Very Few CpGs

graggsd

Topic: I have a question about whether it is appropriate to make a slight modification to the `bumphunter` function for my analysis.

Scenario: I have an Illumina 450K dataset that has been QC'd and normalized using the `minfi` package, and I have used the `bumphunter` function to search for DMRs. When I sort the results table by FWER, with the most significant regions first, the top two regions contain only 3 and 2 CpGs, respectively (with up to 13 CpGs in the enclosing cluster). When I plot the CpG-wise methylation values and model coefficients for these regions, they do not match the `bumphunter` output table well. For example, for the 3-CpG region mentioned above, `bumphunter` reports an effect size of -0.20, but the CpG-wise plots show that 2 of these CpGs have positive coefficients, one has a negative coefficient, and none approaches an absolute value of 0.20. This leads me to believe that the reported effect size for these top two regions (and perhaps for other regions containing only a few CpGs) may be less accurate than for regions with more CpGs. I have also plotted those larger regions CpG by CpG and can confirm that they accurately reflect the `bumphunter` results.
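One possible source of this kind of mismatch can be illustrated with a toy model. The sketch below assumes, as a simplification, that a region is a run of consecutive CpGs whose (optionally smoothed) coefficient lies beyond ±`cutoff` in the same direction, and that the reported effect size is the mean of those estimates over the run. The centred running mean is a hypothetical stand-in for whatever smoother is used. `running_mean`, `find_regions`, and the coefficient values are illustrative only; they are not bumphunter code or real data.

```python
import numpy as np

def running_mean(x, k=3):
    """Hypothetical stand-in for a smoother: centred running mean over k
    neighbouring CpGs (the window shrinks at the ends of the cluster)."""
    n = len(x)
    half = k // 2
    return np.array([x[max(0, i - half):min(n, i + half + 1)].mean()
                     for i in range(n)])

def find_regions(est, cutoff):
    """Return (start, end, value, L) for each run of consecutive CpGs whose
    estimate stays above +cutoff, or stays below -cutoff, for the whole run.
    end is inclusive; value is the mean estimate over the run."""
    direction = np.where(est > cutoff, 1, np.where(est < -cutoff, -1, 0))
    regions = []
    i, n = 0, len(est)
    while i < n:
        if direction[i] == 0:
            i += 1
            continue
        j = i
        # Extend the run while the next CpG is beyond the cutoff in the same direction.
        while j + 1 < n and direction[j + 1] == direction[i]:
            j += 1
        regions.append((i, j, float(est[i:j + 1].mean()), j - i + 1))
        i = j + 1
    return regions

# Toy per-CpG coefficients for a single cluster (illustrative, not real data).
raw = np.array([0.0, -0.6, 0.05, -0.6, 0.0, 0.0])
smoothed = running_mean(raw)

print(find_regions(raw, cutoff=0.1))       # two separate 1-CpG regions
print(find_regions(smoothed, cutoff=0.1))  # one 5-CpG region, value about -0.25
```

With these toy numbers, smoothing merges two strong single-CpG signals into one 5-CpG region whose mean estimate (about -0.25) equals none of the raw coefficients, and the region includes a CpG whose raw coefficient is positive.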

Question: I have cloned the bumphunter repository from GitHub and modified the algorithm to remove regions with 5 or fewer CpGs by dropping the corresponding rows from the `nulltabs` and `tabs` data frames before the FWERs are calculated. The modified function runs without errors and leaves only regions whose effect-size estimates I have greater confidence in, and it (predictably) decreases the FWER for the remaining regions. This last effect concerns me, however, because I do not want to be perceived as fishing for positive results. Is this a methodologically appropriate approach?
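To make the FWER effect concrete, here is a minimal sketch. It assumes that a region's FWER is the fraction of null replicates whose largest region area is at least that region's area, and that the minimum-CpG filter is applied to both the observed table and every null table, as described in the question. The helper `fwer`, its `min_L` parameter, and the toy areas are hypothetical; this is not bumphunter's implementation.

```python
import numpy as np

def fwer(observed, null_tables, min_L=0):
    """Hypothetical helper (not bumphunter's code).

    observed:    list of (area, L) tuples for the observed regions
    null_tables: one list of (area, L) tuples per null replicate
    min_L:       regions with fewer than min_L CpGs are dropped from BOTH
                 the observed and the null tables before comparison

    Returns (area, L, fwer) for each retained observed region, where fwer is
    the fraction of null replicates whose largest area is >= that area.
    """
    obs = [(a, L) for a, L in observed if L >= min_L]
    max_null = []
    for table in null_tables:
        areas = [a for a, L in table if L >= min_L]
        # A replicate with no surviving regions contributes a maximum of 0.
        max_null.append(max(areas) if areas else 0.0)
    max_null = np.array(max_null)
    return [(a, L, float(np.mean(max_null >= a))) for a, L in obs]

# Toy (area, L) values, not real data.
observed = [(1.2, 3), (1.0, 8)]
null_tables = [
    [(1.5, 2), (0.4, 7)],
    [(0.9, 1), (0.6, 6)],
    [(1.1, 3), (1.1, 9)],
    [(0.3, 4)],
]

print(fwer(observed, null_tables))           # no filter
print(fwer(observed, null_tables, min_L=6))  # keep regions with more than 5 CpGs
```

In this toy example the retained 8-CpG region's FWER falls from 0.5 to 0.25 because the filter removes the small-L null regions that supplied the largest null areas in some replicates. If the filter were applied only to the observed table, the null maxima, and hence the FWER of each retained region, would be unchanged.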

bumphunter minfi

Written 9.3 years ago by graggsd