Bumphunter Algorithm Returns Regions with very few CpGs
Asked by graggsd (@graggsd-11168)

Topic: Is it appropriate to make a slight modification to the bumphunter function for my analysis?

Scenario: I have an Illumina 450K dataset that has been QC'd and normalized using the minfi package, and I have used the bumphunter function to search for DMRs. When the results table is sorted by FWER with the most significant regions first, the top two regions contain 3 and 2 CpGs, respectively (their clusters contain up to 13 CpGs). When I plot the CpG-wise methylation values and model coefficients for these regions, they do not agree well with the bumphunter output table. For example, for the 3-CpG region mentioned above, bumphunter reports an effect size of -0.20, but the CpG-wise plots show that 2 of its CpGs have positive coefficients, one has a negative coefficient, and none approaches an absolute value of 0.20. This leads me to believe that the reported effect sizes for these top two regions (and perhaps for other regions containing only a few CpGs) may be less accurate than those for regions with more CpGs. I have plotted the latter on a per-CpG basis and can confirm that they accurately reflect the bumphunter results.
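To make the mismatch concrete, here is a minimal Python sketch, assuming that bumphunter's reported region `value` is the mean of the per-CpG estimates across the region (the smoothed estimates, if smoothing was enabled). The coefficients are hypothetical numbers chosen only to match the pattern described (two positive, one negative, none near 0.20); they are not from the dataset, and `region_value` is an illustrative helper, not part of bumphunter.

```python
from statistics import mean

def region_value(coefs):
    """Mean of the per-CpG estimates in a region (assumed analogue of bumphunter's 'value')."""
    return mean(coefs)

# Hypothetical raw per-CpG coefficients for a 3-CpG region (illustrative only).
raw = [0.05, 0.08, -0.12]

value = region_value(raw)

# |mean| can never exceed the largest |input|, so raw coefficients that are all
# smaller than 0.20 in magnitude cannot average to -0.20.
print(f"mean of raw coefficients: {value:.3f}")
print("bounded by largest |coefficient|:", abs(value) <= max(abs(c) for c in raw))
```

Under that assumption, a reported -0.20 for such a region cannot be a plain average of the raw coefficients that were plotted.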

Question: I have cloned the bumphunter repository from GitHub and modified the bumphunter algorithm to remove regions with 5 or fewer CpGs, by dropping the corresponding rows from the nulltabs and tabs data frames before the FWERs are calculated. The modified algorithm runs without problems, leaving regions whose effect-size estimates I trust more and, predictably, decreasing the FWER of the remaining regions. That last effect concerns me, because I do not want to be seen as fishing for positive results. Is this a methodologically appropriate approach?
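The filtering step described above can be sketched in Python. This uses a generic max-statistic FWER (for each observed region, the fraction of permutations whose largest null `area` is at least as large), which is an assumption standing in for bumphunter's internal calculation; the package's actual code may differ in detail, for example in whether it also conditions on region length. `Region`, `filter_regions`, `max_stat_fwer`, and all numbers are hypothetical. What it illustrates is that the same minimum-CpG threshold is applied to the observed table and to every permutation's null table before FWERs are computed.

```python
from dataclasses import dataclass

@dataclass
class Region:
    L: int        # number of CpGs in the region
    area: float   # absolute sum of per-CpG estimates (assumed, as in bumphunter's 'area')

def filter_regions(regions, min_cpgs):
    """Keep only regions with at least min_cpgs CpGs."""
    return [r for r in regions if r.L >= min_cpgs]

def max_stat_fwer(observed, null_tables):
    """Max-statistic FWER: for each observed region, the fraction of permutations
    whose largest null area is >= the observed area. A permutation with no
    regions contributes a maximum of 0.0."""
    null_max = [max((r.area for r in tab), default=0.0) for tab in null_tables]
    n_perm = len(null_tables)
    return [sum(m >= obs.area for m in null_max) / n_perm for obs in observed]

# Hypothetical observed regions and null tables from 4 permutations.
observed = [Region(3, 0.60), Region(8, 0.90), Region(10, 1.20)]
nulls = [
    [Region(2, 1.00), Region(7, 0.50)],
    [Region(3, 0.95), Region(9, 0.80)],
    [Region(6, 1.30)],
    [Region(4, 0.70), Region(12, 0.40)],
]

fwer_all = max_stat_fwer(observed, nulls)

# Drop regions with 5 or fewer CpGs from BOTH tables before computing FWER.
min_cpgs = 6
obs_kept = filter_regions(observed, min_cpgs)
nulls_kept = [filter_regions(tab, min_cpgs) for tab in nulls]
fwer_kept = max_stat_fwer(obs_kept, nulls_kept)

print("unfiltered:", fwer_all)
print("filtered:  ", fwer_kept)
```

With these hypothetical numbers, filtering removes the high-scoring 2-, 3- and 4-CpG null regions, so the 8-CpG region's FWER drops from 0.75 to 0.25 even though its own area is unchanged.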

Tags: bumphunter, minfi