I have 3 related questions about the Bumphunter implementation in Minfi. Help with one or all of them is much appreciated!
1. Clarification of what Bumphunter is doing:
After reading the bumphunter section of the Minfi tutorial:
which refers to this bump hunting paper:
... I got the impression that the Minfi implementation of bumphunter would automatically run SVA and include the SV covariates in models. However, after reading the Bumphunter User's Guide:
... it sounds like Minfi does NOT run SVA ("Notably, batch effect removal and the application of the bootstrap to linear models of Efron and Tibshirani need additional code.").
Can someone clarify whether the Minfi bumphunter runs SVA?
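For context, this is roughly the workflow I had imagined bumphunter might be doing internally, and which I would otherwise run myself beforehand (object and column names here are placeholders, not from my actual script):

```r
library(minfi)
library(sva)

# M-values from a preprocessed GenomicRatioSet (placeholder object name)
mvals <- getM(GRSet)
pd    <- as.data.frame(pData(GRSet))

# Full model (intercept + covariate of interest) and null model (intercept only)
mod  <- model.matrix(~ group, data = pd)
mod0 <- model.matrix(~ 1,    data = pd)

# Estimate surrogate variables and append them as extra design columns
svobj        <- sva(mvals, mod, mod0)
designMatrix <- cbind(mod, svobj$sv)
```

If bumphunter already does something equivalent internally, I would obviously rather not do it twice.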
2. Choice of nullMethod:
If I run SVA separately and then include the resulting SV covariates in my design matrix when running Bumphunter, using the default "permutation" nullMethod, I get a warning:
> dmrs.cd4.b1000.pickCutoff.sva <- bumphunter(GRSet.norm.na.good.noSnp.noXreact.CD4, design = designMatrix.cd4, pickCutoff=T, B=1000, type="M")
[bumphunterEngine] The use of the permutation test is not recommended with multiple covariates, (ncol(design)>2). Consider changing 'nullMethod' changed to 'bootstrap' instead. See vignette for more information.
Why is the permutation method not recommended for models that include covariates? Here is what the bumphunter vignette says:
"However, when X has columns other than those representing an intercept term and the covariate of interest, the permutation test approach is not recommended. The function will run but give a warning. A method based on the bootstrap for linear models of Efron and Tibshirani may be more appropriate but this is not currently implemented."
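If I understand the vignette correctly, the issue is that permuting the covariate of interest also destroys its relationship to the other covariates (e.g. surrogate variables), so the permutation null is not the null of "no effect given the covariates", whereas a residual bootstrap keeps the covariates intact. Here is a toy sketch of the two null schemes for a single probe, purely my own illustration and not minfi code — please correct me if this intuition is wrong:

```r
set.seed(1)
n    <- 20
x    <- rep(0:1, each = n / 2)   # covariate of interest
conf <- rnorm(n) + x             # confounder correlated with x
y    <- 0.5 * conf + rnorm(n)    # outcome depends only on the confounder

# Permutation null: shuffle x and refit -- this also breaks the x-conf link
t_perm <- replicate(500, {
  xp <- sample(x)
  coef(summary(lm(y ~ xp + conf)))["xp", "t value"]
})

# Residual bootstrap null: resample residuals of the null fit (no x),
# keeping x and conf exactly as observed
fit0   <- lm(y ~ conf)
t_boot <- replicate(500, {
  yb <- fitted(fit0) + sample(resid(fit0), replace = TRUE)
  coef(summary(lm(yb ~ x + conf)))["x", "t value"]
})
```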
However, when I run with the "bootstrap" nullMethod instead, I get many "Inf" values and p-values of 0 in the results table:
> dmrs.cd4.b1000.pickCutoff.sva.boot <- bumphunter(GRSet.norm.na.good.noSnp.noXreact.CD4, design = designMatrix.cd4, pickCutoff=T, B=1000, type="M", nullMethod='bootstrap')
> head(dmrs.cd4.b1000.pickCutoff.sva.boot$table)
       chr     start       end value area cluster indexStart indexEnd L clusterL p.value fwer p.valueArea fwerArea
222   chr1 101491640 101491640   Inf  Inf   19234      39105    39105 1       15       0    0           0        0
518  chr10 106014410 106014410   Inf  Inf   53151     434838   434838 1       18       0    0           0        0
713  chr11  70601971  70601971   Inf  Inf   69955     472762   472762 1        4       0    0           0        0
716  chr11  71791563  71791563   Inf  Inf   70239     473310   473310 1       20       0    0           0        0
742  chr11  93862060  93862060   Inf  Inf   73067     478541   478541 1       15       0    0           0        0
1179 chr14  69444462  69444462   Inf  Inf  116139     563513   563513 1       22       0    0           0        0
It appears that something is going wrong with the bootstrap method when including covariates. How should I interpret the values and areas of "Inf"? The permutation method, in contrast, seems to work great; I just don't know whether there is a reason I shouldn't trust those results.
3. Different p-values and error rates:
Finally, I am confused about the two different p-values and family-wise error rates that are output by the program (p.value, fwer, p.valueArea, fwerArea). Can someone please explain what the differences are? My impression is that "p.value" is the empirical p-value from the permutation test (the fraction of null areas greater than the area that was actually observed), and that "fwer" is that p-value adjusted for multiple hypotheses (by Bonferroni or another method?), but please let me know if this is incorrect.
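To make my mental model concrete, this is what I imagine is being computed — purely illustrative toy numbers, and I do not know that this is what bumphunter actually does, so I would appreciate confirmation or correction:

```r
set.seed(2)
B          <- 1000
null_areas <- abs(rnorm(5000))       # all null bump areas, pooled across permutations
null_max   <- abs(rnorm(B, sd = 2))  # largest null bump area in each permutation
observed   <- 3.5                    # area of one observed bump

# My guess at the empirical p-value: fraction of pooled null areas
# at least as large as the observed area
p_value <- mean(null_areas >= observed)

# My guess at the FWER: fraction of permutations whose *maximum* null area
# reaches the observed area (a max-statistic correction rather than Bonferroni?)
fwer <- mean(null_max >= observed)
```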