I am trying to estimate sources of heterogeneity in methylation data in addition to some known sources (i.e., I have batch and age but would also like to correct for smoking, unmeasured technical artifacts, and cellular heterogeneity). When I use num.sv and the default "be" method, I get 12 SVs; when I specify the "leek" method, I get 0 SVs. Is there a reason why the two methods might behave so differently?
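For concreteness, the comparison looks roughly like the sketch below. The simulated matrix and the names `mvals`, `pheno`, and `mod` are placeholders (assumptions for illustration, not my real data); only `num.sv` and its `method` argument come from sva.

```r
library(sva)

set.seed(1)

# Placeholder data (assumption): 2000 probes x 40 samples of simulated M-values.
n_probes  <- 2000
n_samples <- 40
mvals <- matrix(rnorm(n_probes * n_samples), nrow = n_probes)

# Known covariates (assumption: a two-level batch factor and age in years).
pheno <- data.frame(
  batch = factor(rep(c("A", "B"), each = n_samples / 2)),
  age   = round(runif(n_samples, 30, 70))
)

# Full model matrix containing the known sources of variation.
mod <- model.matrix(~ batch + age, data = pheno)

# Estimate the number of surrogate variables with each method.
n_be   <- num.sv(mvals, mod, method = "be")    # default, permutation-based
n_leek <- num.sv(mvals, mod, method = "leek")  # asymptotic approach

# Show the two estimates side by side.
c(be = n_be, leek = n_leek)
```

With my real methylation matrix in place of `mvals`, the first call returns 12 and the second returns 0.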
I am confused about whether one method is generally recommended over the other, as the SVA vignette shows an example with "leek": https://www.bioconductor.org/packages/devel/bioc/vignettes/sva/inst/doc/sva.pdf
...while the documentation for the sva function says it defaults to "be" when the number of SVs (n.sv) is not specified, and cautions that the "numSVmethod" parameter "... should not be adapted by the user unless they are an expert": https://www.rdocumentation.org/packages/sva/versions/3.20.0/topics/sva
My question is partially answered in the thread "svaseq: how many and which surrogate variables to pick", and there may be no single "best" way to estimate the number of SVs to include. Still, I would like to better understand the differences between the two methods.