#### The support.bioconductor.org editor has been updated to markdown! Please see more info at: Tutorial: Updated Support Site Editor

Question: IHWpaper: choice of nbins=4 in proteomics_example_analysis.R
1
20 months ago by
paul.johnston10 wrote:

I noticed that nbins is set manually in the ihw call in proteomics_example_analysis.R from IHWpaper:

ihw_res <- ihw(proteomics_df$pvalue,proteomics_df$X..peptides, .1, nbins=4,

nsplits_internal=5, lambdas=seq(0,3,length=20))

If I change it to nbins = "auto", then I get the reduction to BH message:

Only 1 bin; IHW reduces to Benjamini Hochberg (uniform weights)

My question is how did you arrive at nbins=4? And how should I go about setting nbins for my own similar proteomics data where  nbins = "auto" also reduces to BH?

ihw • 262 views
modified 20 months ago by Nikos Ignatiadis160 • written 20 months ago by paul.johnston10
Answer: IHWpaper: choice of nbins=4 in proteomics_example_analysis.R
4
20 months ago by
Heidelberg

Hi Paul,

it's a good question. As is so often the case, this is just a bias/variance/computation time tradeoff. Here by bias-variance I refer to the estimation of the weight function from the rest of the folds and applying it to the held-out fold. In general, in the default choices in the IHW package we have opted for a more conservative route that will often shrink weights towards uniform, thus recovering results equal or very close to those of Benjamini-Hochberg. It is however important to note that this affects power, not FDR control.

By default (choice "auto") it is required that at least 1500 p-values are present in every bin. Through experience this leads to a good estimation of the underlying distribution. However, you might argue that even a somewhat noisier estimate of the distribution could lead to good estimation of the weight function, especially since we add a total variation ("fused lasso") type penalty. And this is indeed true -- but requires a good choice of the regularization parameter. To get a better handle on this we need to do cross validation nested within the fold splitting (specified by nsplits_interal). However, this increases computation time by quite a bit, which is why it is only done once by default (rather than e.g. 5 as in your example).

In any case, to answer your question: If you have a relatively small multiple testing situation (such as the proteomics example with only 2666 hypotheses), but still want to apply IHW, then you can use a larger number of bins than what is set by default. If you do this, I strongly encourage you to also increase nsplits_internal. Finally, I would not recommend using bins with less than 600 hypotheses or so.

Hope this helps,
Nikos