Question: IHWpaper: choice of `nbins=4` in `proteomics_example_analysis.R`
gravatar for paul.johnston
2.3 years ago by
paul.johnston10 wrote:

I noticed that nbins is set manually in the ihw call in proteomics_example_analysis.R from IHWpaper:

ihw_res <- ihw(proteomics_df$pvalue,proteomics_df$X..peptides, .1, nbins=4, 

                 nsplits_internal=5, lambdas=seq(0,3,length=20))

If I change it to nbins = "auto", then I get the reduction to BH message:

Only 1 bin; IHW reduces to Benjamini Hochberg (uniform weights)

My question is how did you arrive at nbins=4? And how should I go about setting nbins for my own similar proteomics data where  nbins = "auto" also reduces to BH?


ihw • 387 views
ADD COMMENTlink modified 2.3 years ago by Nikos Ignatiadis160 • written 2.3 years ago by paul.johnston10
Answer: IHWpaper: choice of `nbins=4` in `proteomics_example_analysis.R`
gravatar for Nikos Ignatiadis
2.3 years ago by
Nikos Ignatiadis160 wrote:

Hi Paul,

it's a good question. As is so often the case, this is just a bias/variance/computation time tradeoff. Here by bias-variance I refer to the estimation of the weight function from the rest of the folds and applying it to the held-out fold. In general, in the default choices in the IHW package we have opted for a more conservative route that will often shrink weights towards uniform, thus recovering results equal or very close to those of Benjamini-Hochberg. It is however important to note that this affects power, not FDR control.

By default (choice "auto") it is required that at least 1500 p-values are present in every bin. Through experience this leads to a good estimation of the underlying distribution. However, you might argue that even a somewhat noisier estimate of the distribution could lead to good estimation of the weight function, especially since we add a total variation ("fused lasso") type penalty. And this is indeed true -- but requires a good choice of the regularization parameter. To get a better handle on this we need to do cross validation nested within the fold splitting (specified by `nsplits_interal`). However, this increases computation time by quite a bit, which is why it is only done once by default (rather than e.g. 5 as in your example).

In any case, to answer your question: If you have a relatively small multiple testing situation (such as the proteomics example with only 2666 hypotheses), but still want to apply IHW, then you can use a larger number of bins than what is set by default. If you do this, I strongly encourage you to also increase `nsplits_internal`. Finally, I would not recommend using bins with less than 600 hypotheses or so.

Hope this helps,

ADD COMMENTlink written 2.3 years ago by Nikos Ignatiadis160
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 188 users visited in the last hour