IHWpaper: choice of `nbins=4` in `proteomics_example_analysis.R`
1
1
Entering edit mode
@pauljohnston-9518
Last seen 3 months ago
United Kingdom

I noticed that nbins is set manually in the ihw call in proteomics_example_analysis.R from IHWpaper:

ihw_res <- ihw(proteomics_df$pvalue,proteomics_df$X..peptides, .1, nbins=4, 

                 nsplits_internal=5, lambdas=seq(0,3,length=20))

If I change it to nbins = "auto", then I get the reduction to BH message:

Only 1 bin; IHW reduces to Benjamini Hochberg (uniform weights)

My question is how did you arrive at nbins=4? And how should I go about setting nbins for my own similar proteomics data where  nbins = "auto" also reduces to BH?

 

ihw • 1.2k views
ADD COMMENT
4
Entering edit mode
@nikos-ignatiadis-8823
Last seen 4.9 years ago
Heidelberg

Hi Paul,

it's a good question. As is so often the case, this is just a bias/variance/computation time tradeoff. Here by bias-variance I refer to the estimation of the weight function from the rest of the folds and applying it to the held-out fold. In general, in the default choices in the IHW package we have opted for a more conservative route that will often shrink weights towards uniform, thus recovering results equal or very close to those of Benjamini-Hochberg. It is however important to note that this affects power, not FDR control.

By default (choice "auto") it is required that at least 1500 p-values are present in every bin. Through experience this leads to a good estimation of the underlying distribution. However, you might argue that even a somewhat noisier estimate of the distribution could lead to good estimation of the weight function, especially since we add a total variation ("fused lasso") type penalty. And this is indeed true -- but requires a good choice of the regularization parameter. To get a better handle on this we need to do cross validation nested within the fold splitting (specified by `nsplits_interal`). However, this increases computation time by quite a bit, which is why it is only done once by default (rather than e.g. 5 as in your example).

In any case, to answer your question: If you have a relatively small multiple testing situation (such as the proteomics example with only 2666 hypotheses), but still want to apply IHW, then you can use a larger number of bins than what is set by default. If you do this, I strongly encourage you to also increase `nsplits_internal`. Finally, I would not recommend using bins with less than 600 hypotheses or so.

Hope this helps,
Nikos

ADD COMMENT

Login before adding your answer.

Traffic: 483 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6