Question

CSAW: getBestTest p-value

sergio.espeso-gil • 0

Hi Aaron,

Sorry for asking so many questions. In the end I have decided that using predefined regions is not worth it for now, though maybe in the future. I think I have arrived at good filtering and normalisation options for my ChIP samples, and I am now merging the windows and getting the window with the strongest differential binding in each cluster as follows:

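# Merge windows that are at most 1 kbp apart into clusters
merged_ab <- mergeWindows(rowRanges(filtered.data_ab), tol=1000L)
# Pick the window with the lowest p-value in each cluster
tab.best_ab <- getBestTest(merged_ab$id, results_ab$table)
# Write the cluster coordinates and best-window statistics to a gzipped file
ofile <- gzfile("/results_ab.gz", open="w")
write.table(data.frame(as.data.frame(merged_ab$region)[,1:3], tab.best_ab), file=ofile, row.names=FALSE, quote=FALSE, sep="\t")
close(ofile)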

As I understand it, getBestTest gives the best p-value per cluster, right? But that doesn't mean the cluster is significant, right? I need to filter out those that are not.

Can I also do it for the FDR?

Thanks a lot!! You will not get rid of me XD, I will put you in the acknowledgements of the paper, I promise :-)

Sergio

csaw • 938 views

updated 8.6 years ago by Gordon Smyth • written 8.6 years ago by sergio.espeso-gil

score 2 · Answer 1 · 2015-09-16

To assess the significance of a cluster, you should use combineTests rather than getBestTest. The latter uses the most significant window in the cluster, but this requires a Bonferroni correction across the windows in that cluster to maintain type I error control. This is much more conservative than Simes' method in combineTests, especially for correlated windows. As such, I use combineTests to get the p-value for each cluster, and only use getBestTest to identify the location and log-fold change of the best window for descriptive purposes.
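To make the difference between the two corrections concrete, here is a minimal base R sketch for a single cluster. The p-values are made up for illustration, and the two formulas are the textbook definitions of the Bonferroni and Simes methods, not code taken from csaw.

```r
# Hypothetical window-level p-values for one cluster of correlated windows
p <- c(0.004, 0.006, 0.008, 0.010, 0.300)
n <- length(p)

# Bonferroni correction on the smallest p-value, which is what relying on
# the best window alone requires
p.bonf <- min(1, n * min(p))

# Simes' combined p-value: the minimum over i of n * p_(i) / i,
# where p_(i) is the i-th smallest p-value
p.simes <- min(n * sort(p) / seq_len(n))

print(c(bonferroni = p.bonf, simes = p.simes))
```

The i = 1 term of the Simes formula is exactly the Bonferroni value, so the Simes p-value can never be larger. It becomes smaller whenever several windows in the cluster have low p-values, which is the usual situation for adjacent, correlated windows.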

To answer your other questions: getBestTest only identifies the lowest p-value in each cluster; it makes no guarantee as to whether that p-value is significant. You can identify the significant clusters by asking for those with an FDR below some threshold, e.g., 5%. Also, there's no point in applying getBestTest to the FDR, as the window-level FDR is not the relevant statistic for region-based analyses. What you want is the region-level FDR, which is computed after you've obtained the best or combined p-value (from getBestTest or combineTests, respectively) for each cluster/region.
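As a rough illustration of what "region-level FDR" means, the following base R sketch combines window p-values within each cluster and then applies the Benjamini-Hochberg adjustment across clusters. The vectors `ids` and `pval` are hypothetical stand-ins for `merged_ab$id` and the window-level p-values in `results_ab$table`, and the `simes` helper is written here for illustration only; it is not a csaw function.

```r
# Hypothetical cluster IDs and window-level p-values (toy data)
ids  <- c(1, 1, 1, 2, 2, 3, 3, 3, 3)
pval <- c(0.001, 0.002, 0.20, 0.30, 0.60, 0.04, 0.05, 0.06, 0.07)

# Illustrative helper: Simes' combined p-value for one cluster
simes <- function(p) min(length(p) * sort(p) / seq_along(p))

# One combined p-value per cluster
region.p <- tapply(pval, ids, simes)

# Region-level FDR: Benjamini-Hochberg adjustment across clusters
region.fdr <- p.adjust(region.p, method = "BH")

# Clusters that are significant at an FDR of 5%
sig <- names(region.p)[region.fdr <= 0.05]
print(data.frame(cluster = names(region.p),
                 PValue = as.numeric(region.p),
                 FDR = as.numeric(region.fdr)))
```

In this toy data, cluster 3 contains a window with a p-value of 0.04, yet the cluster is not significant at the region level, which is why the lowest p-value alone says little. With csaw itself, the per-cluster table returned by combineTests(merged_ab$id, results_ab$table) should already carry PValue and FDR columns, so filtering on its FDR column would presumably replace the manual steps above.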