CSAW: getBestTest p-value
@sergioespeso-gil-6997

Hi Aaron, 

Sorry for asking so many questions. In the end I have decided that using predefined regions is not worth it for now, though maybe in the future. I think I have arrived at good filtering and normalisation options for my ChIPs, and I am now just merging the windows and getting those with the strongest differential binding as follows:

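# Merge windows that lie within 1 kbp of each other into clusters.
merged_ab <- mergeWindows(rowRanges(filtered.data_ab), tol=1000L)
# Report the best (lowest p-value) window for each cluster.
tab.best_ab <- getBestTest(merged_ab$id, results_ab$table)
# Write cluster coordinates and statistics to a gzipped, tab-separated file.
ofile <- gzfile("/results_ab.gz", open="w")
write.table(data.frame(as.data.frame(merged_ab$region)[,1:3], tab.best_ab), file=ofile, row.names=FALSE, quote=FALSE, sep="\t")
close(ofile)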

As I understand it, getBestTest gives the best p-value per cluster, right? But that doesn't mean the cluster is significant, right? I still need to filter out the clusters that are not.

Can I also do this for the FDR?

Thanks a lot!! You will not get rid of me XD, I will put you in the acknowledgements of the paper, promised :-)

Sergio

Aaron Lun ★ 28k
@alun

To assess the significance of a cluster, you should use combineTests rather than getBestTest. The latter uses the most significant window in the cluster, but this requires a Bonferroni correction to maintain type I error control. This is much more conservative than Simes' method in combineTests, especially for correlated windows. As such, I use combineTests to get the p-value for each cluster, and only use getBestTest to identify the location and log-fold change of the best window for descriptive purposes.
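
To make the difference concrete, here is a minimal base R sketch of the two corrections applied to one cluster. The p-values are made up for illustration, and the code only shows the textbook formulas; it is not meant to reproduce what csaw does internally.

```r
# Hypothetical window-level p-values for the windows in a single cluster.
p <- c(0.01, 0.012, 0.02, 0.03)
n <- length(p)

# Bonferroni-corrected minimum: scale the smallest p-value by the number
# of windows, capped at 1.
bonferroni <- min(1, n * min(p))

# Simes' method: scale the i-th smallest p-value by n/i, then take the minimum.
simes <- min(sort(p) * n / seq_len(n))

print(c(bonferroni = bonferroni, simes = simes))
```

The Simes p-value can never exceed the Bonferroni one, because the first term of the Simes minimum is exactly the Bonferroni-scaled smallest p-value. The gap widens when several windows in the cluster have similarly low p-values, which is typical for correlated, overlapping windows.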

To answer your other questions: getBestTest will only identify the lowest p-value in each cluster; it makes no guarantee as to whether that cluster is significant. You can identify the significant clusters by asking for those with an FDR below some threshold, e.g., 5%. Also, there's no point doing this for the FDR, as the window-level FDR is not the relevant statistic for region-based analyses. What you want is the region-level FDR, which is computed after you've already obtained the best or combined p-value (from getBestTest or combineTests, respectively) for each cluster/region.
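
As a rough sketch of what "region-level FDR" means, the following base R code applies the Benjamini-Hochberg adjustment to one p-value per cluster and keeps the clusters at or below a 5% threshold. The cluster names and p-values are hypothetical. In practice the tables returned by combineTests and getBestTest should already carry a region-level FDR column, but check this against the documentation for your csaw version.

```r
# Hypothetical per-cluster p-values (one value per merged region), e.g. the
# combined or best p-value reported for each cluster.
cluster_p <- c(region1 = 0.0004, region2 = 0.03, region3 = 0.2, region4 = 0.008)

# Region-level FDR: Benjamini-Hochberg adjustment across clusters, not windows.
cluster_fdr <- p.adjust(cluster_p, method = "BH")

# Keep clusters with an FDR at or below 5%.
significant <- names(cluster_fdr)[cluster_fdr <= 0.05]

# Rank clusters from most to least significant.
ranked <- cluster_fdr[order(cluster_p)]
print(ranked)
print(significant)
```

The adjustment is applied across clusters because the cluster is the unit being reported; adjusting across windows first would control the error rate for the wrong set of hypotheses.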


OK, I see. For some reason I thought that they could also sort by significance, but in fact they both report a p-value per cluster using different strategies. To sort by significance I need to follow section 7.2, right?


Yes, that's right.
