FDR calculation by combineTests in csaw
1
0
Entering edit mode
renanec ▴ 20
@renanec-9080
Last seen 7.3 years ago
United States

Hi there,

Thanks for putting together a really nice tutorial for " From reads to regions: a Bioconductor workflow to detect differential binding in ChIP-seq data". I had a quick question regarding the function combineTests.

When I run the example piece of code I get the exact same FDR for very different p-values (see below). In the most extreme cases with some of my data all FDR values are equal independently of the p-value. Can somebody clarify this?

Many Thanks,

Renan

library(csaw)

ids <- round(runif(100, 1, 10)) 
tab <- data.frame(logFC=rnorm(100), logCPM=rnorm(100), PValue=rbeta(100, 1, 2))
head(tab)
combined <- combineTests(ids, tab)
head(combined)

Output

nWindows logFC.up logFC.down     PValue       FDR
1        3        0          1 0.29353630 0.4050539
2       11        4          3 0.06942862 0.3471431
3       14        8          4 0.23326239 0.3887707
4       11        6          2 0.51039737 0.5103974
5        9        1          2 0.14480150 0.3877034
6        6        1          4 0.35435897 0.4050539

`

Session Info

R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.1 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] csaw_1.4.0                 SummarizedExperiment_1.0.1 Biobase_2.30.0             GenomicRanges_1.22.1       GenomeInfoDb_1.6.1        
[6] IRanges_2.4.1              S4Vectors_0.8.2            BiocGenerics_0.16.1       

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.32.0    XVector_0.10.0          edgeR_3.12.0            zlibbioc_1.16.0         GenomicAlignments_1.6.1
 [6] BiocParallel_1.4.0      tools_3.2.2             DBI_0.3.1               lambda.r_1.1.7          futile.logger_1.4.1    
[11] rtracklayer_1.30.1      futile.options_1.0.0    bitops_1.0-6            biomaRt_2.26.0          RCurl_1.95-4.7         
[16] RSQLite_1.0.0           limma_3.26.3            GenomicFeatures_1.22.4  Rsamtools_1.22.0        Biostrings_2.38.1      
[21] XML_3.98-1.3
csaw • 879 views
ADD COMMENT
1
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 19 hours ago
The city by the bay

This is a consequence of the BH method to control the FDR (or specifically, the reverse method in p.adjust that computes the adjusted p-values corresponding to some FDR threshold). By itself, the raw FDR calculation is not a monotonic transformation of the p-values; it is defined as the p-value*N/r, where N is the total number of tests and r is the rank of the p-value (1 for the smallest p-value, N for the largest). You can easily imagine cases where this transformation results in a larger adjusted p-value for, e.g., the smallest p-value compared to the second-smallest p-value (if the latter is less than two-fold larger than the former). This would be silly, as you'd expect the smallest p-value to yield a smaller adjusted p-value/FDR than the second-smallest one.

To avoid such situations, a monotonic transformation is applied whereby the adjusted value for each p-value is replaced with the minimum of the adjusted p-values all of the larger p-values, if that minimum is smaller than its adjusted value. This results in what you observed, where blocks of tests (i.e., genes, regions, whatever) with different p-values have the same adjusted p-value/FDR. So in short, it's not a bug, it's just a little oddity stemming from how the FDR is calculated.

ADD COMMENT

Login before adding your answer.

Traffic: 358 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6