I am aware that non-reproducible questions are annoying. However, I am not sure how to reproduce my problem without my original data, which is too large to include here.
I have two groups of genomic ranges, 'Nre' and 'Re', and I used the regioneR package to test, separately for each group, whether its overlap with CpG sites in the genome differs from what would be expected at random:
library(regioneR)
ptNre <- overlapPermTest(A=Nre, B=CpG, ntimes=100, mc.cores=8, genome=genome, force.parallel=TRUE, mc.set.seed=FALSE, non.overlapping=FALSE)
ptRe <- overlapPermTest(A=Re, B=CpG, ntimes=100, mc.cores=8, genome=genome, force.parallel=TRUE, mc.set.seed=FALSE, non.overlapping=FALSE)
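For reference, a permutation test of this kind turns the observed evaluation and the permuted evaluations into an alternative, a p-value and a Z-score. The base-R sketch below is a simplified illustration, not regioneR's source code: it assumes that the default `alternative="auto"` picks "less" when the observed value falls below the mean of the permuted values (and "greater" otherwise), and that the p-value has the form (k + 1) / (n + 1). `perm_summary` is a hypothetical name.

```r
# Simplified sketch of summarising a permutation test.
# Assumption: mirrors regioneR's documented alternative = "auto" behaviour;
# this is NOT regioneR's actual implementation.
perm_summary <- function(observed, permuted, alternative = "auto") {
  permuted <- permuted[!is.na(permuted)]
  n <- length(permuted)
  if (alternative == "auto") {
    # "less" if the observed value lies below the mean of the permutations
    alternative <- if (observed < mean(permuted)) "less" else "greater"
  }
  # Count permutations at least as extreme as the observed value
  extreme <- if (alternative == "less") {
    sum(permuted <= observed)
  } else {
    sum(permuted >= observed)
  }
  list(
    alternative = alternative,
    # The +1 terms keep the p-value strictly above zero
    p.value = (extreme + 1) / (n + 1),
    # Distance of the observed value from the permutation mean, in SDs
    z.score = (observed - mean(permuted)) / sd(permuted)
  )
}

# Toy example: the observed value is above the permutation mean of 10
res <- perm_summary(observed = 12, permuted = c(8, 9, 10, 11, 12))
```

In this toy call only one permuted value (12) is at least as large as the observed one, so the sketch picks "greater" and returns p = 2/6. The key point is that the alternative is decided relative to the mean of the test's own permutations, not relative to any other simulation.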
Then I ran a loop of simulations to see what I could expect by chance, using the same randomization function that overlapPermTest uses (randomizeRegions):
library(foreach)
library(doMC)
registerDoMC(cores=8)  # without a registered backend, %dopar% runs sequentially
# Each iteration randomizes the regions and counts how many FEATURE ranges
# overlap the randomized set; foreach returns a list of 100 counts
RanNreNumOv <- foreach(i=1:100) %dopar% {
  length(subsetByOverlaps(FEATURE, randomizeRegions(Nre, genome=genome, non.overlapping=TRUE), ignore.strand=TRUE))
}
RanReNumOv <- foreach(i=1:100) %dopar% {
  length(subsetByOverlaps(FEATURE, randomizeRegions(Re, genome=genome, non.overlapping=TRUE), ignore.strand=TRUE))
}
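The loop counts `length(subsetByOverlaps(FEATURE, ...))`, i.e. the number of FEATURE ranges that overlap the randomized set, whereas overlapPermTest evaluates numOverlaps with the randomized regions as A and CpG as B. These are not necessarily the same quantity: counting regions of the first set, regions of the second set, or overlapping pairs can give different numbers for the same data. The base-R toy below illustrates this with made-up intervals (`a`, `b`, `hits` are hypothetical names, and no Bioconductor objects are used); which of these counts numOverlaps reports depends on its arguments, so its documentation is the place to check.

```r
# Hypothetical toy intervals on one chromosome, closed coordinates
a <- data.frame(start = c(1, 20, 50), end = c(10, 30, 60))
b <- data.frame(start = c(5, 8, 25, 100, 9), end = c(6, 9, 26, 110, 22))

# hits[i, j] is TRUE when interval a[i, ] overlaps interval b[j, ]
hits <- outer(seq_len(nrow(a)), seq_len(nrow(b)), function(i, j) {
  a$start[i] <= b$end[j] & b$start[j] <= a$end[i]
})

n_a_hit <- sum(rowSums(hits) > 0)  # regions of a overlapping any region of b
n_b_hit <- sum(colSums(hits) > 0)  # regions of b overlapping any region of a
n_pairs <- sum(hits)               # every overlapping (a, b) pair
```

Here the three counts are 2, 4 and 5 respectively, even though the intervals are identical in all three cases.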
> ptNre
[[1]]
P-value: 0.0008999100089991
Z-score: -3.0158
Number of iterations: 10000
Alternative: less
Evaluation of the original region set: 44678
Evaluation function: numOverlaps
Randomization function: randomizeRegions
> mean(unlist(RanNreNumOv))
[1] 43016.93
> ptRe
[[1]]
P-value: 9.99900009999e-05
Z-score: 9.9826
Number of iterations: 10000
Alternative: greater
Evaluation of the original region set: 11950
Evaluation function: numOverlaps
Randomization function: randomizeRegions
> mean(unlist(RanReNumOv))
[1] 7151.644
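The printed p-values can be checked by hand. Assuming the (k + 1) / (n + 1) form, where n is the printed "Number of iterations" and k is the number of permutations at least as extreme as the observed value, both values are reproduced exactly with k = 0 for 'Re' and k = 8 for 'Nre'. The Z-score sign then shows on which side of the permutation mean each observed value lies. `p_from_k` is a hypothetical helper name.

```r
n_iter <- 10000  # "Number of iterations" printed for both tests

# Assumed formula: k permutations at least as extreme as the observed value
p_from_k <- function(k, n) (k + 1) / (n + 1)

p_re  <- p_from_k(0, n_iter)  # 'Re': printed P-value 9.99900009999e-05
p_nre <- p_from_k(8, n_iter)  # 'Nre': printed P-value 0.0008999100089991

# Z-score = (observed - mean of permutations) / sd of permutations, so the
# negative Z-score for 'Nre' (-3.0158) means the permutation mean inside
# overlapPermTest was above the observed 44678 overlaps.
```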
Both sets of genomic ranges showed more overlaps than the mean of my own random simulations (44678 vs. 43016.93 for 'Nre'; 11950 vs. 7151.644 for 'Re'). However, overlapPermTest reported the alternative as "less" for the 'Nre' set and "greater" for the 'Re' set.
Am I missing something? I would be grateful for any help interpreting these results.
