I got permTest working thanks to the answers to my earlier question here ("permutation test on one set of data to get p-value of highest peaks. permTest? How?"), but now I need to understand why the green Evobs bar in my plots moves.
red is alpha=0.05 (tail), the significance limit
green is Evobs, average distance of QKI binding sites to intron-exon and exon-intron junction before randomization
black is Evperm, average distance of randomized QKI binding sites to intron-exon boundary
Thus, if the green bar is in the red shaded region it means that the original evaluation is extremely unlikely and so the p-value will be significant.
https://bioconductor.org/packages/3.7/bioc/vignettes/regioneR/inst/doc/regioneR.pdf
I expected my green bar to be far from the red bar, because QKI is involved in exon splicing and so should be much closer to the intron-exon and exon-intron junctions than expected by chance. But I do not understand why the green lines agree only when ntimes is the same (1000) and differ when ntimes is different (10). The green line is computed from the original data, not the randomized data, right? So why would ntimes make a difference?
Also, what form should exonintronJunctions take? I am going to use a BED file, but what should columns 2 and 3 be?
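To make my expectation concrete, here is a minimal toy permutation test in base R. It is only a sketch of the general technique, not regioneR's code: the function toy_perm_test, the helper mean_nearest, and the positions in a and b are all made up for illustration. The point is that the observed statistic is computed once from the original data, so it should not depend on ntimes.

```r
# Toy permutation test in base R (NOT regioneR's implementation).
# All names and numbers here are hypothetical, for illustration only.
toy_perm_test <- function(a, b, ntimes, seed = 1) {
  set.seed(seed)
  # Mean distance from each point in x to its nearest point in y.
  mean_nearest <- function(x, y) mean(sapply(x, function(p) min(abs(p - y))))
  # Observed statistic: computed once, before any randomization.
  observed <- mean_nearest(a, b)
  # Permuted statistics: redraw the positions of `a` uniformly, ntimes times.
  lo <- min(c(a, b))
  hi <- max(c(a, b))
  permuted <- replicate(ntimes, mean_nearest(runif(length(a), lo, hi), b))
  # One-sided p-value for "observed is smaller than expected by chance".
  pval <- (sum(permuted <= observed) + 1) / (ntimes + 1)
  list(observed = observed, permuted = permuted, pval = pval)
}

b <- c(100, 500, 900)        # hypothetical junction positions
a <- c(102, 495, 910, 505)   # hypothetical binding-site positions

r10   <- toy_perm_test(a, b, ntimes = 10)
r1000 <- toy_perm_test(a, b, ntimes = 1000)

# Only the permuted distribution and the p-value depend on ntimes.
print(c(r10$observed, r1000$observed))
```

In this sketch the two observed values are identical by construction, which is what I expected from the green line in my real runs.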
What I used originally:
chr1 30039 30563 uc057aty.1_intron_0_0_chr1_30040_f 0 +
chr1 30667 30975 uc057aty.1_intron_1_0_chr1_30668_f 0 +
or I could try this later:
chr1 30039 30039 uc057aty.1_intron_0_0_chr1_30040_f 0 +
chr1 30563 30563 uc057aty.1_intron_0_0_chr1_30040_f 0 +
chr1 30667 30667 uc057aty.1_intron_1_0_chr1_30668_f 0 +
chr1 30975 30975 uc057aty.1_intron_1_0_chr1_30668_f 0 +
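For reference, this is roughly how I would build the second form from the first in base R. It is a sketch only: the helper intron_to_junctions and the column names are my own, not part of regioneR. One caveat I am aware of is that BED coordinates are 0-based and half-open, so a row with start equal to end is formally a zero-width interval, and I do not know how that is treated when the file is loaded.

```r
# Hypothetical helper: split each intron interval into its two endpoints,
# giving one row per junction with start == end (as in my second listing).
intron_to_junctions <- function(bed) {
  starts <- bed
  starts$end <- bed$start    # left end of the intron
  ends <- bed
  ends$start <- bed$end      # right end of the intron
  out <- rbind(starts, ends)
  out <- out[order(out$chr, out$start), ]
  rownames(out) <- NULL
  out
}

# The two intron rows from my original file; column names are my own choice.
introns <- data.frame(
  chr    = c("chr1", "chr1"),
  start  = c(30039L, 30667L),
  end    = c(30563L, 30975L),
  name   = c("uc057aty.1_intron_0_0_chr1_30040_f",
             "uc057aty.1_intron_1_0_chr1_30668_f"),
  score  = c(0L, 0L),
  strand = c("+", "+"),
  stringsAsFactors = FALSE
)

junctions <- intron_to_junctions(introns)
print(junctions)
```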
permTest with alternative="auto"; the alternative was chosen to be "greater".
pt = permTest(A=QKI, B=exonintronJunctions, ntimes=1000, genome=hg38genome, mask=myMask, randomize.function=randomizeRegions, evaluate.function=meanDistance, alternative="auto", min.parallel=1000, force.parallel=TRUE, randomize.function.name=NULL, evaluate.function.name=NULL, verbose=FALSE)
Image of resulting plot: https://drive.google.com/open?id=1fxWQYKJWP6RqhoI3sEHBzCk2oNUTyd3O
permTest with alternative="less" shows why I should use "greater", not "less". Note that the pt2 green line has the same meanDistance as pt.
pt2 = permTest(A=QKI, B=exonintronJunctions, ntimes=1000, genome=hg38genome, mask=myMask, randomize.function=randomizeRegions, evaluate.function=meanDistance, alternative="less", min.parallel=1000, force.parallel=TRUE, randomize.function.name=NULL, evaluate.function.name=NULL, verbose=FALSE)
Image of resulting plot: https://drive.google.com/open?id=1re8JP3y-mxSJI0xoGT_1nENxEIPggvdF
Note that the ptTest green line is at ~6,000, nowhere near the other two green lines at ~30,000.
ptTest = permTest(A=QKI, B=exonintronJunctions, ntimes=10, genome=hg38genome, mask=myMask, randomize.function=randomizeRegions, evaluate.function=meanDistance, alternative="auto", min.parallel=1000, force.parallel=TRUE, randomize.function.name=NULL, evaluate.function.name=NULL, verbose=FALSE)
Image of resulting plot: https://drive.google.com/open?id=1YVvQjnilFXNYKFO8S_V0-Mizou11CoCk