Question

significance of ChIPseq peak overlap

0

Entering edit mode

Gyan Prakash Mishra ▴ 10

@gyan-prakash-mishra-8555

Last seen 21 months ago

INDIA

Hi,

I am using enrichPeakOverlap of ChIPseeker to find significance of peak overlap.

Here is my result of the overlap with no. of shuffle i.e nshuffle=10000

qSample	tSample	qLen	tLen	N_OL		pvalue	p.adjust
TF1_peaks.bed	Irf1_0.peaks.bed	13094	17119	2633	0.2010844662	0.00009999	0.00009999
TF1_peaks.bed	Irf4_0.peaks.bed	13094	13500	3420	0.2611883305	0.00009999	0.00009999
TF1_peaks.bed	Junb_0.peaks.bed	13094	13460	2387	0.1822972354	0.00009999	0.00009999
TF1_peaks.bed	K27Ac_0.peaks.bed	13094	48386	8275	0.6319688407	0.00009999	0.00009999
TF1_peaks.bed	PU1_0.peaks.bed	13094	68692	7110	0.5429967924	0.00009999	0.00009999
TF1_peaks.bed	Rela_0.peaks.bed	13094	3544	350	0.0267297999	0.00009999	0.00009999
TF1_peaks.bed	Runx_0.peaks.bed	13094	2558	765	0.0584237055	0.00009999	0.00009999
TF1_peaks.bed	Stat1_0.peaks.bed	13094	423	21	0.001603788	0.00009999	0.00009999
TF1_peaks.bed	Stat2_0.peaks.bed	13094	69	14	0.001069192	0.00009999	0.00009999

Here P value is same for all comparison . overlap of TF1 with peaks having highest overlap is coming out to be as significant as lowest overlap with other peaks. I could not able to find the reason for same . I would like to understand what could be the possible reason.

Thanks

chipseeker • 2.7k views

ADD COMMENT • link updated 9.6 years ago by Guangchuang Yu ★ 1.2k • written 9.6 years ago by Gyan Prakash Mishra ▴ 10

0

Entering edit mode

Hello, I have the same question

When I perform enrichPeakoverlap on your files I get this output

ARmo_1nM GSM1174480_ARmo_0M_peaks.bed.gz GSM1174481_ARmo_1nM_peaks.bed.gz 812 2296 26
ARmo_100nM GSM1174480_ARmo_0M_peaks.bed.gz GSM1174482_ARmo_100nM_peaks.bed.gz 812 1359 24
CBX6_BF GSM1174480_ARmo_0M_peaks.bed.gz GSM1295076_CBX6_BF_ChipSeq_mergedReps_peaks.bed.gz 812 1331 0
CBX7_BF GSM1174480_ARmo_0M_peaks.bed.gz GSM1295077_CBX7_BF_ChipSeq_mergedReps_peaks.bed.gz 812 1663 0
pvalue p.adjust
ARmo_1nM 0.000999001 0.001998002
ARmo_100nM 0.000999001 0.001998002
CBX6_BF 0.390609391 0.390609391
CBX7_BF 0.390609391 0.390609391

And then when I perform mine

PGP1vs1D4<- "Combined_PGP1vs1D4_H3K4me1_2fold.regions.bed"
PGP1vs1G6<- "Combined_PGP1vs1G6_H3K4me1_2fold.regions.bed"
PGP1vs1G1<- "Combined_PGP1vs1G1_H3K4me1_2fold.regions.bed"
PGP1vsD2355N<- "Combined_PGP1vsD2355N_H3K4me1_2fold.regions.bed"
PGP1vsR2111W<- "Combined_PGP1vsR2111W_H3K4me1_2fold.regions.bed"

test<- enrichPeakOverlap(queryPeak = PGP1vs1D4, targetPeak = unlist(c(PGP1vs1G6, PGP1vs1G1, PGP1vsD2355N, PGP1vsR2111W)) , TxDb = NULL, pAdjustMethod = "BH", nShuffle = 1000, chainFile = NULL, verbose = TRUE)

test

1 Combined_PGP1vs1D4_H3K4me1_2fold.regions.bed Combined_PGP1vs1G6_H3K4me1_2fold.regions.bed 1557 1626 254
2 Combined_PGP1vs1D4_H3K4me1_2fold.regions.bed Combined_PGP1vs1G1_H3K4me1_2fold.regions.bed 1557 967 57
3 Combined_PGP1vs1D4_H3K4me1_2fold.regions.bed Combined_PGP1vsD2355N_H3K4me1_2fold.regions.bed 1557 703 167
4 Combined_PGP1vs1D4_H3K4me1_2fold.regions.bed Combined_PGP1vsR2111W_H3K4me1_2fold.regions.bed 1557 608 23
pvalue p.adjust
1 0.000999001 0.000999001
2 0.000999001 0.000999001
3 0.000999001 0.000999001
4 0.000999001 0.000999001

Why are all my p values and adjusted p values the same? Please help

ADD REPLY • link 8.3 years ago dr.lauren.wasson • 0

0

Entering edit mode

try to increase the number of nShuffle, e.g. using 10000

Add comment

ADD REPLY • link 8.3 years ago Guangchuang Yu ★ 1.2k

0

Entering edit mode

Thank you for your quick response. When I do that, I get the same result in that all my ps and adjusted ps are the same. They're just a lower p value (9.99e-5)

1 Combined_PGP1vs1D4_H3K4me1_2fold.regions.bed Combined_PGP1vs1G6_H3K4me1_2fold.regions.bed 1557 1626 254
2 Combined_PGP1vs1D4_H3K4me1_2fold.regions.bed Combined_PGP1vs1G1_H3K4me1_2fold.regions.bed 1557 967 57
3 Combined_PGP1vs1D4_H3K4me1_2fold.regions.bed Combined_PGP1vsD2355N_H3K4me1_2fold.regions.bed 1557 703 167
4 Combined_PGP1vs1D4_H3K4me1_2fold.regions.bed Combined_PGP1vsR2111W_H3K4me1_2fold.regions.bed 1557 608 23
pvalue p.adjust
1 9.999e-05 9.999e-05
2 9.999e-05 9.999e-05
3 9.999e-05 9.999e-05
4 9.999e-05 9.999e-05

ADD REPLY • link 8.3 years ago dr.lauren.wasson • 0

0

Entering edit mode

Hi, did you ever find a solution to this?

I have the same problem, I am always getting the same p-values which is essentially just nshuffle/(nshuffle+1). I have a simple test case below:

> t.gr<-GRanges(seqnames = "chr1",ranges = IRanges(start=seq(1,1000000,1000),width = 100),strand = "+")
> t2.gr<-GRanges(seqnames = "chr2",ranges = IRanges(start=seq(1,1000000,1000),width = 100),strand = "+")
> enrichPeakOverlap(t.gr,list(t.gr,t2.gr),TxDb = txdb,nShuffle = 100,verbose = T)
>> permutation test of peak overlap...		 2018-03-30 20:50:07 
  |                                                                         |   0%
    qSample     tSample qLen tLen N_OL     pvalue   p.adjust
1 queryPeak targetPeak1 1000 1000 1000 0.00990099 0.00990099
2 queryPeak targetPeak2 1000 1000    0 0.00990099 0.00990099

> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS

Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] stats4    parallel  grid      stats     graphics  grDevices utils    
 [8] datasets  methods   base     

other attached packages:
 [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
 [2] GenomicFeatures_1.30.3                 
 [3] AnnotationDbi_1.40.0                   
 [4] Biobase_2.38.0                         
 [5] GenomicRanges_1.30.3                   
 [6] GenomeInfoDb_1.14.0                    
 [7] IRanges_2.12.0                         
 [8] S4Vectors_0.16.0                       
 [9] BiocGenerics_0.24.0                    
[10] genomation_1.11.3                      
[11] ChIPseeker_1.14.2                      

loaded via a namespace (and not attached):
 [1] httr_1.3.1                 RMySQL_0.10.14            
 [3] bit64_0.9-7                splines_3.4.4             
 [5] gtools_3.5.0               assertthat_0.2.0          
 [7] DO.db_2.9                  rvcheck_0.0.9             
 [9] blob_1.1.0                 BSgenome_1.46.0           
[11] GenomeInfoDbData_1.0.0     Rsamtools_1.30.0          
[13] impute_1.52.0              yaml_2.1.18               
[15] progress_1.1.2             pillar_1.2.1              
[17] RSQLite_2.0                lattice_0.20-35           
[19] glue_1.2.0                 digest_0.6.15             
[21] RColorBrewer_1.1-2         XVector_0.18.0            
[23] qvalue_2.10.0              colorspace_1.3-2          
[25] Matrix_1.2-12              plyr_1.8.4                
[27] XML_3.98-1.10              pkgconfig_2.0.1           
[29] biomaRt_2.34.2             zlibbioc_1.24.0           
[31] GO.db_3.5.0                scales_0.5.0.9000         
[33] gdata_2.18.0               BiocParallel_1.12.0       
[35] tibble_1.4.2               ggplot2_2.2.1.9000        
[37] seqPattern_1.10.0          UpSetR_1.3.3              
[39] SummarizedExperiment_1.8.1 lazyeval_0.2.1            
[41] magrittr_1.5               memoise_1.1.0             
[43] DOSE_3.4.0                 gplots_3.0.1              
[45] tools_3.4.4                data.table_1.10.4-3       
[47] prettyunits_1.0.2          hms_0.4.2                 
[49] gridBase_0.4-7             matrixStats_0.53.1        
[51] stringr_1.3.0              munsell_0.4.3             
[53] plotrix_3.7                DelayedArray_0.4.1        
[55] bindrcpp_0.2               Biostrings_2.46.0         
[57] compiler_3.4.4             caTools_1.17.1            
[59] rlang_0.2.0.9000           RCurl_1.95-4.10           
[61] igraph_1.2.1               bitops_1.0-6              
[63] boot_1.3-20                gtable_0.2.0              
[65] DBI_0.8                    reshape2_1.4.3            
[67] R6_2.2.2                   GenomicAlignments_1.14.1  
[69] gridExtra_2.3              dplyr_0.7.4               
[71] rtracklayer_1.38.3         bit_1.1-12                
[73] bindr_0.1.1                fastmatch_1.1-0           
[75] fgsea_1.4.1                readr_1.1.1               
[77] KernSmooth_2.23-15         GOSemSim_2.4.1            
[79] stringi_1.1.7              Rcpp_0.12.16

ADD REPLY • link 7.9 years ago shaunmwebb • 0

0

Entering edit mode

I also can't get pool=F to work when using granges, I think this is because targetFiles[i] is used as input recursively rather than target.gr[[i]]?

> enrichPeakOverlap(t.gr,list(t.gr,t2.gr),TxDb = txdb,nShuffle = 100,verbose = T,pool=F)
Error in file.exists(dir) : invalid 'file' argument

ADD REPLY • link 7.9 years ago shaunmwebb • 0

score 0 · Answer 1 · 2016-07-06

0

Entering edit mode

Guangchuang Yu ★ 1.2k

@guangchuang-yu-5419

Last seen 4 months ago

China/Guangzhou/Southern Medical Univer…

Your output is weird, not only all pvalues and p.adj are identical, but also it contains an additional column.

You need to make a reproducible example, otherwise, I can't help.

require(ChIPseeker)

ff= getSampleFiles()

enrichPeakOverlap(ff[[1]], unlist(ff[-1])) -> x

> x
                                   qSample
ARmo_1nM   GSM1174480_ARmo_0M_peaks.bed.gz
ARmo_100nM GSM1174480_ARmo_0M_peaks.bed.gz
CBX6_BF    GSM1174480_ARmo_0M_peaks.bed.gz
CBX7_BF    GSM1174480_ARmo_0M_peaks.bed.gz
                                                      tSample qLen tLen N_OL
ARmo_1nM                     GSM1174481_ARmo_1nM_peaks.bed.gz  812 2296   26
ARmo_100nM                 GSM1174482_ARmo_100nM_peaks.bed.gz  812 1359   24
CBX6_BF    GSM1295076_CBX6_BF_ChipSeq_mergedReps_peaks.bed.gz  812 1331    0
CBX7_BF    GSM1295077_CBX7_BF_ChipSeq_mergedReps_peaks.bed.gz  812 1663    0
               pvalue   p.adjust
ARmo_1nM   0.04495504 0.08991009
ARmo_100nM 0.04495504 0.08991009
CBX6_BF    0.47452547 0.47452547
CBX7_BF    0.47452547 0.47452547

ADD COMMENT • link 9.6 years ago Guangchuang Yu ★ 1.2k

0

Entering edit mode

Oh sorry about that. I add that additional column manually.

Here is the exact output.

	qSample	tSample	qLen	tLen	N_OL	pvalue	p.adjust
1	TF1_peaks.bed	Irf1_0.peaks.bed	13094	17119	2633	0.00009999	0.00009999
2	TF1_peaks.bed	Irf4_0.peaks.bed	13094	13500	3420	0.00009999	0.00009999
3	TF1_peaks.bed	Junb_0.peaks.bed	13094	13460	2387	0.00009999	0.00009999
4	TF1_peaks.bed	K27Ac_0.peaks.bed	13094	48386	8275	0.00009999	0.00009999
5	TF1_peaks.bed	PU1_0.peaks.bed	13094	68692	7110	0.00009999	0.00009999
6	TF1_peaks.bed	Rela_0.peaks.bed	13094	3544	350	0.00009999	0.00009999
7	TF1_peaks.bed	Runx_0.peaks.bed	13094	2558	765	0.00009999	0.00009999
8	TF1_peaks.bed	Stat1_0.peaks.bed	13094	423	21	0.00009999	0.00009999
9	TF1_peaks.bed	Stat2_0.peaks.bed	13094	69	14	0.00009999	0.00009999

ADD REPLY • link 9.6 years ago Gyan Prakash Mishra ▴ 10

0

Entering edit mode

I tried to reproduce the same but I am getting different p and p.adjust values

	qSample	tSample	qLen	tLen	N_OL	pvalue	p.adjust
ARmo_1nM	GSM1174480_ARmo_0M_peaks.bed.gz	GSM1174481_ARmo_1nM_peaks.bed.gz	812	2296	26	0.000999001	0.001998002
ARmo_100nM	GSM1174480_ARmo_0M_peaks.bed.gz	GSM1174482_ARmo_100nM_peaks.bed.gz	812	1359	24	0.000999001	0.001998002
CBX6_BF	GSM1174480_ARmo_0M_peaks.bed.gz	GSM1295076_CBX6_BF_ChipSeq_mergedReps_peaks.bed.gz	812	1331	0	0.4085914086	0.4085914086
CBX7_BF	GSM1174480_ARmo_0M_peaks.bed.gz	GSM1295077_CBX7_BF_ChipSeq_mergedReps_peaks.bed.gz	812	1663	0	0.4085914086	0.4085914086

ADD REPLY • link 9.6 years ago Gyan Prakash Mishra ▴ 10