Search
Question: significance of ChIPseq peak overlap
0
gravatar for Gyan Prakash Mishra
22 months ago by
INDIA
Gyan Prakash Mishra0 wrote:

Hi,

I am using enrichPeakOverlap of ChIPseeker to find significance of peak overlap.

Here is my result of the overlap with no. of shuffle i.e nshuffle=10000

qSample tSample qLen tLen N_OL   pvalue p.adjust
TF1_peaks.bed Irf1_0.peaks.bed 13094 17119 2633 0.2010844662 0.00009999 0.00009999
TF1_peaks.bed Irf4_0.peaks.bed 13094 13500 3420 0.2611883305 0.00009999 0.00009999
TF1_peaks.bed Junb_0.peaks.bed 13094 13460 2387 0.1822972354 0.00009999 0.00009999
TF1_peaks.bed K27Ac_0.peaks.bed 13094 48386 8275 0.6319688407 0.00009999 0.00009999
TF1_peaks.bed PU1_0.peaks.bed 13094 68692 7110 0.5429967924 0.00009999 0.00009999
TF1_peaks.bed Rela_0.peaks.bed 13094 3544 350 0.0267297999 0.00009999 0.00009999
TF1_peaks.bed Runx_0.peaks.bed 13094 2558 765 0.0584237055 0.00009999 0.00009999
TF1_peaks.bed Stat1_0.peaks.bed 13094 423 21 0.001603788 0.00009999 0.00009999
TF1_peaks.bed Stat2_0.peaks.bed 13094 69 14 0.001069192 0.00009999 0.00009999

Here P value is  same for all comparison . overlap of TF1  with peaks having highest overlap is coming out to be  as significant as lowest overlap with other peaks. I could not able to find the reason for same . I would like to understand what could be the possible reason. 

Thanks

ADD COMMENTlink modified 22 months ago by Guangchuang Yu970 • written 22 months ago by Gyan Prakash Mishra0

Hello, I have the same question

When I perform enrichPeakoverlap on your files I get this output

ARmo_1nM   GSM1174480_ARmo_0M_peaks.bed.gz                   GSM1174481_ARmo_1nM_peaks.bed.gz  812 2296   26
ARmo_100nM GSM1174480_ARmo_0M_peaks.bed.gz                 GSM1174482_ARmo_100nM_peaks.bed.gz  812 1359   24
CBX6_BF    GSM1174480_ARmo_0M_peaks.bed.gz GSM1295076_CBX6_BF_ChipSeq_mergedReps_peaks.bed.gz  812 1331    0
CBX7_BF    GSM1174480_ARmo_0M_peaks.bed.gz GSM1295077_CBX7_BF_ChipSeq_mergedReps_peaks.bed.gz  812 1663    0
                pvalue    p.adjust
ARmo_1nM   0.000999001 0.001998002
ARmo_100nM 0.000999001 0.001998002
CBX6_BF    0.390609391 0.390609391
CBX7_BF    0.390609391 0.390609391
 

And then when I perform mine

PGP1vs1D4<- "Combined_PGP1vs1D4_H3K4me1_2fold.regions.bed"
PGP1vs1G6<- "Combined_PGP1vs1G6_H3K4me1_2fold.regions.bed"
PGP1vs1G1<- "Combined_PGP1vs1G1_H3K4me1_2fold.regions.bed"
PGP1vsD2355N<- "Combined_PGP1vsD2355N_H3K4me1_2fold.regions.bed"
PGP1vsR2111W<- "Combined_PGP1vsR2111W_H3K4me1_2fold.regions.bed"

test<- enrichPeakOverlap(queryPeak = PGP1vs1D4, targetPeak = unlist(c(PGP1vs1G6, PGP1vs1G1, PGP1vsD2355N, PGP1vsR2111W)) , TxDb = NULL, pAdjustMethod = "BH", nShuffle = 1000, chainFile = NULL, verbose = TRUE)

test

1 Combined_PGP1vs1D4_H3K4me1_2fold.regions.bed    Combined_PGP1vs1G6_H3K4me1_2fold.regions.bed 1557 1626  254
2 Combined_PGP1vs1D4_H3K4me1_2fold.regions.bed    Combined_PGP1vs1G1_H3K4me1_2fold.regions.bed 1557  967   57
3 Combined_PGP1vs1D4_H3K4me1_2fold.regions.bed Combined_PGP1vsD2355N_H3K4me1_2fold.regions.bed 1557  703  167
4 Combined_PGP1vs1D4_H3K4me1_2fold.regions.bed Combined_PGP1vsR2111W_H3K4me1_2fold.regions.bed 1557  608   23
       pvalue    p.adjust
1 0.000999001 0.000999001
2 0.000999001 0.000999001
3 0.000999001 0.000999001
4 0.000999001 0.000999001

Why are all my p values and adjusted p values the same? Please help

 

ADD REPLYlink written 7 months ago by dr.lauren.wasson0

try to increase the number of nShuffle, e.g. using 10000

ADD REPLYlink written 7 months ago by Guangchuang Yu970

Thank you for your quick response. When I do that, I get the same result in that all my ps and adjusted ps are the same. They're just a lower p value (9.99e-5)

 

1 Combined_PGP1vs1D4_H3K4me1_2fold.regions.bed    Combined_PGP1vs1G6_H3K4me1_2fold.regions.bed 1557 1626  254
2 Combined_PGP1vs1D4_H3K4me1_2fold.regions.bed    Combined_PGP1vs1G1_H3K4me1_2fold.regions.bed 1557  967   57
3 Combined_PGP1vs1D4_H3K4me1_2fold.regions.bed Combined_PGP1vsD2355N_H3K4me1_2fold.regions.bed 1557  703  167
4 Combined_PGP1vs1D4_H3K4me1_2fold.regions.bed Combined_PGP1vsR2111W_H3K4me1_2fold.regions.bed 1557  608   23
     pvalue  p.adjust
1 9.999e-05 9.999e-05
2 9.999e-05 9.999e-05
3 9.999e-05 9.999e-05
4 9.999e-05 9.999e-05

 

ADD REPLYlink written 7 months ago by dr.lauren.wasson0

Hi, did you ever find a solution to this?

I have the same problem, I am always getting the same p-values which is essentially just nshuffle/(nshuffle+1). I have a simple test case below:

> t.gr<-GRanges(seqnames = "chr1",ranges = IRanges(start=seq(1,1000000,1000),width = 100),strand = "+")
> t2.gr<-GRanges(seqnames = "chr2",ranges = IRanges(start=seq(1,1000000,1000),width = 100),strand = "+")
> enrichPeakOverlap(t.gr,list(t.gr,t2.gr),TxDb = txdb,nShuffle = 100,verbose = T)
>> permutation test of peak overlap...		 2018-03-30 20:50:07 
  |                                                                         |   0%
    qSample     tSample qLen tLen N_OL     pvalue   p.adjust
1 queryPeak targetPeak1 1000 1000 1000 0.00990099 0.00990099
2 queryPeak targetPeak2 1000 1000    0 0.00990099 0.00990099
> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS

Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] stats4    parallel  grid      stats     graphics  grDevices utils    
 [8] datasets  methods   base     

other attached packages:
 [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
 [2] GenomicFeatures_1.30.3                 
 [3] AnnotationDbi_1.40.0                   
 [4] Biobase_2.38.0                         
 [5] GenomicRanges_1.30.3                   
 [6] GenomeInfoDb_1.14.0                    
 [7] IRanges_2.12.0                         
 [8] S4Vectors_0.16.0                       
 [9] BiocGenerics_0.24.0                    
[10] genomation_1.11.3                      
[11] ChIPseeker_1.14.2                      

loaded via a namespace (and not attached):
 [1] httr_1.3.1                 RMySQL_0.10.14            
 [3] bit64_0.9-7                splines_3.4.4             
 [5] gtools_3.5.0               assertthat_0.2.0          
 [7] DO.db_2.9                  rvcheck_0.0.9             
 [9] blob_1.1.0                 BSgenome_1.46.0           
[11] GenomeInfoDbData_1.0.0     Rsamtools_1.30.0          
[13] impute_1.52.0              yaml_2.1.18               
[15] progress_1.1.2             pillar_1.2.1              
[17] RSQLite_2.0                lattice_0.20-35           
[19] glue_1.2.0                 digest_0.6.15             
[21] RColorBrewer_1.1-2         XVector_0.18.0            
[23] qvalue_2.10.0              colorspace_1.3-2          
[25] Matrix_1.2-12              plyr_1.8.4                
[27] XML_3.98-1.10              pkgconfig_2.0.1           
[29] biomaRt_2.34.2             zlibbioc_1.24.0           
[31] GO.db_3.5.0                scales_0.5.0.9000         
[33] gdata_2.18.0               BiocParallel_1.12.0       
[35] tibble_1.4.2               ggplot2_2.2.1.9000        
[37] seqPattern_1.10.0          UpSetR_1.3.3              
[39] SummarizedExperiment_1.8.1 lazyeval_0.2.1            
[41] magrittr_1.5               memoise_1.1.0             
[43] DOSE_3.4.0                 gplots_3.0.1              
[45] tools_3.4.4                data.table_1.10.4-3       
[47] prettyunits_1.0.2          hms_0.4.2                 
[49] gridBase_0.4-7             matrixStats_0.53.1        
[51] stringr_1.3.0              munsell_0.4.3             
[53] plotrix_3.7                DelayedArray_0.4.1        
[55] bindrcpp_0.2               Biostrings_2.46.0         
[57] compiler_3.4.4             caTools_1.17.1            
[59] rlang_0.2.0.9000           RCurl_1.95-4.10           
[61] igraph_1.2.1               bitops_1.0-6              
[63] boot_1.3-20                gtable_0.2.0              
[65] DBI_0.8                    reshape2_1.4.3            
[67] R6_2.2.2                   GenomicAlignments_1.14.1  
[69] gridExtra_2.3              dplyr_0.7.4               
[71] rtracklayer_1.38.3         bit_1.1-12                
[73] bindr_0.1.1                fastmatch_1.1-0           
[75] fgsea_1.4.1                readr_1.1.1               
[77] KernSmooth_2.23-15         GOSemSim_2.4.1            
[79] stringi_1.1.7              Rcpp_0.12.16 
ADD REPLYlink written 7 weeks ago by shaunmwebb0

I also can't get pool=F to work when using granges, I think this is because targetFiles[i] is used as input recursively rather than target.gr[[i]]?

> enrichPeakOverlap(t.gr,list(t.gr,t2.gr),TxDb = txdb,nShuffle = 100,verbose = T,pool=F)
Error in file.exists(dir) : invalid 'file' argument

 

ADD REPLYlink modified 7 weeks ago • written 7 weeks ago by shaunmwebb0
0
gravatar for Guangchuang Yu
22 months ago by
Hong Kong
Guangchuang Yu970 wrote:

Your output is weird, not only all pvalues and p.adj are identical, but also it contains an additional column.

You need to make a reproducible example, otherwise, I can't help.

 

require(ChIPseeker)

ff= getSampleFiles()

enrichPeakOverlap(ff[[1]], unlist(ff[-1])) -> x

> x
                                   qSample
ARmo_1nM   GSM1174480_ARmo_0M_peaks.bed.gz
ARmo_100nM GSM1174480_ARmo_0M_peaks.bed.gz
CBX6_BF    GSM1174480_ARmo_0M_peaks.bed.gz
CBX7_BF    GSM1174480_ARmo_0M_peaks.bed.gz
                                                      tSample qLen tLen N_OL
ARmo_1nM                     GSM1174481_ARmo_1nM_peaks.bed.gz  812 2296   26
ARmo_100nM                 GSM1174482_ARmo_100nM_peaks.bed.gz  812 1359   24
CBX6_BF    GSM1295076_CBX6_BF_ChipSeq_mergedReps_peaks.bed.gz  812 1331    0
CBX7_BF    GSM1295077_CBX7_BF_ChipSeq_mergedReps_peaks.bed.gz  812 1663    0
               pvalue   p.adjust
ARmo_1nM   0.04495504 0.08991009
ARmo_100nM 0.04495504 0.08991009
CBX6_BF    0.47452547 0.47452547
CBX7_BF    0.47452547 0.47452547
ADD COMMENTlink written 22 months ago by Guangchuang Yu970

Oh  sorry about that. I add that additional column manually.

Here is the exact output.

  qSample tSample qLen tLen N_OL pvalue p.adjust
1 TF1_peaks.bed Irf1_0.peaks.bed 13094 17119 2633 0.00009999 0.00009999
2 TF1_peaks.bed Irf4_0.peaks.bed 13094 13500 3420 0.00009999 0.00009999
3 TF1_peaks.bed Junb_0.peaks.bed 13094 13460 2387 0.00009999 0.00009999
4 TF1_peaks.bed K27Ac_0.peaks.bed 13094 48386 8275 0.00009999 0.00009999
5 TF1_peaks.bed PU1_0.peaks.bed 13094 68692 7110 0.00009999 0.00009999
6 TF1_peaks.bed Rela_0.peaks.bed 13094 3544 350 0.00009999 0.00009999
7 TF1_peaks.bed Runx_0.peaks.bed 13094 2558 765 0.00009999 0.00009999
8 TF1_peaks.bed Stat1_0.peaks.bed 13094 423 21 0.00009999 0.00009999
9 TF1_peaks.bed Stat2_0.peaks.bed 13094 69 14 0.00009999 0.00009999
ADD REPLYlink written 22 months ago by Gyan Prakash Mishra0

I tried to reproduce the same but I am getting different p and p.adjust values

  qSample tSample qLen  tLen N_OL   pvalue p.adjust
ARmo_1nM GSM1174480_ARmo_0M_peaks.bed.gz GSM1174481_ARmo_1nM_peaks.bed.gz 812 2296   26 0.000999001 0.001998002
ARmo_100nM GSM1174480_ARmo_0M_peaks.bed.gz GSM1174482_ARmo_100nM_peaks.bed.gz 812 1359   24 0.000999001 0.001998002
CBX6_BF GSM1174480_ARmo_0M_peaks.bed.gz GSM1295076_CBX6_BF_ChipSeq_mergedReps_peaks.bed.gz 812 1331    0 0.4085914086  0.4085914086
CBX7_BF GSM1174480_ARmo_0M_peaks.bed.gz GSM1295077_CBX7_BF_ChipSeq_mergedReps_peaks.bed.gz 812 1663    0 0.4085914086   0.4085914086

 

ADD REPLYlink modified 22 months ago • written 22 months ago by Gyan Prakash Mishra0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 296 users visited in the last hour