Question: How does "maxgap" work in ChIPpeakAnno?
6 months ago by
ngs0620
Sweden
ngs0620 wrote:

Hi, I tried to annotate my ChIP-seq peak regions with the ChIPpeakAnno package using the following command. If I understand correctly, output="both" annotates each peak with the nearest features (upstream and downstream) as well as any features that overlap the peak within the given maxgap distance (i.e. 5000 bp).

final_anno <- annotatePeakInBatch(final, AnnotationData = ucsc.mm10.knownGene, output = "both", maxgap = 5000)


When I looked at the output, the shortestDistance for some "Overlapping" features is greater than 5000 bp (marked with ** in the table below). I expected that features not overlapping within 5000 bp would be given NA, but it looks like the program is searching beyond the given maxgap distance. Can anyone help me understand this, or explain how ChIPpeakAnno uses maxgap for overlapping features?

seqnames    start   end width   strand  peakNames   peak    feature start_position  end_position    feature_strand  insideFeature   distancetoFeature   shortestDistance    fromOverlappingOrNearest    symbol
chr1    5071994 5072969 976 +   c("peaks1_range__0002", "peaks1_range__0003", "peaks2_range__00005")    2   58175   4909576 5070285 -   upstream    -1709   1709    NearestLocation Rgs20
chr1    9772666 9773182 517 +   c("peaks1_range__0011", "peaks2_range__00014")  6   73331   9747648 9791922 +   inside  25018   **18740**   Overlapping 1700034P13Rik
chr1    10286194    10286710    517 +   c("peaks2_range__00017", "peaks1_range__0012")  7   211673  10137507    10232670    -   upstream    -53524  53524   NearestLocation Arfgef1
chr1    10337617    10338120    504 +   c("peaks2_range__00019", "peaks1_range__0014")  9   329093  10324719    10719945    -   inside  382328  12898   Overlapping Cpa6
chr1    10396872    10397407    536 +   c("peaks2_range__00020", "peaks1_range__0015")  10  211673  10137507    10232670    -   upstream    -164202 164202  NearestLocation Arfgef1
chr1    10396872    10397407    536 +   c("peaks2_range__00020", "peaks1_range__0015")  10  329093  10324719    10719945    -   inside  323073  **72153**   Overlapping Cpa6
chr1    16544879    16545397    519 +   c("peaks1_range__0036", "peaks2_range__00054")  28  66799   16540788    16619338    -   inside  74459   4091    Overlapping Ube2w


The nearestLocation output ignores the maxgap parameter. If you want only the annotations within 5 kb of a gene, you can filter the result after annotation. See https://support.bioconductor.org/p/60971/
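A minimal sketch of such a post-annotation filter, written in base R so it runs without Bioconductor. The toy data frame copies a few columns from the table in the question, and the cutoff name `max_dist` is a hypothetical choice. With a real result, `as.data.frame(final_anno)` should give a comparable table, and the same logical index is assumed to work directly on the annotated object.

```r
# Toy copy of selected columns from the annotation table in the question.
anno <- data.frame(
  symbol = c("Rgs20", "1700034P13Rik", "Arfgef1", "Cpa6",
             "Arfgef1", "Cpa6", "Ube2w"),
  insideFeature = c("upstream", "inside", "upstream", "inside",
                    "upstream", "inside", "inside"),
  shortestDistance = c(1709, 18740, 53524, 12898, 164202, 72153, 4091),
  fromOverlappingOrNearest = c("NearestLocation", "Overlapping",
                               "NearestLocation", "Overlapping",
                               "NearestLocation", "Overlapping",
                               "Overlapping"),
  stringsAsFactors = FALSE
)

# Hypothetical cutoff: keep rows whose shortestDistance is at most 5 kb.
max_dist <- 5000
within_cutoff <- anno[anno$shortestDistance <= max_dist, ]

print(within_cutoff[, c("symbol", "insideFeature", "shortestDistance")])
```

Note that a plain shortestDistance cutoff also drops peaks that lie inside a long gene but far from both of its ends (for example the Cpa6 rows), so whether to exempt rows with insideFeature == "inside" is a design choice that depends on the analysis.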

Jianhong.

Thanks for your reply, Ou! But I am using the "both" option, not "nearestLocation". According to the manual, maxgap should be considered for this option, right? The manual says: "both" will output all the nearest features, in addition, will output any features that overlap the peak that is not the nearest features.

"both" means it will include all the results of nearestLocation and overlapping. I understand that this is a little confusion.

Yes, but coming back to my main question: if "both" considers overlapping features with maxgap=5000, then why does it report the gene Cpa6 as "Overlapping" with a distance of 72153 (which is > 5000 bp)?

Example here:

seqnames    start   end width   strand  peakNames   peak    feature start_position  end_position    feature_strand  insideFeature   distancetoFeature   shortestDistance    fromOverlappingOrNearest    symbol
chr1    10396872    10397407    536 +   c("peaks2_range__00020", "peaks1_range__0015")  10  211673  10137507    10232670    -   upstream    -164202 164202  NearestLocation Arfgef1
chr1    10396872    10397407    536 +   c("peaks2_range__00020", "peaks1_range__0015")  10  329093  10324719    10719945    -   inside  323073  **72153**   Overlapping Cpa6


The peak is inside the feature. Therefore, it is considered overlapping even though the distance between the starts/ends is greater than 5000.
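To make the distinction concrete, here is a small base R sketch using the Cpa6 row from the example. The helper names `interval_gap` and `shortest_end_distance` are hypothetical, not ChIPpeakAnno functions. It assumes that maxgap is compared against the gap between the two intervals, and that shortestDistance is the smallest distance between any peak end and any feature end, which matches the values in the table.

```r
# Gap in bp between two closed intervals; 0 if they overlap or touch.
interval_gap <- function(s1, e1, s2, e2) {
  max(0, max(s1, s2) - min(e1, e2) - 1)
}

# Smallest distance between either end of one interval and either end of the other.
shortest_end_distance <- function(s1, e1, s2, e2) {
  min(abs(c(s1 - s2, s1 - e2, e1 - s2, e1 - e2)))
}

# Coordinates taken from the Cpa6 row in the example table.
peak_start <- 10396872; peak_end <- 10397407
gene_start <- 10324719; gene_end <- 10719945

gap      <- interval_gap(peak_start, peak_end, gene_start, gene_end)
shortest <- shortest_end_distance(peak_start, peak_end, gene_start, gene_end)

# The peak lies inside the gene, so the gap is 0 (within any maxgap),
# while the nearest gene end is still tens of kb away.
cat("gap:", gap, " shortestDistance:", shortest, "\n")
```

Under these assumptions the two quantities answer different questions: the gap decides whether a feature counts as overlapping, and shortestDistance only reports how far the peak is from the feature boundaries.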

Hope it makes sense to you.

Best regards,

Julie