ChIPpeakAnno annotatePeakInBatch output
igor (@igor)

I am trying to understand the output argument of ChIPpeakAnno annotatePeakInBatch(). According to the reference manual:

nearestLocation (default): will output the nearest features calculated as PeakLocForDistance - FeatureLocForDistance;

shortestDistance: will output nearest features;

Those two options sound identical to me, but I realize they behave differently. I am also not sure why shortestDistance would generate multiple annotations per peak.


nearestLocation will output the feature nearest to the peak, measured from the peak to the feature start position (or the middle or end position, as defined by FeatureLocForDistance),

and shortestDistance will output the feature with the shortest distance to the peak. Sometimes there are multiple outputs for one peak because select is set to 'all' for GenomicRanges::nearest.


1) I still don't understand "output the nearest feature from the peak with shortest distance". How can nearest not be shortest? Those two words have the same meaning when it comes to distance.

2) Shouldn't "nearest" mean one? Only one is nearest. The rest are further away. How can multiple outputs be nearest?


1) The nearest feature is usually the one with the shortest distance, but not always.

For example,

#         aaaaaaaa     *
#      123456789012
#             CC       +
#       AAA  BB        - 

For peak a, the nearest annotation will be A, because the distance from the start position of A to peak a is 0. The shortest distance from peak a to features A, B and C is 0, because they all overlap the peak.


2) Nearest does not mean only one, although in most cases there is only one. If two features share the same start position, or if two features lie on either side of the peak at the same distance, there will be multiple outputs.

#################################################

> library(ChIPpeakAnno)
> feature <- GRanges("1", IRanges(start=c(2, 7, 8), end=c(4, 8, 9), names=LETTERS[1:3]), strand=c("-", "-", "+"))
> myPeak <- GRanges("1", IRanges(start=4, end=11, names="a"), strand="*")
> annotatePeakInBatch(myPeak, AnnotationData=feature, output="shortestDistance")
GRanges object with 3 ranges and 9 metadata columns:
      seqnames    ranges strand |        peak     feature start_position end_position feature_strand  insideFeature distancetoFeature
             |                                
  a.A     chr1   [4, 11]      * |           a           A              2            4              -   overlapStart                 0
  a.B     chr1   [4, 11]      * |           a           B              7            8              - includeFeature                 4
  a.C     chr1   [4, 11]      * |           a           C              8            9              + includeFeature                -4
      shortestDistance fromOverlappingOrNearest
                           
  a.A                0         shortestDistance
  a.B                3         shortestDistance
  a.C                2         shortestDistance
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> annotatePeakInBatch(myPeak, AnnotationData=feature, output="nearestLocation")
GRanges object with 1 range and 9 metadata columns:
      seqnames    ranges strand |        peak     feature start_position end_position feature_strand insideFeature distancetoFeature
             |                               
  a.A     chr1   [4, 11]      * |           a           A              2            4              -  overlapStart                 0
      shortestDistance fromOverlappingOrNearest
                           
  a.A                0          NearestLocation
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
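
The distancetoFeature values in the output above (0, 4, -4) can be reproduced by hand in base R. This is only a sketch, not the package's implementation: it assumes the feature start is strand-aware (the end coordinate for features on the minus strand) and picks a sign convention that matches the printed values. The names features, feature_tss, signed_distance and nearest_feature are made up for this illustration.

```r
# Hand calculation of distancetoFeature for peak a (start = 4).
# Assumption: a "-" strand feature starts at its end coordinate.
features <- data.frame(
  name   = c("A", "B", "C"),
  start  = c(2, 7, 8),
  end    = c(4, 8, 9),
  strand = c("-", "-", "+"),
  stringsAsFactors = FALSE
)
peak_start <- 4

# Strand-aware feature start: start for "+", end for "-"
feature_tss <- ifelse(features$strand == "-", features$end, features$start)

# Signed distance from peak start to feature start.
# Assumption: the sign is flipped on the minus strand, chosen here
# only because it reproduces the values printed in the output above.
signed_distance <- ifelse(features$strand == "-",
                          feature_tss - peak_start,
                          peak_start - feature_tss)
names(signed_distance) <- features$name

# Feature(s) whose start is closest to the peak start
nearest_feature <- names(signed_distance)[
  abs(signed_distance) == min(abs(signed_distance))
]

print(signed_distance)
print(nearest_feature)
```

Only A has an absolute distance of 0, which is consistent with nearestLocation returning the single row a.A.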

Igor,

Thanks for the feedback! I agree that the documentation is not clear. Hopefully the following toy example will make it clearer.

Peak
peak1 chr1 100 200

Annotation
gene1 chr1 50 90 +
gene2 chr1 210 250 +
gene3 chr1 1000 2000 +

Here is how the shortest distance is calculated:

Shortest distance from a peak to a gene = minimum(abs(peak start - gene start), abs(peak start - gene end), abs(peak end - gene start), abs(peak end - gene end)), where abs stands for absolute value.

Shortest distance from peak1 to gene1 = minimum(abs(100 - 50), abs(100 - 90), abs(200 - 50), abs(200 - 90)) = 10
Shortest distance from peak1 to gene2 = minimum(abs(100 - 210), abs(100 - 250), abs(200 - 210), abs(200 - 250)) = 10

The shortest distance from peak1 to gene1 and that from peak1 to gene2 are the same (10 bp). Therefore, both gene1 and gene2 will be in the output file when you set output = "shortestDistance".

However, only gene1 will be in the output file if you set output = "nearestLocation" and FeatureLocForDistance = "TSS", as detailed below.

abs(peak1 start - gene1 TSS) = abs(100 - 50) = 50
abs(peak1 start - gene2 TSS) = abs(100 - 210) = 110

Best regards,
Julie
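
The arithmetic in the toy example can be checked with a few lines of base R. This is a sketch of the two distance definitions as described in the reply, not the code ChIPpeakAnno runs internally; the variable names (peak, genes, shortest_distance, tss_distance, shortest_hits, nearest_hits) are made up for this illustration, and the TSS is taken to be the gene start because all three genes are on the + strand.

```r
# Toy peak and annotation from the example
peak  <- c(start = 100, end = 200)
genes <- data.frame(
  name  = c("gene1", "gene2", "gene3"),
  start = c(50, 210, 1000),
  end   = c(90, 250, 2000),
  stringsAsFactors = FALSE
)

# Shortest distance: minimum over the four end-to-end differences
shortest_distance <- pmin(
  abs(peak[["start"]] - genes$start),
  abs(peak[["start"]] - genes$end),
  abs(peak[["end"]]   - genes$start),
  abs(peak[["end"]]   - genes$end)
)
names(shortest_distance) <- genes$name

# Distance from peak start to TSS (gene start, since all genes are on "+")
tss_distance <- abs(peak[["start"]] - genes$start)
names(tss_distance) <- genes$name

# Keep every gene tied for the minimum under each definition
shortest_hits <- genes$name[shortest_distance == min(shortest_distance)]
nearest_hits  <- genes$name[tss_distance == min(tss_distance)]

print(shortest_distance)
print(tss_distance)
print(shortest_hits)
print(nearest_hits)
```

Under the first definition gene1 and gene2 tie at 10 bp, so both are kept; under the second only gene1 has the smallest distance.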