Question: ChIPpeakAnno
6.4 years ago by
Julie Zhu4.1k
United States
Julie Zhu4.1k wrote:
Ann,

Thanks for the feedback!

Your function call is correct. However, there is a difference between maxgap and distancetoFeature (or shortestDistance). maxgap specifies the maximum gap allowed between two ranges, not the distance between their ends. For example, when two ranges overlap, the gap between them is 0 (no gap), although distancetoFeature, which is calculated as the start of the peak minus the start of the feature, might be greater than 0.

Here is a toy example:

peak: chr1:1000-1600
feature: chr1:300-2000
distancetoFeature = 1000 - 300 = 700
shortestDistance = min(abs(1000-300), abs(1000-2000), abs(1600-300), abs(1600-2000)) = 400, where abs = absolute value
gap = 0 because these two ranges overlap

Please let me know if this makes sense.

Please CC bioconductor in the subsequent communications for others to input/benefit. Thanks!

Best regards,

Julie

On 6/20/13 3:00 AM, "Ann Mongan" <amongan at quanticel.com> wrote:

> Dear Julie,
> Thank you for developing ChIPpeakAnno, I find it very useful.
> Anyway, I'm using ChIPpeakAnno_2.2.0. I found some peculiarity with how my
> peaks are assigned to features that are outside of maxgap (example below).
> Could you help me understand why I get these results? I suppose some
> arguments must not be set correctly.
> Thanks for your help.
> Ann
>
> t1 = findOverlappingPeaks(ASR, refseqRanges, maxgap=5000, multiple=TRUE,
>      select='all', NameOfPeaks1='KDM5B', NameOfPeaks2='RefSeq')
>
> > head(t1$OverlappingPeaks[t1$OverlappingPeaks$shortestDistance > 5000,])
>     KDM5B chr RefSeq RefSeq_start RefSeq_end strand KDM5B_start KDM5B_end strand1 overlapFeature shortestDistance
> 62  00033   1  02323       860260     879955      +      870589    871263       +         inside             8692
> 63  00034   1  02323       860260     879955      +      871383    871883       +         inside             8072
> 64  00035   1  02323       860260     879955      +      873522    874033       +         inside             5922
> 120 00062   1  02363       955503     991496      +      964918    966100       +         inside             9415
> 121 00063   1  02363       955503     991496      +      975841    976296       +         inside            15200
> 138 00081   1  02398      1109264    1133315      +     1120693   1121410       +         inside            11429
>
> p = annotatePeakInBatch(head(ASR,100), AnnotationData=refseqRanges,
>     output="both", maxgap=5000,
>     PeakLocForDistance="middle", FeatureLocForDistance="TSS", select="all")
>
> > head(as.data.frame(p)[p$distancetoFeature > 5000,])
>    space  start    end width                    names peak strand               feature start_position end_position insideFeature distancetoFeature shortestDistance fromOverlappingOrNearest
> 7   chr1 870589 871263   675 33 1244.NM_152486.SAMD11   33      + 1244.NM_152486.SAMD11         861120       879961        inside              9806             8698             NearestStart
> 8   chr1 871383 871883   501 34 1244.NM_152486.SAMD11   34      + 1244.NM_152486.SAMD11         861120       879961        inside             10513             8078             NearestStart
> 9   chr1 873522 874033   512 35 1244.NM_152486.SAMD11   35      + 1244.NM_152486.SAMD11         861120       879961        inside             12658             5928             NearestStart
> 10  chr1 874123 875130  1008 36 1244.NM_152486.SAMD11   36      + 1244.NM_152486.SAMD11         861120       879961        inside             13506             4831             NearestStart
> 11  chr1 875328 875693   366 37 1244.NM_152486.SAMD11   37      + 1244.NM_152486.SAMD11         861120       879961        inside             14390             4268             NearestStart
> 12  chr1 875720 879253  3534 38 1244.NM_152486.SAMD11   38      + 1244.NM_152486.SAMD11         861120       879961        inside             16366              708             NearestStart
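To make the distinction in the reply concrete, here is a minimal base R sketch that recomputes the three quantities for the toy example and for row 62 of the findOverlappingPeaks output quoted above. It does not call ChIPpeakAnno. The helper name compare_measures is hypothetical, and the gap is assumed to be the number of positions separating two ranges (0 when they overlap or touch), which may not match every package version exactly.

```r
# Hypothetical helper (not part of ChIPpeakAnno): the three quantities
# discussed above for one peak and one feature on the same chromosome.
compare_measures <- function(peak_start, peak_end, feat_start, feat_end) {
  # 2 x 2 matrix of differences between each peak end and each feature end
  ends_diff <- outer(c(peak_start, peak_end), c(feat_start, feat_end), "-")
  c(
    # start of peak minus start of feature, as described in the reply
    distancetoFeature = peak_start - feat_start,
    # smallest absolute difference among the four end pairings
    shortestDistance = min(abs(ends_diff)),
    # assumed gap definition: positions separating the ranges, 0 if none
    gap = max(0, max(peak_start, feat_start) - min(peak_end, feat_end) - 1)
  )
}

# Toy example from the reply: peak chr1:1000-1600, feature chr1:300-2000
toy <- compare_measures(1000, 1600, 300, 2000)

# Row 62 of the quoted output: peak 870589-871263, feature 860260-879955
row62 <- compare_measures(870589, 871263, 860260, 879955)

print(toy)
print(row62)
```

For row 62 the gap is 0 because the peak lies inside the feature, so the pair passes maxgap=5000 even though its shortestDistance is 8692.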
chippeakanno assign • 818 views
modified 6.4 years ago • written 6.4 years ago by Julie Zhu4.1k
6.4 years ago by
Julie Zhu4.1k
United States
Julie Zhu4.1k wrote:
Ann,

Please see my responses below. Thanks!

Best regards,

Julie

On 6/20/13 10:47 AM, "Ann Mongan" <amongan at quanticel.com> wrote:

> Hi Julie,
> Thank you very much for your prompt response. I was sort of guessing that was
> the case. However
> 1) for my second call, I specified PeakLocForDistance="middle",
> FeatureLocForDistance="TSS"; I would have thought that only peaks within 5 Kb
> around the start position of the feature would be returned? Since it's not, how
> do my 2 calls differ?

Your first call finds overlapping features that are not gapped by more than maxgap. Your second call finds both the nearest feature and overlapping features. If there are no overlapping features, you will still obtain the nearest feature, with the distance calculated as PeakLocForDistance - FeatureLocForDistance. Therefore, the features from your first call are a subset of the features from your second call.

> 2) since I'm only interested in peaks within 5 Kb from TSS, I suppose I could
> just filter out my previous result instead of running it again, right? For
> future runs, should I just create a list of features that is only 1 bp at the
> TSS? I'm using refseq start site as TSS, would that be your recommendation,
> too?

Yes, you could just filter the results from your second call. I would recommend using the whole feature ranges and filtering later. Creating a list of features that are only 1 bp wide at the TSS might still require you to filter out very wide peaks: imagine the TSS landing inside a wide peak. For narrow peaks, please feel free to use this trick.

> 3) by default, since the overlap calculation is bidirectional, wouldn't this
> also cover cases for bidirectional promoters?

There is actually a function for this purpose. Please type ?peaksNearBDP after loading ChIPpeakAnno.

> Lastly, I don't know the email for the bioconductor list, is there a specific
> list for this package?

The bioconductor list is bioconductor at r-project.org (http://www.bioconductor.org/help/mailing-list/). FYI, I will be away for two weeks starting tomorrow, so please email Jianhong Ou and cc Bioconductor for subsequent communications. Thanks!

> Have a great day!

You too!

> Ann
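The suggestion to filter the second call's results afterwards can be sketched in base R. The data frame below copies six rows from the annotatePeakInBatch output quoted in this thread. The object names p_df, near_tss and near_any_end are made up for illustration, and in practice the data frame would come from something like as.data.frame(p), as in the original question. The TSS value 861120 is the start_position reported for those rows.

```r
# Six rows copied from the annotatePeakInBatch output quoted in this thread.
p_df <- data.frame(
  peak              = c(33, 34, 35, 36, 37, 38),
  start             = c(870589, 871383, 873522, 874123, 875328, 875720),
  end               = c(871263, 871883, 874033, 875130, 875693, 879253),
  distancetoFeature = c(9806, 10513, 12658, 13506, 14390, 16366),
  shortestDistance  = c(8698, 8078, 5928, 4831, 4268, 708)
)

# With PeakLocForDistance="middle" and FeatureLocForDistance="TSS", the
# reported distance should be close to (peak middle - TSS); the tolerance
# of 0.5 allows for midpoints that fall on a half position.
tss <- 861120
peak_middle <- (p_df$start + p_df$end) / 2
matches_middle <- all(abs(peak_middle - tss - p_df$distancetoFeature) <= 0.5)

# Keep peaks whose middle lies within 5 kb of the TSS on either side.
near_tss <- p_df[abs(p_df$distancetoFeature) <= 5000, ]

# Filtering on shortestDistance answers a different question:
# is any peak end within 5 kb of either end of the feature?
near_any_end <- p_df[p_df$shortestDistance <= 5000, ]

print(matches_middle)
print(nrow(near_tss))
print(near_any_end$peak)
```

None of these six peaks survives the TSS filter, while peaks 36 to 38 survive the shortestDistance filter because they sit close to the 3' end of the feature. For "within 5 Kb of the TSS", distancetoFeature is therefore the column to filter on.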