ChIPpeakAnno annotatePeakInBatch output
@15877814

Hi everyone, I am trying to understand the output of annotatePeakInBatch() from ChIPpeakAnno. I used the following code to annotate my peak list against the reference genome:

annotatedpeaks <- annotatePeakInBatch(peaks.GR, AnnotationData = annoData, output = "both", maxgap = 0, multiple = FALSE)

I don't understand how the multiple argument affects the output; the manual describes it as returning "at most one overlapping feature for each peak". My idea is to have my peaks annotated both to their nearest feature and to the overlapping ones. In particular, when a peak overlaps a feature that is not the nearest one, I want that feature in the output. This is because I realized that some peaks are annotated to their nearest feature even though they overlap a feature that resides on the same strand. How can I solve this problem?

Thanks

@james-w-macdonald-5106

The help page says that 'multiple' is kept for backward compatibility and that you should use 'select' instead. If you use select = "all" (the default), you will get all peaks returned, which seems to be what you want.


Thanks James for your kind reply! I read what you mentioned in your comment and tried it. The point is that when a peak overlaps a feature, the tool gives me both the overlapping feature and the nearest feature to the peak. I would like an output containing only the nearest feature for those peaks that do not reside within genes, and only the overlapping features for those peaks that are inside genes...


Hi Ilaria,

Thank you for your great question! To achieve your specific goal, you can use the insideFeature column of the output. By keeping only the rows where insideFeature is "inside", you can isolate the peaks that fall within features.

The insideFeature column can take the following values:

  "upstream": the peak is located upstream of the feature.
  "includeFeature": the peak contains the entire feature.
  "overlapStart": the peak overlaps the start of the feature.
  "inside": the peak is located entirely within the feature.
  "overlapEnd": the peak overlaps the end of the feature.
  "downstream": the peak is located downstream of the feature.
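As a minimal sketch of that filtering step, the snippet below uses a small base R data frame as a stand-in for the metadata columns of an annotation result. The peak and feature names are invented for illustration, and the grouping of "inside", "overlapStart", "overlapEnd" and "includeFeature" as "overlapping" classes is an assumption based on the descriptions above.

```r
# Toy stand-in for the metadata columns of an annotatePeakInBatch() result.
# Peak and feature names are hypothetical.
anno <- data.frame(
  peak = c("p1", "p2", "p3", "p4"),
  feature = c("geneA", "geneB", "geneC", "geneD"),
  insideFeature = c("inside", "upstream", "overlapStart", "downstream"),
  stringsAsFactors = FALSE
)

# Peaks located entirely within a feature.
# %in% is used instead of == so that NA values are treated as FALSE.
inside_only <- anno[anno$insideFeature %in% "inside", ]

# Peaks that overlap a feature in any way (assumed grouping of classes).
overlap_classes <- c("inside", "overlapStart", "overlapEnd", "includeFeature")
any_overlap <- anno[anno$insideFeature %in% overlap_classes, ]

print(inside_only)
print(any_overlap)
```

On a real GRanges result the same logical index should apply without the trailing comma, for example annotatedpeaks[annotatedpeaks$insideFeature %in% "inside"].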

Hope this fits your needs.

Best regards,

Julie

Kai Hu
@kai

I read your concern again:

I would like an output containing only the nearest feature for those peaks that do not reside within genes, and only the overlapping features for those peaks that are inside genes...

It seems you would like to assign only one type of feature to each peak: either "nearest" or "overlapping", with "overlapping" preferred when the "nearest" feature does not overlap the peak. As you mentioned, if you set output = "both" and select = "all", the tool assigns both "overlapping" and "nearest" features to peaks. To obtain what you want, I suggest three steps: first, annotate the peaks to their overlapping features; second, annotate the peaks that have no overlapping feature to their nearest features; last, concatenate the two results. Below is some example code.

library(ChIPpeakAnno)
library(ensembldb)
library(EnsDb.Hsapiens.v75)
data(myPeakList)  # example peak list shipped with ChIPpeakAnno
annoData <- annoGR(EnsDb.Hsapiens.v75)

# Step 1: annotate peaks to overlapping features.
# select = "first" keeps one overlapping feature per peak;
# with select = "all", multiple features can be assigned to a single peak.
anno_overlapping <- annotatePeakInBatch(myPeakList, AnnotationData = annoData,
                                        output = "overlapping", select = "first")
# Drop any peaks that received no overlapping feature
anno_overlapping_non_na <- anno_overlapping[!is.na(anno_overlapping$feature)]

# Step 2: annotate the peaks without an overlapping feature to their nearest feature
myPeakList_non_overlapping <- myPeakList[!(names(myPeakList) %in% anno_overlapping_non_na$peak)]
anno_nearest <- annotatePeakInBatch(myPeakList_non_overlapping,
                                    AnnotationData = annoData,
                                    output = "nearestLocation", select = "first")

# Step 3: concatenate the two results
anno_final <- c(anno_overlapping_non_na, anno_nearest)

The code above assigns either an "overlapping" or a "nearest" feature to each peak; if the "overlapping" feature is not the "nearest" one, only the "overlapping" feature is reported. I hope this is what you want.
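The selection rule itself (keep overlapping features where a peak has any, otherwise keep the nearest one) can also be illustrated on a toy table. This is only a sketch of the logic in base R, not a replacement for the three steps above: the data frame, its peak and feature names, and the set of insideFeature values treated as "overlapping" are all assumptions made for illustration.

```r
# Toy table with one row per peak-feature pair, loosely mimicking what a
# combined "overlapping plus nearest" annotation might look like.
# All names and values are hypothetical.
both <- data.frame(
  peak = c("p1", "p1", "p2", "p3", "p3"),
  feature = c("geneA", "geneB", "geneC", "geneD", "geneE"),
  insideFeature = c("upstream", "inside", "downstream", "overlapStart", "upstream"),
  stringsAsFactors = FALSE
)

# Assumed set of classes that count as an overlap.
overlap_classes <- c("inside", "overlapStart", "overlapEnd", "includeFeature")
is_overlap <- both$insideFeature %in% overlap_classes

# Peaks that have at least one overlapping feature.
peaks_with_overlap <- unique(both$peak[is_overlap])

# Keep overlapping rows, plus every row of peaks that have no overlap at all.
keep <- is_overlap | !(both$peak %in% peaks_with_overlap)
final <- both[keep, ]

print(final)
```

In this toy table, p1 and p3 each lose their non-overlapping row, while p2, which overlaps nothing, keeps its single non-overlapping feature.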


Thank you Kai Hu! This is exactly what I was looking for. Thank you again for the great explanation!
