ChIPpeakAnno findOverlapsOfPeaks error: "Error in FUN(X[[i]], ...) : Inputs contains duplicated ranges. please recheck your inputs."
@ed584650

Hello! I keep getting the following error from running findOverlapsOfPeaks using ChIPpeakAnno and I would love advice!

I ran this same code on a different set of broadPeak files that I made from a data frame, and it worked fine; with this dataset, however, it fails. I'm not changing anything except the contents of the data frame: both have the same columns and were built the same way.


# Make a new data frame keeping only the columns for broadPeak BED format.
upTSS8kb_N.broadpeak <- annoIDsNdf_upTSS8kb %>%
  select(seqnames, start, end, peak, score, strand, signalValue, pValue, qValue) %>%
  rename(
    chrom = seqnames,
    chromStart = start,
    chromEnd = end,
    name = peak
  )
## Done for all 4 samples

#makes gRanges from above files
upTSS8kb_Ngr <- toGRanges(upTSS8kb_N.broadpeak, format="broadPeak")
upTSS8kb_Hgr <- toGRanges(upTSS8kb_H.broadpeak, format="broadPeak")
upTSS8kb_8gr <- toGRanges(upTSS8kb_8.broadpeak, format="broadPeak")
upTSS8kb_5gr <- toGRanges(upTSS8kb_5.broadpeak, format="broadPeak")
#find overlapping peaks
ol_NHP_upTSS8kb <- findOverlapsOfPeaks(upTSS8kb_Ngr, upTSS8kb_Hgr, upTSS8kb_8gr, upTSS8kb_5gr)

The output of the last line is:

Error in FUN(X[[i]], ...) : Inputs contains duplicated ranges. please recheck your inputs.

I appreciate any and all help. Thank you!

Kai Hu ▴ 30
@kai

As the error message suggests, some of your GRanges objects may contain duplicated ranges.

Basically, findOverlapsOfPeaks() calls privateUtil::trimPeakList(), which checks each input for duplicated ranges and raises this error if it finds any.

I think your next step is to double-check your GRanges objects and use unique() to filter out any duplicates. Below is an example adapted from the usage example of findOverlapsOfPeaks(). Note that peaks3 is identical to peaks2 except that its 5th range is intentionally replaced by a copy of the 4th for the demo.

library(ChIPpeakAnno)

peaks1 <- GRanges(seqnames=c(6,6,6,6,5),
                 IRanges(start=c(1543200,1557200,1563000,1569800,167889600),
                         end=c(1555199,1560599,1565199,1573799,167893599),
                         names=c("p1","p2","p3","p4","p5")),
                 strand="+")
peaks2 <- GRanges(seqnames=c(6,6,6,6,5),
                  IRanges(start=c(1549800,1554400,1565000,1569400,167888600),
                          end=c(1550599,1560799,1565399,1571199,167888999),
                          names=c("f1","f2","f3","f4","f5")),
                  strand="+")
peaks3 <- GRanges(seqnames=c(6,6,6,6,6),
                  IRanges(start=c(1549800,1554400,1565000,1569400,1569400),
                          end=c(1550599,1560799,1565399,1571199,1571199),
                          names=c("f1","f2","f3","f4","f4")),
                  strand="+")

# The call below raises an error because peaks3 contains duplicates:
t1 <- findOverlapsOfPeaks(peaks1, peaks2, peaks3, maxgap=1000)

Output:
Error in FUN(X[[i]], ...) : Inputs contains duplicated ranges.
             please recheck your inputs.

# To remove duplicates, simply:
peaks4 <- unique(peaks3)
# And below is okay now:
t1 <- findOverlapsOfPeaks(peaks1, peaks2, peaks4, maxgap=1000)
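
Note that unique() on a GRanges object compares only seqnames, start, end, and strand, and keeps the first of each set of identical ranges, so the names and metadata columns of the dropped copies are lost. If the duplicated ranges carry different scores, one option is to order by score before filtering. A minimal sketch on toy data (the gr object and its score values are made up for illustration, not taken from the question):

```r
library(GenomicRanges)

# Toy data: the first two ranges are identical but have different scores
gr <- GRanges("chr1",
              IRanges(start = c(100, 100, 300), end = c(200, 200, 400)),
              score = c(5, 9, 2))

# Put the highest-scoring ranges first, so that duplicated() flags
# the lower-scoring copies rather than the higher-scoring ones
gr_sorted <- gr[order(gr$score, decreasing = TRUE)]

# Keep only the first occurrence of each range, then restore genomic order
gr_best <- sort(gr_sorted[!duplicated(gr_sorted)])
gr_best
```

Whether the highest score is the right copy to keep depends on how the duplicates arose, so it is worth inspecting them first.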

# To find out which rows are duplicated, the following code may help:
df <- data.frame(peaks3)
df[duplicated(df), ]

Output:
  seqnames   start     end width strand
5        6 1569400 1571199  1800      +
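
The same check can be run over all inputs at once before calling findOverlapsOfPeaks(). A minimal sketch, assuming the GRanges objects are collected in a named list; the toy ranges below are hypothetical stand-ins for the four objects in the question:

```r
library(GenomicRanges)

# Hypothetical stand-ins for the real peak sets
gr_list <- list(
  a = GRanges("chr1", IRanges(start = c(1, 100, 100), end = c(50, 150, 150))),
  b = GRanges("chr1", IRanges(start = c(10, 200), end = c(20, 250)))
)

# Number of duplicated ranges in each input (0 means the input is clean)
n_dup <- vapply(gr_list, function(gr) sum(duplicated(gr)), integer(1))
n_dup

# Drop the duplicates from every input, keeping the first copy of each range
gr_unique <- lapply(gr_list, unique)
n_kept <- vapply(gr_unique, length, integer(1))
n_kept
```

Any input with a non-zero count in n_dup is one that would trigger the error.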

This fixed it! Thank you so much for the detailed example and the quick response!
