
Question: What are out-of-bound ranges? Is it necessary to get rid of them?
22 months ago by
salamandra0 wrote:

When I run the following call:

overlaps.anno <- annotatePeakInBatch(overlaps, AnnotationData=annoData, output="nearestBiDirectionalPromoters", bindingRegion=c(-2000, 500))

I get this warning message:

Annotate peaks by annoPeaks, see ?annoPeaks for details.
maxgap will be ignored.
Warning messages:
1: In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) :
GRanges object contains 5 out-of-bound ranges located on sequences GL000199.1 and
chrM. Note that only ranges located on a non-circular sequence whose length is not
NA can be considered out-of-bound (use seqlengths() and isCircular() to get the
lengths and circularity flags of the underlying sequences). You can use trim() to
trim these ranges. See ?trim,GenomicRanges-method for more information.

What are the out-of-bound ranges? Is it advisable that we get rid of them?


Please tag your message with the package name, ChIPpeakAnno, so the author is aware of the question.

written 22 months ago by Valerie Obenchain • modified 22 months ago by Martin Morgan
Answer: What are out-of-bound ranges? Is it necessary to get rid of them?
22 months ago by
Julie Zhu (United States) wrote:

Out-of-bound ranges are ranges that extend beyond the boundaries of the sequence they are on, that is, they start before position 1 or end past the sequence length, so they are not valid coordinates on the chromosome. The warning message indicates that your data (overlaps) contains 5 such ranges, located on chrM and GL000199.1. You can pass trim(overlaps) instead of overlaps as input to annotatePeakInBatch. Here is a nice post on how to locate out-of-bound ranges: https://www.biostars.org/p/98315/
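
As a rough illustration of what trim() does, here is a minimal sketch on toy data. The sequence name chrA, its length of 1000, and the object names peaks, extended and trimmed are all made up for the example; they are not from the original question.

```r
library(GenomicRanges)

# Hypothetical toy data: two peaks on a made-up, non-circular 1000-bp sequence "chrA"
peaks <- GRanges("chrA",
                 IRanges(start = c(100, 900), end = c(200, 950)),
                 seqinfo = Seqinfo("chrA", seqlengths = 1000, isCircular = FALSE))

# Widening the peaks pushes the second one past position 1000, so it becomes
# out-of-bound (this is the step that would normally raise the warning)
extended <- suppressWarnings(resize(peaks, width = 300, fix = "start"))

# trim() clips out-of-bound ranges back to the limits of their sequence
trimmed <- trim(extended)

end(extended)
end(trimmed)
```

In the original question, the analogous step would be to call trim(overlaps) and pass the result to annotatePeakInBatch.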

Best regards,

Julie


The seqlengths are stored in the GRanges object so there is no need to use a BSgenome package to get them. Therefore

which(end(mygr) > seqlengths(BSgenome.Hsapiens.UCSC.hg19)[as.character(seqnames(mygr))])

can be replaced with

which(end(mygr) > seqlengths(mygr)[as.character(seqnames(mygr))])

Note that ranges are considered out-of-bound only if the GRanges object contains the seqlengths information. If it doesn't contain the seqlengths information, or contains it for some sequences only, the vectorized comparison (>) in the above code will generate NAs for the ranges on sequences of unknown length. However, which() ignores NAs, so the result is still correct.
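
To make the NA behaviour concrete, here is a small sketch, assuming a toy object in which only one of two sequences has a known length. The sequence names chrA and chrB and all coordinates are invented for illustration.

```r
library(GenomicRanges)

# Hypothetical toy data: chrA has a known length, chrB does not (NA)
mygr <- suppressWarnings(
  GRanges(c("chrA", "chrB", "chrA"),
          IRanges(start = c(10, 10, 990), end = c(50, 5000, 1200)),
          seqinfo = Seqinfo(c("chrA", "chrB"), seqlengths = c(1000, NA)))
)

# The comparison is NA wherever the sequence length is unknown (here: range 2)
cmp <- end(mygr) > seqlengths(mygr)[as.character(seqnames(mygr))]

# which() keeps only the positions that are TRUE, silently dropping the NAs
idx <- unname(which(cmp))
idx
```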

Another thing is that in some rare situations, some ranges can start at a position < 1. These ranges are also considered out-of-bound. So a more accurate version of the above code would be something like

seqends <- seqlengths(mygr)[as.character(seqnames(mygr))]
which(start(mygr) < 1L | end(mygr) > seqends)

Finally, ranges defined on a circular sequence (e.g. chrM, when isCircular() is TRUE for it) are never considered out-of-bound. So a completely accurate version of the above code would be even more complicated. It's actually implemented in the internal utility GenomicRanges:::get_out_of_bound_index(), which is used by the validity and "trim" methods for GRanges objects.
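
For illustration only, the three conditions discussed above (known length, non-circular sequence, start < 1 or end past the sequence end) could be combined along the following lines. This is a sketch on made-up data, not the code of the internal utility; in particular, treating an NA circularity flag as "not circular" is an assumption based on the wording of the warning message.

```r
library(GenomicRanges)

# Hypothetical toy data: chrA is linear (1000 bp), chrM is flagged as circular (500 bp)
si <- Seqinfo(c("chrA", "chrM"),
              seqlengths = c(1000, 500),
              isCircular = c(FALSE, TRUE))
mygr <- suppressWarnings(
  GRanges(c("chrA", "chrA", "chrM"),
          IRanges(start = c(10, 950, 450), end = c(100, 1050, 550)),
          seqinfo = si)
)

chr <- as.character(seqnames(mygr))
seqends  <- unname(seqlengths(mygr)[chr])   # sequence length for each range
circular <- unname(isCircular(mygr)[chr])   # circularity flag for each range

# Flag a range only if its sequence has a known length and is not circular
# (assumption: an NA circularity flag is treated like FALSE)
oob <- which(!(circular %in% TRUE) & !is.na(seqends) &
             (start(mygr) < 1L | end(mygr) > seqends))
oob
```

Here only the second range is flagged: the chrM range also runs past the sequence length, but it is skipped because chrM is marked as circular.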

All this to say that the easiest and most reliable way of getting the index of out-of-bound ranges is probably to do which(mygr.original != mygr.trimmed), where mygr.trimmed is trim(mygr.original), as suggested in the Biostar answer.
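
A minimal sketch of that comparison, again on invented toy data (the sequences chrA and chrM, their lengths and all coordinates are hypothetical):

```r
library(GenomicRanges)

# Hypothetical toy data: chrA is linear (1000 bp), chrM is flagged as circular (500 bp)
si <- Seqinfo(c("chrA", "chrM"),
              seqlengths = c(1000, 500),
              isCircular = c(FALSE, TRUE))
mygr.original <- suppressWarnings(
  GRanges(c("chrA", "chrA", "chrA", "chrM"),
          IRanges(start = c(10, -5, 950, 450), end = c(100, 20, 1050, 550)),
          seqinfo = si)
)

# trim() only modifies the ranges that are out-of-bound
mygr.trimmed <- trim(mygr.original)

# Ranges that differ after trimming are exactly the ones that were out-of-bound
oob <- which(mygr.original != mygr.trimmed)
oob
```

The range starting at -5 and the range ending at 1050 are reported, while the range on the circular chrM is left alone by trim() and therefore not reported.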

Cheers,

H.