Question: What are out-of-bound ranges? Is it necessary to get rid of them?
14 months ago by
tania.l.barata0 wrote:

With the function:

overlaps.anno <- annotatePeakInBatch(overlaps, AnnotationData=annoData, output="nearestBiDirectionalPromoters",bindingRegion=c(-2000, 500))

I got this warning message:

Annotate peaks by annoPeaks, see ?annoPeaks for details.
maxgap will be ignored.
Warning messages:
1: In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) :
GRanges object contains 5 out-of-bound ranges located on sequences GL000199.1 and
chrM. Note that only ranges located on a non-circular sequence whose length is not
NA can be considered out-of-bound (use seqlengths() and isCircular() to get the
lengths and circularity flags of the underlying sequences). You can use trim() to
trim these ranges. See ?trim,GenomicRanges-method for more information.

What are the out-of-bound ranges? Is it advisable that we get rid of them?

modified 11 months ago • written 14 months ago by tania.l.barata0

Please tag your message with the package name, ChIPpeakAnno, so the author is aware of the question.

modified 14 months ago by Martin Morgan ♦♦ 21k • written 14 months ago by Valerie Obenchain ♦♦ 6.5k
14 months ago by
Julie Zhu3.8k
United States
Julie Zhu3.8k wrote:

Out-of-bound ranges are ranges whose coordinates are not valid on their chromosome, for example ranges that extend past the end of the sequence. The warning message indicates that your data (overlaps) contains 5 such ranges on chrM and GL000199.1. You can pass trim(overlaps) instead of overlaps as input to annotatePeakInBatch. This post shows how to locate ranges that are out of bound: https://www.biostars.org/p/98315/.
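To illustrate what trim() does, here is a minimal sketch on a toy GRanges. The sequence name, its length of 100 and the peak coordinates are made up for the example; they are not taken from the data in the question.

```r
library(GenomicRanges)

# Toy data: "chr1" is declared to be 100 bp long and non-circular.
si <- Seqinfo(seqnames = "chr1", seqlengths = 100L, isCircular = FALSE)

# The second range ends at 120, past the end of chr1, so building the
# object raises the out-of-bound warning; it is suppressed here on purpose.
peaks <- suppressWarnings(GRanges(
    seqnames = c("chr1", "chr1"),
    ranges = IRanges(start = c(10, 90), end = c(20, 120)),
    seqinfo = si
))

# trim() clips out-of-bound ranges to the sequence limits,
# so the second range now ends at 100.
peaks.trimmed <- trim(peaks)
end(peaks.trimmed)
```

In the question's setting, the trimmed object would then be passed to annotatePeakInBatch in place of the original one.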

Best regards,

Julie

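Hervé Pagès wrote:

Thanks Julie for the nice explanation. A couple of comments about the Biostar answer: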

The seqlengths are stored in the GRanges object so there is no need to use a BSgenome package to get them. Therefore

which(end(mygr) > seqlengths(BSgenome.Hsapiens.UCSC.hg19)[as.character(seqnames(mygr))])

can be replaced with

which(end(mygr) > seqlengths(mygr)[as.character(seqnames(mygr))])

Note that ranges are considered out-of-bound only if the GRanges object contains the seqlengths information. If it doesn't contain the seqlengths information or contains it for some sequences only, the vectorized comparison (>) in the above code will generate NAs. However which() will ignore them so we're good.
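A small sketch of the NA behaviour described above, assuming a toy object in which one sequence ("chrUn", a made-up name) has no known length:

```r
library(GenomicRanges)

# Toy data: chr1 has a known length, chrUn does not (its seqlength is NA).
si <- Seqinfo(seqnames = c("chr1", "chrUn"), seqlengths = c(100L, NA))
mygr <- suppressWarnings(GRanges(
    seqnames = c("chr1", "chr1", "chrUn"),
    ranges = IRanges(start = c(10, 90, 500), end = c(20, 120, 600)),
    seqinfo = si
))

# Look up the sequence length for each range; it is NA for the chrUn range.
seqends <- seqlengths(mygr)[as.character(seqnames(mygr))]

# The comparison yields FALSE, TRUE, NA ...
past_end <- end(mygr) > seqends

# ... and which() keeps only the TRUE position, silently dropping the NA.
idx <- which(past_end)
```

Here idx points at the second range only; the range on chrUn is never reported because its bound is unknown.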

Another thing is that in some rare situations, some ranges can start at a position < 1. These ranges are also considered out-of-bound. So a more accurate version of the above code would be something like

seqends <- seqlengths(mygr)[as.character(seqnames(mygr))]
which(start(mygr) < 1L | end(mygr) > seqends)
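As a quick check of this two-sided test, the sketch below builds a toy object (made-up coordinates) with one range starting before position 1 and one running past the end of the sequence:

```r
library(GenomicRanges)

# Toy data: chr1 is 100 bp long and non-circular.
si <- Seqinfo(seqnames = "chr1", seqlengths = 100L, isCircular = FALSE)
mygr <- suppressWarnings(GRanges(
    seqnames = c("chr1", "chr1", "chr1"),
    ranges = IRanges(start = c(-5, 10, 90), end = c(8, 20, 120)),
    seqinfo = si
))

seqends <- seqlengths(mygr)[as.character(seqnames(mygr))]

# Range 1 starts before position 1, range 3 ends past position 100.
idx <- which(start(mygr) < 1L | end(mygr) > seqends)
```

With these values idx selects the first and third ranges; the end-only test would have missed the first one.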

Finally, ranges defined on a circular sequence (e.g. chrM) are never considered out-of-bound. So a completely accurate version of the above code would be even more complicated. It's actually implemented in the internal utility GenomicRanges:::get_out_of_bound_index(), which is used by the validity and "trim" methods for GRanges objects.
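One way to fold circularity into the by-hand check is sketched below. This is only an approximation written for illustration, not the code of the internal utility, and the sequence lengths and coordinates are made up.

```r
library(GenomicRanges)

# Toy data: chr1 is linear, chrM is flagged as circular.
si <- Seqinfo(seqnames = c("chr1", "chrM"),
              seqlengths = c(100L, 50L),
              isCircular = c(FALSE, TRUE))
mygr <- suppressWarnings(GRanges(
    seqnames = c("chr1", "chrM"),
    ranges = IRanges(start = c(90, 40), end = c(120, 60)),
    seqinfo = si
))

# Per-range sequence length and circularity flag.
seqends <- seqlengths(mygr)[as.character(seqnames(mygr))]
circ <- isCircular(mygr)[as.character(seqnames(mygr))]

# Flag a range only if its sequence is not known to be circular.
oob <- which(!(circ %in% TRUE) & (start(mygr) < 1L | end(mygr) > seqends))

# trim() clips the chr1 range and leaves the chrM range untouched.
mygr.trimmed <- trim(mygr)
end(mygr.trimmed)
```

Both ranges run past the declared sequence length, but only the chr1 range is flagged and trimmed. Note that in the original question chrM was reported as carrying out-of-bound ranges, which suggests it was not flagged as circular in that object; isCircular(overlaps) would show this.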

All this to say that the easiest and most reliable way of getting the index of out-of-bound ranges is probably to do which(mygr.original != mygr.trimmed), as suggested in the Biostar answer.
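A minimal sketch of that comparison on a toy object (made-up coordinates; mygr.original and mygr.trimmed are just the names used in the sentence above):

```r
library(GenomicRanges)

# Toy data: chr1 is 100 bp long and non-circular.
si <- Seqinfo(seqnames = "chr1", seqlengths = 100L, isCircular = FALSE)
mygr.original <- suppressWarnings(GRanges(
    seqnames = c("chr1", "chr1", "chr1"),
    ranges = IRanges(start = c(-5, 10, 90), end = c(8, 20, 120)),
    seqinfo = si
))

# trim() changes only the out-of-bound ranges ...
mygr.trimmed <- trim(mygr.original)

# ... so the ranges that differ after trimming are the out-of-bound ones.
idx <- which(mygr.original != mygr.trimmed)
mygr.original[idx]
```

This delegates the handling of missing seqlengths and circular sequences to trim() itself, so nothing has to be re-implemented by hand.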

Cheers,

H.

Hervé, thanks for such a thorough discussion on this topic!

Best regards,

Julie
