Question: What are out-of-bound ranges? Is it necessary to get rid of them?
5 months ago, tania.l.barata wrote:

With the function:

overlaps.anno <- annotatePeakInBatch(overlaps, AnnotationData=annoData, output="nearestBiDirectionalPromoters", bindingRegion=c(-2000, 500))

I got this warning message:

Annotate peaks by annoPeaks, see ?annoPeaks for details.
maxgap will be ignored.
Warning messages:
1: In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) :
  GRanges object contains 5 out-of-bound ranges located on sequences GL000199.1 and
  chrM. Note that only ranges located on a non-circular sequence whose length is not
  NA can be considered out-of-bound (use seqlengths() and isCircular() to get the
  lengths and circularity flags of the underlying sequences). You can use trim() to
  trim these ranges. See ?`trim,GenomicRanges-method` for more information.

What are the out-of-bound ranges? Is it advisable that we get rid of them?

modified 9 weeks ago • written 5 months ago by tania.l.barata

Please tag your message with the package name, ChIPpeakAnno, so the author is aware of the question.

modified 5 months ago by Martin Morgan • written 5 months ago by Valerie Obenchain
5 months ago, Julie Zhu (United States) wrote:

Out-of-bound ranges are ranges whose coordinates fall outside the sequence they are located on, so they are not valid coordinates in the chromosome. The warning message indicates that there are 5 such ranges on chrM and GL000199.1 in your data (overlaps). You can pass trim(overlaps) instead of overlaps as input to annotatePeakInBatch. This post shows how to locate out-of-bound ranges: https://www.biostars.org/p/98315/
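To illustrate what trim() does, here is a minimal sketch on toy data. The sequence name "chrA" and its length of 100 are made up for the example; only GRanges(), trim() and end() from GenomicRanges are used.

```r
library(GenomicRanges)

# Toy data (hypothetical): one sequence "chrA" of length 100.
# The second range ends at 120, i.e. past the end of the sequence,
# so constructing the object triggers the out-of-bound warning.
gr <- suppressWarnings(
    GRanges("chrA",
            IRanges(start = c(10, 90), end = c(20, 120)),
            seqlengths = c(chrA = 100L))
)

# trim() clips out-of-bound ranges to the sequence bounds and
# leaves in-bound ranges untouched.
gr_trimmed <- trim(gr)
end(gr_trimmed)
```

After trimming, the second range ends at position 100 instead of 120, and the object no longer triggers the warning.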

Best regards,

Julie


Thanks Julie for the nice explanation. A couple of comments about the Biostar answer:

The seqlengths are stored in the GRanges object so there is no need to use a BSgenome package to get them. Therefore

which(end(mygr) > seqlengths(BSgenome.Hsapiens.UCSC.hg19)[as.character(seqnames(mygr))])

can be replaced with

which(end(mygr) > seqlengths(mygr)[as.character(seqnames(mygr))])

Note that ranges are considered out-of-bound only if the GRanges object contains the seqlengths information. If it doesn't contain the seqlengths information, or contains it for some sequences only, the vectorized comparison (>) in the above code will generate NAs. However, which() ignores NAs, so the result is still correct.

Another thing is that in some rare situations, some ranges can start at a position < 1. These ranges are also considered out-of-bound. So a more accurate version of the above code would be something like

seqends <- seqlengths(mygr)[as.character(seqnames(mygr))]
which(start(mygr) < 1L | end(mygr) > seqends)
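As a rough illustration of the two lines above, here is the same check applied to a toy object. The sequence names and lengths ("chrA" of length 100, "chrB" with unknown length) are invented for the example.

```r
library(GenomicRanges)

# Toy data (hypothetical): chrA has a known length, chrB has length NA.
mygr <- suppressWarnings(
    GRanges(c("chrA", "chrA", "chrA", "chrB"),
            IRanges(start = c(10, 90, -5, 500),
                    end   = c(20, 120, 10, 600)),
            seqlengths = c(chrA = 100L, chrB = NA))
)

# Sequence length looked up for each range (NA for the chrB range).
seqends <- seqlengths(mygr)[as.character(seqnames(mygr))]

# Ranges starting before position 1 or ending past the sequence end.
# The comparison yields NA for the chrB range, which which() drops.
oob <- which(start(mygr) < 1L | end(mygr) > seqends)
unname(oob)
```

Here the second range (ends past position 100) and the third range (starts at -5) are reported, while the chrB range is not, because the length of chrB is unknown.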

Finally, ranges defined on a circular sequence (e.g. chrM) are never considered out-of-bound, so a completely accurate version of the above code would be even more complicated. It's actually implemented in the internal utility GenomicRanges:::get_out_of_bound_index(), which is used by the validity and "trim" methods for GRanges objects.

All this to say that the easiest and most reliable way of getting the index of out-of-bound ranges is probably to do which(mygr.original != mygr.trimmed), as suggested in the Biostar answer.
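A small sketch of that comparison, on toy data that also includes a circular sequence. The sequence names and lengths in the Seqinfo object are assumptions chosen for illustration, not values taken from the question.

```r
library(GenomicRanges)

# Toy sequence info (hypothetical lengths): chrA is linear, chrM is circular.
si <- Seqinfo(seqnames   = c("chrA", "chrM"),
              seqlengths = c(100L, 16571L),
              isCircular = c(FALSE, TRUE))

mygr.original <- suppressWarnings(
    GRanges(c("chrA", "chrA", "chrM"),
            IRanges(start = c(10, 90, 16500),
                    end   = c(20, 120, 16600)),
            seqinfo = si)
)

# trim() only touches ranges that are truly out-of-bound, so the
# range that runs past the end of circular chrM is left as-is.
mygr.trimmed <- trim(mygr.original)

# Index of the ranges that trim() had to modify.
oob <- which(mygr.original != mygr.trimmed)
unname(oob)
```

Only the second range is reported. The simpler end(mygr) > seqends check would also flag the chrM range, which is why comparing against the trimmed object is the more reliable route.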

Cheers,

H.

modified 5 months ago • written 5 months ago by Hervé Pagès
Hervé, thanks for such a thorough discussion on this topic!

Best regards,

Julie
written 5 months ago by Julie Zhu