What are out-of-bound ranges? Is it necessary to get rid of them?
salamandra
Portugal

When I run:

overlaps.anno <- annotatePeakInBatch(overlaps, AnnotationData = annoData, output = "nearestBiDirectionalPromoters", bindingRegion = c(-2000, 500))

I get this warning message:

Annotate peaks by annoPeaks, see ?annoPeaks for details.
maxgap will be ignored.
Warning messages:
1: In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) :
  GRanges object contains 5 out-of-bound ranges located on sequences GL000199.1 and
  chrM. Note that only ranges located on a non-circular sequence whose length is not
  NA can be considered out-of-bound (use seqlengths() and isCircular() to get the
  lengths and circularity flags of the underlying sequences). You can use trim() to
  trim these ranges. See ?`trim,GenomicRanges-method` for more information.

What are out-of-bound ranges, and is it advisable to get rid of them?

Tags: r, ChIPpeakAnno

Please tag your message with the package name, ChIPpeakAnno, so the author is aware of the question.

Julie Zhu
United States

Out-of-bound ranges are ranges whose coordinates are not valid positions on the chromosome they are located on. The warning message indicates that your data (overlaps) contain 5 such ranges, located on chrM and GL000199.1. You can pass trim(overlaps) instead of overlaps as input to annotatePeakInBatch. Here is a nice post on how to locate out-of-bound ranges: https://www.biostars.org/p/98315/.
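A minimal sketch of what trim() does, using a toy GRanges object instead of the actual overlaps object. The sequence lengths and ranges below are made up for illustration, and chrM is deliberately left non-circular, which appears to match the question since the warning reports out-of-bound ranges on chrM:

```r
library(GenomicRanges)

# Toy sequence information (made-up values). chrM is NOT flagged as circular,
# so a range extending past its end counts as out-of-bound.
si <- Seqinfo(c("chr1", "chrM"), seqlengths = c(100L, 16569L),
              isCircular = c(FALSE, FALSE))

# Both ranges extend past the end of their sequence; building the object
# triggers the same kind of out-of-bound warning, silenced here.
peaks <- suppressWarnings(
    GRanges(c("chr1", "chrM"), IRanges(c(95, 16560), c(110, 16580)),
            seqinfo = si)
)

# trim() clips out-of-bound ranges to the sequence boundaries.
trimmed <- trim(peaks)
end(trimmed)  # 100 16569
```

With real data the same idea would be annotatePeakInBatch(trim(overlaps), ...) with the original arguments.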

Best regards,

Julie


Thanks Julie for the nice explanation. A couple of comments about the Biostar answer:

The seqlengths are stored in the GRanges object, so there is no need to use a BSgenome package to get them. Therefore

which(end(mygr) > seqlengths(BSgenome.Hsapiens.UCSC.hg19)[as.character(seqnames(mygr))])

can be replaced with

which(end(mygr) > seqlengths(mygr)[as.character(seqnames(mygr))])

Note that ranges can be considered out-of-bound only if the GRanges object contains the seqlengths information. If it doesn't contain it, or contains it for only some sequences, the vectorized comparison (>) in the above code will generate NAs. However, which() ignores NAs, so we're good.

Another thing: in some rare situations, ranges can start at a position < 1. These ranges are also considered out-of-bound. So a more accurate version of the above code would be something like

seqends <- seqlengths(mygr)[as.character(seqnames(mygr))]
which(start(mygr) < 1L | end(mygr) > seqends)

Finally, ranges located on a circular sequence (e.g. chrM, provided it is flagged as circular; see isCircular()) are never considered out-of-bound. So a completely accurate version of the above code would be even more complicated. It's actually implemented in the internal utility GenomicRanges:::get_out_of_bound_index(), which is used by the validity and trim() methods for GRanges objects.
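For illustration, here is a sketch of such a check written with exported accessors only. It is not the actual code of GenomicRanges:::get_out_of_bound_index(), just an attempt to follow the rules described above: sequences flagged as circular, or whose length is NA, have no bounds, and an NA circularity flag is treated as non-circular (consistent with the warning in the question flagging ranges on chrM). The function name out_of_bound_index and the toy data are hypothetical.

```r
library(GenomicRanges)

# Hypothetical helper: index of the out-of-bound ranges in a GRanges object.
out_of_bound_index <- function(gr) {
    seqs <- as.character(seqnames(gr))
    seqends <- unname(seqlengths(gr)[seqs])
    # TRUE only where the circularity flag is TRUE (NA counts as non-circular)
    circular <- unname(isCircular(gr)[seqs]) %in% TRUE
    # Only non-circular sequences with a known length have bounds
    has_bounds <- !circular & !is.na(seqends)
    # NAs produced by unknown lengths are masked because has_bounds is FALSE
    which(has_bounds & (start(gr) < 1L | end(gr) > seqends))
}

# Toy data (made-up values): chr1 has bounds, chrM is circular,
# chr2 has an unknown length.
si <- Seqinfo(c("chr1", "chrM", "chr2"),
              seqlengths = c(100L, 16569L, NA),
              isCircular = c(FALSE, TRUE, NA))
gr <- suppressWarnings(GRanges(
    c("chr1", "chr1", "chrM", "chr2", "chr1"),
    IRanges(c(-5, 10, 16500, 1, 90), c(20, 50, 16600, 10, 120)),
    seqinfo = si
))
out_of_bound_index(gr)  # 1 5
```

Range 1 starts before position 1 and range 5 ends past the end of chr1; the chrM range is ignored because chrM is circular, and the chr2 range because its length is unknown.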

All this to say that the easiest and most reliable way of getting the index of out-of-bound ranges is probably which(mygr.original != mygr.trimmed), as suggested in the Biostar answer.
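A minimal sketch of that approach on a toy object (made-up sequence lengths and ranges; mygr.original and mygr.trimmed are just illustrative names). It relies on trim() leaving in-bound ranges, and ranges on circular sequences, untouched, so the ranges that change are exactly the out-of-bound ones:

```r
library(GenomicRanges)

si <- Seqinfo(c("chr1", "chrM"), seqlengths = c(100L, 16569L),
              isCircular = c(FALSE, TRUE))
mygr.original <- suppressWarnings(GRanges(
    c("chr1", "chr1", "chrM", "chr1"),
    IRanges(c(-5, 10, 16500, 90), c(20, 50, 16600, 120)),
    seqinfo = si
))

# Elementwise comparison of the original and trimmed ranges
mygr.trimmed <- trim(mygr.original)
which(mygr.original != mygr.trimmed)  # 1 4
```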

Cheers,

H.

Herve, thanks for such a thorough discussion on this topic!

Best regards,

Julie