GRanges returning warning message: 'ranges' contains values outside of sequence bounds.
@piotrgrabowski-6939
Germany

Hello,

I have the following question. I am using GRanges in R 3.1.1 (everything is up to date).

What I am trying to do with it:

1) Store exon coordinates together with variables calculated in previous steps, i.e. all Rattus norvegicus Rn4 exons with their respective PSI (Percent Spliced-In) values, for the analysis of splicing. The exons were preselected, and the list contains only ~10,000 of them.

2) Using getSeq, I fetched sequences of a certain length upstream and downstream of each exon and appended them as columns for every exon.
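
For illustration, here is a minimal sketch of how flanking regions can be derived with flank() before calling getSeq. It is not the original script: the toy exons, the 30 bp flank width and the chr1 length of 100 are made-up values. It also shows how a flank close to a chromosome end can run past the sequence bounds, which is the situation the validity warning in this thread reports.

```r
library(GenomicRanges)

# Hypothetical toy exons on a 100 bp chromosome (not the real Rn4 data).
exons <- GRanges(
  seqnames   = "chr1",
  ranges     = IRanges(start = c(10, 80), end = c(20, 90)),
  strand     = c("+", "-"),
  seqlengths = c(chr1 = 100)
)

flank_width <- 30  # assumed flank length

# flank() is strand-aware: start = TRUE gives the upstream side,
# start = FALSE the downstream side. Both upstream flanks here run past
# a chromosome end, so the call is expected to warn; the warning is
# silenced only to keep the example quiet.
upstream   <- suppressWarnings(flank(exons, flank_width, start = TRUE))
downstream <- flank(exons, flank_width, start = FALSE)

# Clip the flanks to the chromosome before fetching any sequence.
upstream_ok <- trim(upstream)
```

The trimmed flanks could then be passed to getSeq together with the matching BSgenome object.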

Up to this point I had no problems whatsoever. However, for some operations I would now like to subset some of the exons. When I do so, using subset(PSIranges, PSIranges$PSI.deltaWTKO < 0.05) to select entries with delta PSI values < 0.05, I get the following warning:

Warning message:
In valid.GenomicRanges.seqinfo(x) :
'ranges' contains values outside of sequence bounds. See ?trim to subset ranges.

Does anyone have an idea why this warning kicks in only after subsetting the GRanges object? Ultimately, I just want to return a certain number of rows from the "PSIranges" GRanges object. Why does the warning arise in this situation?

I don't want to leave it as it is. I would be very thankful for any help. In case it's needed, I can share the GRanges object saved in a repository to clone into.

Best,

Piotr


Can you share the code with us?


Which part exactly? The script is fairly large and would be hard to read. I can show the part where I create the GRanges object, fetch the flanking sequences, and append them as metadata to the GRanges object.

@herve-pages-1542
Seattle, WA, United States

Hi Piotr,

Sorry for the late answer. The validity method for GenomicRanges objects is warning you that your GRanges object contains ranges that go beyond the limits of the chromosome. It's just a warning, the object is still considered to be a valid object.

What's admittedly confusing is that you get this warning when you subset the object. Subsetting doesn't modify the ranges, it just picks up some of them. What's happening however is that subsetting, like many other operations, triggers a call to the validity method. So if your initial object contains ranges that go beyond the limits of the chromosome, you'll get a warning from the validity method each time you do an operation on it that triggers validation.

Validation has a cost and we should try to avoid it whenever it's not necessary. It seems that subsetting a GRanges object is an operation that could be implemented without the need to validate the result (assuming that the original object is valid, we know that the subsetted object is also going to be valid). So I'll look at this and will maybe get rid of this validation but first I want to make sure I'm not overlooking something here.
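
To see which ranges the validity method is complaining about, one can compare each range with the length of its chromosome. The following is a rough sketch on a made-up toy object (the data and the variable names are hypothetical, and this is not the internal code of the validity method). Ranges on circular sequences or on sequences of unknown (NA) length are not counted as out-of-bound.

```r
library(GenomicRanges)

# Hypothetical toy object: the second range extends past the end of chr1.
# Constructing it is expected to trigger the warning, silenced here.
gr <- suppressWarnings(GRanges(
  seqnames   = c("chr1", "chr1", "chr2"),
  ranges     = IRanges(start = c(5, 95, 1), end = c(20, 110, 50)),
  seqlengths = c(chr1 = 100, chr2 = 50)
))

# Look up the length and circularity flag of each range's chromosome.
chr       <- as.character(seqnames(gr))
chrom_len <- unname(seqlengths(gr)[chr])
is_circ   <- unname(isCircular(gr)[chr]) %in% TRUE

# A range is flagged only if its chromosome is non-circular, has a known
# length, and the range starts before 1 or ends after that length.
oob <- !is_circ & !is.na(chrom_len) &
  (start(gr) < 1L | end(gr) > chrom_len)

gr[oob]  # the offending ranges
```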

Anyway, that doesn't solve the more general issue that a GRanges object with ranges that go beyond the limits of the chromosome will trigger warnings every now and then down the road. It feels that issuing the warning only once (the first time this situation is detected) should be enough but I can't really think of a good way to achieve this at the moment.

H.


BTW, I don't think that you have "everything up to date". With the latest BioC (3.0), the warning is different:

Warning message:
In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) :
GRanges object contains 3 out-of-bound ranges located on sequences chr1
and chr2. Note that only ranges located on a non-circular sequence
whose length is not NA can be considered out-of-bound (use seqlengths()
and isCircular() to get the lengths and circularity flags of the
underlying sequences). You can use trim() to trim these ranges. See
?trim,GenomicRanges-method for more information.

Also subsetting the object doesn't trigger the warning anymore so everything is fine. See my "official answer". Cheers.
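
As the warning text suggests, trim() can be used to clip the offending ranges once, so that later operations have nothing to warn about. A minimal sketch, assuming a small stand-in for the PSIranges object from the question (the object name psi_ranges, the coordinates and the PSI values are invented for illustration):

```r
library(GenomicRanges)

# Hypothetical stand-in for PSIranges: the first and last ranges fall
# partly outside chr1 (length 100), so construction is expected to warn.
psi_ranges <- suppressWarnings(GRanges(
  seqnames      = "chr1",
  ranges        = IRanges(start = c(-2, 40, 95), end = c(7, 60, 110)),
  PSI.deltaWTKO = c(0.01, 0.20, 0.03),
  seqlengths    = c(chr1 = 100)
))

# Clip every range to [1, seqlength]; metadata columns are kept.
psi_trimmed <- trim(psi_ranges)

# Subset on the metadata column, as in the question.
small_delta <- psi_trimmed[psi_trimmed$PSI.deltaWTKO < 0.05]
```

Note that trimming changes the coordinates, so whether clipping is appropriate depends on what the ranges are used for afterwards.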

@herve-pages-1542
Seattle, WA, United States

Please update to the latest BioC release (3.0). With this release:

library(GenomicRanges)
example(GRanges)

gr2 <- shift(gr, -3)
## Warning message:
## In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) :
##  GRanges object contains 3 out-of-bound ranges located on sequences chr1
##  and chr2. Note that only ranges located on a non-circular sequence
##  whose length is not NA can be considered out-of-bound (use seqlengths()
##  and isCircular() to get the lengths and circularity flags of the
##  underlying sequences). You can use trim() to trim these ranges. See
##  ?trim,GenomicRanges-method for more information.

gr2
## GRanges object with 10 ranges and 2 metadata columns:
##    seqnames    ranges strand |     score                GC
##       <Rle> <IRanges>  <Rle> | <integer>         <numeric>
##  a     chr1   [-2, 7]      - |         1                 1
##  b     chr2   [-1, 7]      + |         2 0.888888888888889
##  c     chr2   [ 0, 7]      + |         3 0.777777777777778
##  d     chr2   [ 1, 7]      * |         4 0.666666666666667
##  e     chr1   [ 2, 7]      * |         5 0.555555555555556
##  f     chr1   [ 3, 7]      + |         6 0.444444444444444
##  g     chr3   [ 4, 7]      + |         7 0.333333333333333
##  h     chr3   [ 5, 7]      + |         8 0.222222222222222
##  i     chr3   [ 6, 7]      - |         9 0.111111111111111
##  j     chr3   [ 7, 7]      - |        10                 0
##  -------
##  seqinfo: 3 sequences from mock1 genome

gr[1:4]  # no warning
rev(gr)  # no warning

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8       LC_NAME=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] GenomicRanges_1.18.3 GenomeInfoDb_1.2.3   IRanges_2.0.0
[4] S4Vectors_0.4.0      BiocGenerics_0.12.1

loaded via a namespace (and not attached):
[1] tools_3.1.2   XVector_0.6.0

Cheers,

H.