Can precede() and follow() in some cases return identical output?
1
0
Entering edit mode
Matthias Z. ▴ 20
@matthias-z-8768
Last seen 7.4 years ago
University Medical Center of Münster, G…

Hello folks,

I am pondering about an issue with the follow and precede functions GenomicRanges. I have a GRanges object with CpG-Island coordinates and another large GRanges object with the coordinates of pretty much all CpGs outside islands:


rangedCpGs
GRanges object with 29467879 ranges and 0 metadata columns:
             seqnames                 ranges strand
                <Rle>              <IRanges>  <Rle>
         [1]     chr1     [3000574, 3000575]      *
         [2]     chr1     [3000575, 3000576]      *
         [3]     chr1     [3000726, 3000727]      *
         [4]     chr1     [3001395, 3001396]      *
         [5]     chr1     [3001631, 3001632]      *
         ...      ...                    ...    ...
  [29467875]     chrX [166649576, 166649577]      *
  [29467876]     chrX [166649598, 166649599]      *
  [29467877]     chrX [166649696, 166649697]      *
  [29467878]     chrX [166649712, 166649713]      *
  [29467879]     chrX [166649815, 166649816]      *
  -------
  seqinfo: 20 sequences from an unspecified genome; no seqlengths

 

CpGislands
GRanges object with 74986 ranges and 0 metadata columns:
          seqnames             ranges strand
             <Rle>          <IRanges>  <Rle>
      [1]    chr10 [3001252, 3001468]      *
      [2]    chr10 [3001990, 3002096]      *
      [3]    chr10 [3102618, 3102897]      *
      [4]    chr10 [3103777, 3103948]      *
      [5]    chr10 [3104651, 3105129]      *
      ...      ...                ...    ...
  [74982]     chrY [2612251, 2612589]      *
  [74983]     chrY [2612783, 2612856]      *
  [74984]     chrY [2618798, 2619227]      *
  [74985]     chrY [2619483, 2619753]      *
  [74986]     chrY [2681236, 2681710]      *
  -------
  seqinfo: 33 sequences from an unspecified genome; no seqlengths

Now I would like to assign every CpG to it's nearest island and furthermore assess, whether it precedes or follows this particular island. 

Hence I started with

IDclosestCGI <- nearest(rangedCpGs,CpGislands)

and wanted to check if

CpGprecedesCGI <- IDclosestCGI == follow(rangedCpGs,CpGislands)

with the intention that a TRUE would mean the CGI is following the CpG. However I got a vector completely TRUE.  So I did the opposite check with precede with the same result, which I didn't expect at all.

Can somebody please explain to me, why this happens?

test <-  precede(rangedCpGs,CpGislands)
test2 <-  follow(rangedCpGs,CpGislands)
head(test)
[1] 31230 31230 31230 31230 31230 31230
head(test2)
[1] 31230 31230 31230 31230 31230 31230
all(test==test2)
[1] TRUE
sum(overlapsAny(rangedCpGs,CpGislands))
[1] 0

Thanks a lot for reading and potentially your help.

Matthias


 sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] GenomicRanges_1.18.4 GenomeInfoDb_1.2.5   IRanges_2.0.1
[4] S4Vectors_0.4.0      BiocGenerics_0.12.1  BiocInstaller_1.16.5

loaded via a namespace (and not attached):
[1] tools_3.1.1   XVector_0.6.0

 

 

 

genomicranges • 1.4k views
ADD COMMENT
2
Entering edit mode
@michael-lawrence-3846
Last seen 3.1 years ago
United States

Most likely need an ignore.strand=TRUE on those calls. This is a really confusing consequence of strand-dependence. Yes, those features are unstranded, but to GRanges, that apparently means "either direction" instead of "left-right". It's tough to imagine cases where one would want it to mean "either direction", but that's how it works, I guess.

ADD COMMENT
0
Entering edit mode

Thanks a lot - this was indeed the solution! I presumed that in case of unstranded features Genomic Ranges would fall back to sort the features according to ascending coordinates. Considering this order,  precede() would mean before and follow() after in the ranked coordinates. 

ADD REPLY

Login before adding your answer.

Traffic: 441 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6