Hello folks,
I am pondering about an issue with the follow and precede functions GenomicRanges. I have a GRanges object with CpG-Island coordinates and another large GRanges object with the coordinates of pretty much all CpGs outside islands:
rangedCpGs GRanges object with 29467879 ranges and 0 metadata columns: seqnames ranges strand <Rle> <IRanges> <Rle> [1] chr1 [3000574, 3000575] * [2] chr1 [3000575, 3000576] * [3] chr1 [3000726, 3000727] * [4] chr1 [3001395, 3001396] * [5] chr1 [3001631, 3001632] * ... ... ... ... [29467875] chrX [166649576, 166649577] * [29467876] chrX [166649598, 166649599] * [29467877] chrX [166649696, 166649697] * [29467878] chrX [166649712, 166649713] * [29467879] chrX [166649815, 166649816] * ------- seqinfo: 20 sequences from an unspecified genome; no seqlengths
CpGislands GRanges object with 74986 ranges and 0 metadata columns: seqnames ranges strand <Rle> <IRanges> <Rle> [1] chr10 [3001252, 3001468] * [2] chr10 [3001990, 3002096] * [3] chr10 [3102618, 3102897] * [4] chr10 [3103777, 3103948] * [5] chr10 [3104651, 3105129] * ... ... ... ... [74982] chrY [2612251, 2612589] * [74983] chrY [2612783, 2612856] * [74984] chrY [2618798, 2619227] * [74985] chrY [2619483, 2619753] * [74986] chrY [2681236, 2681710] * ------- seqinfo: 33 sequences from an unspecified genome; no seqlengths
Now I would like to assign every CpG to it's nearest island and furthermore assess, whether it precedes or follows this particular island.
Hence I started with
IDclosestCGI <- nearest(rangedCpGs,CpGislands)
and wanted to check if
CpGprecedesCGI <- IDclosestCGI == follow(rangedCpGs,CpGislands)
with the intention that a TRUE would mean the CGI is following the CpG. However I got a vector completely TRUE. So I did the opposite check with precede with the same result, which I didn't expect at all.
Can somebody please explain to me, why this happens?
test <- precede(rangedCpGs,CpGislands) test2 <- follow(rangedCpGs,CpGislands) head(test) [1] 31230 31230 31230 31230 31230 31230 head(test2) [1] 31230 31230 31230 31230 31230 31230 all(test==test2) [1] TRUE sum(overlapsAny(rangedCpGs,CpGislands)) [1] 0
Thanks a lot for reading and potentially your help.
Matthias
sessionInfo() R version 3.1.1 (2014-07-10) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets [8] methods base other attached packages: [1] GenomicRanges_1.18.4 GenomeInfoDb_1.2.5 IRanges_2.0.1 [4] S4Vectors_0.4.0 BiocGenerics_0.12.1 BiocInstaller_1.16.5 loaded via a namespace (and not attached): [1] tools_3.1.1 XVector_0.6.0
Thanks a lot - this was indeed the solution! I presumed that in case of unstranded features Genomic Ranges would fall back to sort the features according to ascending coordinates. Considering this order,
precede()
would mean before andfollow()
after in the ranked coordinates.