Entering edit mode
Chris Whelan
▴
60
@chris-whelan-4779
Last seen 10.2 years ago
Hi all,
I'm having an issue with the nearest() method in GenomicRanges. I've
created GRanges from two data frames, from data for a species genome
assembly with lots of scaffolds:
> xs <- with(x.data, GRanges(seqnames=Scaffold,
ranges=IRanges(start=Start, end=Stop), strand=Strand))
> xs
GRanges with 989 ranges and 0 elementMetadata cols:
seqnames ranges strand
<rle> <iranges> <rle>
[1] GL397480 [ 1477346, 1477813] -
[2] GL397446 [ 869823, 870295] +
... ... ... ...
[988] GL397464 [1138154, 1138427] -
[989] GL397464 [1138154, 1138427] -
---
seqlengths:
ADFV01135143 ADFV01136404 GL397262 ... GL397832
GL397836
NA NA NA ... NA
NA
> cpg.islands <- with(cpg.island.data, GRanges(seqnames=V1,
ranges=IRanges(start=V4, end=V5), strand='*'))
> cpg.islands
GRanges with 88968 ranges and 0 elementMetadata cols:
seqnames ranges strand
<rle> <iranges> <rle>
[1] GL397261 [ 4409, 5062] *
[2] GL397261 [ 66765, 67010] *
... ... ... ...
[88967] ADFV01162706 [1834, 2052] *
[88968] ADFV01162751 [ 226, 884] *
---
seqlengths:
ADFV01134425 ADFV01134436 ADFV01134442 ... GL405215
GL405216
NA NA NA ... NA
NA
And I'd like to use nearest to find the closest island to each x:
> nearest(xs, cpg.islands)
Error in Rle(values = callGeneric(runValue(e1)[which1],
runValue(e2)[which2]), :
error in evaluating the argument 'values' in selecting a method for
function 'Rle': Error in Ops.factor(runValue(e1)[which1],
runValue(e2)[which2]) :
level sets of factors are different
> traceback()
12: Rle(values = callGeneric(runValue(e1)[which1],
runValue(e2)[which2]),
lengths = diffWithInitialZero(ends), check = FALSE)
11: seqnames(x) != seqnames(y)
10: seqnames(x) != seqnames(y)
9: `seqselect<-`(`*tmp*`, seqnames(x) != seqnames(y), value = NA)
8: `seqselect<-`(`*tmp*`, seqnames(x) != seqnames(y), value = NA)
7: .local(x, y, ...)
6: distance(x[midx], subject[p[midx]])
5: distance(x[midx], subject[p[midx]])
4: .nearest(x, subject, ignore.strand, ...)
3: .local(x, subject, ...)
2: nearest(xs, cpg.islands)
1: nearest(xs, cpg.islands)
This same data works fine with other methods like findOverlaps():
> findOverlaps(xs, cpg.islands)
Hits of length 225
queryLength: 989
subjectLength: 88968
queryHits subjectHits
<integer> <integer>
1 2 75585
2 6 36857
... ... ...
224 963 84872
225 964 84872
Any idea what I might be doing wrong? Here's my sessionInfo():
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=C LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
[5] LC_MONETARY=C LC_MESSAGES=C
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] locfit_1.5-8 GenomicFeatures_1.8.2 AnnotationDbi_1.18.1
[4] Biobase_2.16.0 multicore_0.1-7 rtracklayer_1.16.3
[7] GenomicRanges_1.8.9 IRanges_1.14.4 BiocGenerics_0.2.0
[10] reshape2_1.2.1 ggplot2_0.9.1 BiocInstaller_1.4.7
loaded via a namespace (and not attached):
[1] BSgenome_1.24.0 Biostrings_2.24.1 DBI_0.2-5
MASS_7.3-19
[5] RColorBrewer_1.0-5 RCurl_1.91-1 RSQLite_0.11.1
Rsamtools_1.8.5
[9] XML_3.9-4 biomaRt_2.12.0 bitops_1.0-4.1
colorspace_1.1-1
[13] compiler_2.15.1 dichromat_1.2-4 digest_0.5.2
grid_2.15.1
[17] labeling_0.1 lattice_0.20-6 memoise_0.1
munsell_0.3
[21] plyr_1.7.1 proto_0.3-9.2 scales_0.2.1
stats4_2.15.1
[25] stringr_0.6.1 tools_2.15.1 zlibbioc_1.2.0
Thanks in advance,
Chris
[[alternative HTML version deleted]]