Entering edit mode
Simon Coetzee
▴
50
@simon-coetzee-5051
Last seen 3.3 years ago
USA Cedars-Sinai Medical Center
I am receiving a warning message for nearestKNeighbors
that I don't receive for nearest that also seems to alter the output of the function.
using the data p_ran
and e_ran
below
p_ran <- GRanges(c("chr1:248836765-248839421",
"chr1:248857227-248860010",
"chr1:248905669-248907808",
"chr2:263751-265942",
"chr2:283627-286204",
"chr2:287549-288546",
"chr2:317349-318734",
"chr2:674901-679175",
"chr2:692014-694241",
"chr2:739716-741208",
"chr2:1579709-1580907"))
e_ran <- GRanges(c("chr1:248842064-248842915",
"chr1:248862738-248863858",
"chr1:248924237-248927727",
"chr2:246452-247319",
"chr2:280784-281089",
"chr2:673304-674615",
"chr2:690001-690466",
"chr2:694530-694842",
"chr2:1572461-1573381",
"chr2:349126-349492",
"chr2:1586877-1589526"))
when running nearestKNeighbors
I receive the warning:
Warning message:
In ans[] <- x :
number of items to replace is not a multiple of replacement length
This, or something else causes the result of nearest
and nearestKNeighbors
to differ at 4 positions which you can see in the code below:
p_ran
e_ran
which(nearest(p_ran, e_ran) != unlist(nearestKNeighbors(p_ran, e_ran, k = 1)))
e_ran[nearest(p_ran, e_ran)]
e_ran[unlist(nearestKNeighbors(p_ran, e_ran, k = 1))]
here are the results:
> p_ran
GRanges object with 11 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 248836765-248839421 *
[2] chr1 248857227-248860010 *
[3] chr1 248905669-248907808 *
[4] chr2 263751-265942 *
[5] chr2 283627-286204 *
[6] chr2 287549-288546 *
[7] chr2 317349-318734 *
[8] chr2 674901-679175 *
[9] chr2 692014-694241 *
[10] chr2 739716-741208 *
[11] chr2 1579709-1580907 *
-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths
> e_ran
GRanges object with 11 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 248842064-248842915 *
[2] chr1 248862738-248863858 *
[3] chr1 248924237-248927727 *
[4] chr2 246452-247319 *
[5] chr2 280784-281089 *
[6] chr2 673304-674615 *
[7] chr2 690001-690466 *
[8] chr2 694530-694842 *
[9] chr2 1572461-1573381 *
[10] chr2 349126-349492 *
[11] chr2 1586877-1589526 *
-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths
> which(nearest(p_ran, e_ran) != unlist(nearestKNeighbors(p_ran, e_ran, k = 1)))
[1] 4 7 9 11
Warning message:
In ans[] <- x :
number of items to replace is not a multiple of replacement length
> e_ran[nearest(p_ran, e_ran)]
GRanges object with 11 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 248842064-248842915 *
[2] chr1 248862738-248863858 *
[3] chr1 248924237-248927727 *
[4] chr2 280784-281089 *
[5] chr2 280784-281089 *
[6] chr2 280784-281089 *
[7] chr2 349126-349492 *
[8] chr2 673304-674615 *
[9] chr2 694530-694842 *
[10] chr2 694530-694842 *
[11] chr2 1586877-1589526 *
-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths
> e_ran[unlist(nearestKNeighbors(p_ran, e_ran, k = 1))]
GRanges object with 11 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 248842064-248842915 *
[2] chr1 248862738-248863858 *
[3] chr1 248924237-248927727 *
[4] chr2 246452-247319 *
[5] chr2 280784-281089 *
[6] chr2 280784-281089 *
[7] chr2 280784-281089 *
[8] chr2 673304-674615 *
[9] chr2 690001-690466 *
[10] chr2 694530-694842 *
[11] chr2 1572461-1573381 *
-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths
Warning message:
In ans[] <- x :
number of items to replace is not a multiple of replacement length
This is a small reproducible version of a larger data-set where this happens many more times. I don't understand why this is.
sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] GenomicRanges_1.48.0 GenomeInfoDb_1.32.2 IRanges_2.30.0
[4] S4Vectors_0.34.0 BiocGenerics_0.42.0 RhpcBLASctl_0.21-247.1
loaded via a namespace (and not attached):
[1] zlibbioc_1.42.0 compiler_4.2.1 tools_4.2.1
[4] XVector_0.36.0 GenomeInfoDbData_1.2.8 RCurl_1.98-1.7
[7] bitops_1.0-7