Entering edit mode
Chris Whelan
▴
60
@chris-whelan-4779
Last seen 10.2 years ago
Hi Valerie,
Thanks for the update to GenomicRanges; it fixed the problem. Just
wanted to confirm that for you and the list.
Chris
On Jul 31, 2012, at 9:45 AM, Valerie Obenchain wrote:
Thanks for sending the test data. This is now fixed in GenomicRanges
release 1.8.10 and was fixed in devel a few months ago. The 1.8.10
version should be available through biocLite() ~9am PST Wednesday.
The problem was that distance(), which is called from within
nearest(), was not handling the case of no matching seqnames.
> x.data <- read.table("x.csv", header=TRUE, sep="\t")
> xs <- with(x.data, GRanges(seqnames=Scaffold,
+ ranges=IRanges(start=Start,end=Stop), strand=Strand))
>
> cpg.island.data <- read.table('cpg_islands_head.gff')
> cpg.islands <- with(cpg.island.data, GRanges(seqnames=V1,
+ ranges=IRanges(start=V4, end=V5), strand='*'))
In these data, none of the seqnames in xs match those in cpg.islands,
> runValue(seqnames(xs)) %in% runValue(seqnames(cpg.islands))
[1] FALSE FALSE FALSE
Because we have no common seqnames we get a vector of NAs as the
result.
> nearest(xs, cpg.islands)
[1] NA NA NA NA NA NA NA NA NA
If there were matching seqnames you would see results for those
values. For example, if we set the seqnames in cpg.islands to match
one seqname in xs,
> seqlevels(cpg.islands) <- "GL397446"
> nearest(xs, cpg.islands)
[1] NA 10 10 10 NA NA NA NA NA
Valerie
On 07/30/2012 08:17 PM, Chris Whelan wrote:
Hi Valerie,
I get this behavior using only a small subset of my data - attached. I
am loading it like this:
x.data <- read.table("x_data.txt", header=T, sep="\t")
cpg.island.data <- read.table('cpg_islands_head.gff')
In doing some more testing, it seems like I get this error if not all
of the seqnames in the query GRanges are present in the subject
GRanges. Is that data not supported?
I tried setting the levels with:
seqlevels(xs) <- seqlevels(cpg.islands)
But then I just get back a vector of NA's from nearest().
Thanks,
Chris
On Jul 30, 2012, at 7:08 PM, Valerie Obenchain wrote:
> Hi Chris,
>
> It's difficult to know what's going on without a reproducible
example.
> Can you attach a small subset of your GRanges that still triggers
the
> error? If not, is there a place I can get the data for testing?
>
> Valerie
>
> On 07/28/12 12:03, Chris Whelan wrote:
>> Hi all,
>>
>> I'm having an issue with the nearest() method in GenomicRanges.
I've
>> created GRanges from two data frames, from data for a species
genome
>> assembly with lots of scaffolds:
>>
>>> xs<- with(x.data, GRanges(seqnames=Scaffold,
ranges=IRanges(start=Start, end=Stop), strand=Strand))
>>> xs
>> GRanges with 989 ranges and 0 elementMetadata cols:
>> seqnames ranges strand
>> <rle> <iranges> <rle>
>> [1] GL397480 [ 1477346, 1477813] -
>> [2] GL397446 [ 869823, 870295] +
>> ... ... ... ...
>> [988] GL397464 [1138154, 1138427] -
>> [989] GL397464 [1138154, 1138427] -
>> ---
>> seqlengths:
>> ADFV01135143 ADFV01136404 GL397262 ... GL397832
GL397836
>> NA NA NA ... NA
NA
>>
>>> cpg.islands<- with(cpg.island.data, GRanges(seqnames=V1,
ranges=IRanges(start=V4, end=V5), strand='*'))
>>> cpg.islands
>> GRanges with 88968 ranges and 0 elementMetadata cols:
>> seqnames ranges strand
>> <rle> <iranges> <rle>
>> [1] GL397261 [ 4409, 5062] *
>> [2] GL397261 [ 66765, 67010] *
>> ... ... ... ...
>> [88967] ADFV01162706 [1834, 2052] *
>> [88968] ADFV01162751 [ 226, 884] *
>> ---
>> seqlengths:
>> ADFV01134425 ADFV01134436 ADFV01134442 ... GL405215
GL405216
>> NA NA NA ... NA
NA
>>
>> And I'd like to use nearest to find the closest island to each x:
>>
>>> nearest(xs, cpg.islands)
>> Error in Rle(values = callGeneric(runValue(e1)[which1],
>> runValue(e2)[which2]), :
>> error in evaluating the argument 'values' in selecting a method
for
>> function 'Rle': Error in Ops.factor(runValue(e1)[which1],
>> runValue(e2)[which2]) :
>> level sets of factors are different
>>> traceback()
>> 12: Rle(values = callGeneric(runValue(e1)[which1],
runValue(e2)[which2]),
>> lengths = diffWithInitialZero(ends), check = FALSE)
>> 11: seqnames(x) != seqnames(y)
>> 10: seqnames(x) != seqnames(y)
>> 9: `seqselect<-`(`*tmp*`, seqnames(x) != seqnames(y), value = NA)
>> 8: `seqselect<-`(`*tmp*`, seqnames(x) != seqnames(y), value = NA)
>> 7: .local(x, y, ...)
>> 6: distance(x[midx], subject[p[midx]])
>> 5: distance(x[midx], subject[p[midx]])
>> 4: .nearest(x, subject, ignore.strand, ...)
>> 3: .local(x, subject, ...)
>> 2: nearest(xs, cpg.islands)
>> 1: nearest(xs, cpg.islands)
>>
>> This same data works fine with other methods like findOverlaps():
>>
>>> findOverlaps(xs, cpg.islands)
>> Hits of length 225
>> queryLength: 989
>> subjectLength: 88968
>> queryHits subjectHits
>> <integer> <integer>
>> 1 2 75585
>> 2 6 36857
>> ... ... ...
>> 224 963 84872
>> 225 964 84872
>>
>> Any idea what I might be doing wrong? Here's my sessionInfo():
>>
>>> sessionInfo()
>> R version 2.15.1 (2012-06-22)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=C LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
>> [5] LC_MONETARY=C LC_MESSAGES=C
>> [7] LC_PAPER=C LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods
base
>>
>> other attached packages:
>> [1] locfit_1.5-8 GenomicFeatures_1.8.2
AnnotationDbi_1.18.1
>> [4] Biobase_2.16.0 multicore_0.1-7 rtracklayer_1.16.3
>> [7] GenomicRanges_1.8.9 IRanges_1.14.4 BiocGenerics_0.2.0
>> [10] reshape2_1.2.1 ggplot2_0.9.1
BiocInstaller_1.4.7
>>
>> loaded via a namespace (and not attached):
>> [1] BSgenome_1.24.0 Biostrings_2.24.1 DBI_0.2-5
MASS_7.3-19
>> [5] RColorBrewer_1.0-5 RCurl_1.91-1 RSQLite_0.11.1
Rsamtools_1.8.5
>> [9] XML_3.9-4 biomaRt_2.12.0 bitops_1.0-4.1
colorspace_1.1-1
>> [13] compiler_2.15.1 dichromat_1.2-4 digest_0.5.2
grid_2.15.1
>> [17] labeling_0.1 lattice_0.20-6 memoise_0.1
munsell_0.3
>> [21] plyr_1.7.1 proto_0.3-9.2 scales_0.2.1
stats4_2.15.1
>> [25] stringr_0.6.1 tools_2.15.1 zlibbioc_1.2.0
>>
>> Thanks in advance,
>>
>> Chris
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor@r-project.org<mailto:bioconductor@r-project.org>
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
[[alternative HTML version deleted]]