Question: seqselect and window in GRanges
Asked 8.0 years ago by arne.mueller@novartis.com (Switzerland)
Dear All,

may I ask a basic question about the GRanges package? It seems that the functions seqselect and window treat start/end as indexes into the GRanges object rather than as the actual start/end positions. Is there a method with which I can extract a sub-range from a GRanges object based on genomic coordinates rather than indexes?

    > gr = GRanges(seqnames="A", ranges=IRanges(start=c(10, 100), end=c(20, 200)))
    > gr
    GRanges with 2 ranges and 0 elementMetadata values
        seqnames     ranges strand |
           <rle>  <iranges>  <rle> |
    [1]        A [ 10,  20]      * |
    [2]        A [100, 200]      * |

    seqlengths
      A
     NA
    >
    > window(gr, start=12, end=98)
    Error in solveWindowSEW(length(x), start, end, width) :
      Invalid sequence coordinates.
      Please make sure the supplied 'start', 'end' and 'width' arguments
      are defining a region that is within the limits of the sequence.
    > window(gr, start=1, end=2)
    GRanges with 2 ranges and 0 elementMetadata values
        seqnames     ranges strand |
           <rle>  <iranges>  <rle> |
    [1]        A [ 10,  20]      * |
    [2]        A [100, 200]      * |

    seqlengths
      A
     NA
    > window(gr, start=9, end=40)
    Error in solveWindowSEW(length(x), start, end, width) :
      Invalid sequence coordinates.
      Please make sure the supplied 'start', 'end' and 'width' arguments
      are defining a region that is within the limits of the sequence.
    ...
    > sessionInfo()
    R version 2.13.0 Under development (unstable) (2010-10-31 r53501)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices datasets  utils     methods   base

    other attached packages:
    [1] GenomicRanges_1.1.38 IRanges_1.9.3

    loaded via a namespace (and not attached):
    [1] tools_2.13.0

thanks a lot for your help,
Arne
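The errors above come from window() counting elements of the GRanges rather than bases: gr has only two elements, so start=1, end=2 returns both ranges and start=12 is out of bounds. Below is a minimal sketch of the distinction, assuming a current Bioconductor install in which window() still works on GRanges and the %over% operator (shorthand for overlapsAny()) is available. seqselect() has, as far as I know, since been removed, so it is not shown.

```r
library(GenomicRanges)

gr <- GRanges(seqnames = "A",
              ranges = IRanges(start = c(10, 100), end = c(20, 200)))

# window() selects by element position: "elements 1 through 2" of gr,
# which here is the whole object
by_index <- window(gr, start = 1, end = 2)

# Selecting by genomic coordinates instead: keep the ranges of gr that
# overlap the interval A:12-98 (the interval tried with window() above)
query <- GRanges("A", IRanges(12, 98))
by_coord <- gr[gr %over% query]

by_index  # both ranges, A:10-20 and A:100-200
by_coord  # only A:10-20
```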
Martin Morgan (United States), 8.0 years ago, wrote:
On 11/30/2010 05:38 AM, arne.mueller at novartis.com wrote:
> Dear All,
>
> may I ask a basic question about the GRanges package? It seems that the
> functions seqselect and window treat start/end as indexes into the GRanges
> object rather than as the actual start/end positions. Is there a method with
> which I can extract a sub-range from a GRanges object based on genomic
> coordinates rather than indexes?

Hi Arne --

it sounds a bit like you want to 1) find overlapping ranges between gr and genomic location(s) and then 2) restrict (narrow might be appropriate if looking for, say, 5' regions) the ranges to those locations, along the lines of

    > gr1 <- gr[gr %in% GRanges("A", IRanges(12, 18))]
    > ranges(gr1) <- restrict(ranges(gr1), 12, 18)
    > gr1
    GRanges with 1 range and 0 elementMetadata values
        seqnames    ranges strand |
           <rle> <iranges>  <rle> |
    [1]        A  [12, 18]      * |

    seqlengths
      A
     NA

gr %in% GRanges(<...>) is sugar for match(), which is sugar for findOverlaps.

Martin

--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793
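On a current Bioconductor release, the same two steps can be sketched with subsetByOverlaps() for step 1 and the restrict() call from the answer for step 2. This assumes the GenomicRanges package is installed; the interval A:12-18 is the one used in the answer.

```r
library(GenomicRanges)

gr <- GRanges(seqnames = "A",
              ranges = IRanges(start = c(10, 100), end = c(20, 200)))
region <- GRanges("A", IRanges(12, 18))

# Step 1: keep only the ranges of gr that overlap the region of interest
gr1 <- subsetByOverlaps(gr, region)

# Step 2: clip each remaining range to the region's coordinates.
# Every range in gr1 overlaps 12-18, so restrict() drops none of them and
# the replacement below keeps the same length.
ranges(gr1) <- restrict(ranges(gr1), start = 12, end = 18)

gr1  # a single range, A:12-18
```

restrict() works in absolute coordinates, whereas narrow() takes start/end relative to each range, which is why narrow() is the better fit for questions like "the first N bases of each feature".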
On Tue, Nov 30, 2010 at 6:10 AM, Martin Morgan <mtmorgan@fhcrc.org> wrote:
> Hi Arne --
>
> it sounds a bit like you want to 1) find overlapping ranges between gr
> and genomic location(s) and then 2) restrict (narrow might be
> appropriate if looking for, say, 5' regions) the ranges to those
> locations, along the lines of
>
> > gr1 <- gr[gr %in% GRanges("A", IRanges(12, 18))]

See subsetByOverlaps() for the above; maybe a little cleaner?

> > ranges(gr1) <- restrict(ranges(gr1), 12, 18)
> > gr1
> GRanges with 1 range and 0 elementMetadata values
>     seqnames    ranges strand |
>        <rle> <iranges>  <rle> |
> [1]        A  [12, 18]      * |
>
> seqlengths
>   A
>  NA

To find the common regions from an overlap operation, this is the most general way:

    overlaps <- findOverlaps(ranges(gr1), subject)
    ranges(overlaps, ranges(gr1), subject)

Not sure if that's what Arne wants though.

> gr %in% GRanges(<...>) is sugar for match(), which is sugar for
> findOverlaps.
Reply written 8.0 years ago by Michael Lawrence
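The ranges(overlaps, ...) call above reflects the 2010 IRanges API; recent IRanges releases appear to expose that operation as overlapsRanges() instead. Below is a sketch of the same "common regions" idea that stays at the GRanges level, so seqnames and strand are respected, assuming a current GenomicRanges: pair up the hits from findOverlaps() and intersect each pair with pintersect(). The second query interval, A:150-250, is hypothetical and only there to produce a second hit.

```r
library(GenomicRanges)

gr <- GRanges(seqnames = "A",
              ranges = IRanges(start = c(10, 100), end = c(20, 200)))

# Hypothetical query regions; A:150-250 is invented for this example
regions <- GRanges("A", IRanges(start = c(12, 150), end = c(18, 250)))

# Each hit pairs one range of gr (query) with one region (subject)
ov <- findOverlaps(gr, regions)

# Intersect each pair element-wise to get the shared stretch of sequence
common <- pintersect(gr[queryHits(ov)], regions[subjectHits(ov)])

common  # A:12-18 and A:150-200
```

Unlike intersect(), which returns a reduced (merged) set of ranges, this keeps one element per overlapping pair, so queryHits(ov) still tells you which feature of gr each piece came from.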
Thanks a lot for the replies on finding a subset of features in a GRanges object - findOverlaps seems to be the way to go (as I'm not very much into sugar ;-)

regards,
arne
Reply written 8.0 years ago by arne.mueller@novartis.com
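For the findOverlaps() route, here is a minimal sketch of the subsetting done with it directly, without the %in%/match() sugar. It assumes a current GenomicRanges install and reuses the A:9-40 interval from the failing window(gr, start=9, end=40) call in the question.

```r
library(GenomicRanges)

gr <- GRanges(seqnames = "A",
              ranges = IRanges(start = c(10, 100), end = c(20, 200)))
region <- GRanges("A", IRanges(9, 40))

# findOverlaps() returns a Hits object; queryHits() gives the indices of
# the ranges in gr that overlap the region
ov <- findOverlaps(gr, region)

# unique() guards against one range hitting several regions;
# sort() keeps the ranges in their original order within gr
idx <- sort(unique(queryHits(ov)))
sub <- gr[idx]

sub  # only A:10-20
```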