seqselect and window in GRanges
@arnemuellernovartiscom-2205
Last seen 8.5 years ago
Switzerland
Dear All,

may I ask a basic question about GRanges objects (GenomicRanges package)? It seems that the functions seqselect and window treat start/end as indexes into the GRanges object rather than as the actual start/end positions. Is there a method with which I can extract a sub-range from a GRanges object based on genomic coordinates rather than indexes?

    > gr = GRanges(seqnames="A", ranges=IRanges(start=c(10, 100), end=c(20, 200)))
    > gr
    GRanges with 2 ranges and 0 elementMetadata values
        seqnames     ranges strand |
           <rle>  <iranges>  <rle> |
    [1]        A [ 10,  20]      * |
    [2]        A [100, 200]      * |

    seqlengths
      A
     NA
    >
    > window(gr, start=12, end=98)
    Error in solveWindowSEW(length(x), start, end, width) :
      Invalid sequence coordinates.
      Please make sure the supplied 'start', 'end' and 'width' arguments
      are defining a region that is within the limits of the sequence.
    > window(gr, start=1, end=2)
    GRanges with 2 ranges and 0 elementMetadata values
        seqnames     ranges strand |
           <rle>  <iranges>  <rle> |
    [1]        A [ 10,  20]      * |
    [2]        A [100, 200]      * |

    seqlengths
      A
     NA
    > window(gr, start=9, end=40)
    Error in solveWindowSEW(length(x), start, end, width) :
      Invalid sequence coordinates.
      Please make sure the supplied 'start', 'end' and 'width' arguments
      are defining a region that is within the limits of the sequence.
    ...

    > sessionInfo()
    R version 2.13.0 Under development (unstable) (2010-10-31 r53501)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices datasets utils methods base

    other attached packages:
    [1] GenomicRanges_1.1.38 IRanges_1.9.3

    loaded via a namespace (and not attached):
    [1] tools_2.13.0

thanks a lot for your help,

Arne
@martin-morgan-1513
Last seen 4 days ago
United States
On 11/30/2010 05:38 AM, arne.mueller at novartis.com wrote:
> Is there a method with which I can extract a sub-range from a GRanges
> object based on genomic coordinates rather than indexes?

Hi Arne,

it sounds a bit like you want to 1) find overlapping ranges between gr and the genomic location(s), and then 2) restrict the ranges to those locations (narrow might be appropriate if looking for, say, 5' regions), along the lines of

    > gr1 <- gr[gr %in% GRanges("A", IRanges(12, 18))]
    > ranges(gr1) <- restrict(ranges(gr1), 12, 18)
    > gr1
    GRanges with 1 range and 0 elementMetadata values
        seqnames   ranges strand |
           <rle> <iranges>  <rle> |
    [1]        A [12, 18]      * |

    seqlengths
      A
     NA

gr %in% GRanges(<...>) is sugar for match(), which is sugar for findOverlaps.

Martin

--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793
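The two steps described above (select by overlap, then clip to the region) can be sketched as follows. This is only an illustration under some assumptions: it targets a reasonably recent GenomicRanges release rather than the 1.1.38 version used in the thread, the names `roi`, `hits`, and `clipped` are made up for the example, and it uses `overlapsAny()` instead of `%in%` because, as far as I know, `%in%` on GRanges objects was later changed to test for identical ranges rather than overlap.

```r
library(GenomicRanges)

# The example object from the question.
gr <- GRanges(seqnames = "A",
              ranges = IRanges(start = c(10, 100), end = c(20, 200)))

# Hypothetical region of interest, given in genomic coordinates.
roi <- GRanges("A", IRanges(12, 18))

# Step 1: keep only the ranges that overlap the region of interest.
hits <- gr[overlapsAny(gr, roi)]

# Step 2: clip the surviving ranges to the region's boundaries.
clipped <- restrict(hits, start = start(roi), end = end(roi))

clipped  # a single range on A spanning 12 to 18
```

Note that `restrict()` applies the same start/end bounds to every sequence, so with several chromosomes the overlap step has to do the per-chromosome filtering first, as it does here.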
On Tue, Nov 30, 2010 at 6:10 AM, Martin Morgan <mtmorgan@fhcrc.org> wrote:
> it sounds a bit like you want to 1) find overlapping ranges between gr
> and genomic location(s) and then 2) restrict (narrow might be
> appropriate if looking for, say 5' regions) the ranges to those
> locations, along the lines of
>
> > gr1 <- gr[gr %in% GRanges("A", IRanges(12, 18))]

See subsetByOverlaps() for the above; maybe a little cleaner?

> > ranges(gr1) <- restrict(ranges(gr1), 12, 18)
> > gr1
> GRanges with 1 range and 0 elementMetadata values
>     seqnames   ranges strand |
>        <rle> <iranges>  <rle> |
> [1]        A [12, 18]      * |

To find the common regions from an overlap operation, this is the most general way:

    overlaps <- findOverlaps(ranges(gr1), subject)
    ranges(overlaps, ranges(gr1), subject)

Not sure if that's what Arne wants though.

> gr %in% GRanges(<...>) is sugar for match(), which is sugar for
> findOverlaps.
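A minimal sketch of both suggestions, assuming a recent GenomicRanges release. The `subject` object below is hypothetical (the thread never defines one), and the intersection step uses `pintersect()` on the paired hits rather than the 2010-era `ranges(overlaps, query, subject)` call, whose availability in current releases I have not checked.

```r
library(GenomicRanges)

# The example object from the question.
gr <- GRanges("A", IRanges(start = c(10, 100), end = c(20, 200)))

# Hypothetical regions of interest in genomic coordinates.
subject <- GRanges("A", IRanges(start = c(12, 150), end = c(98, 160)))

# Whole ranges of gr that overlap any region in subject.
sub <- subsetByOverlaps(gr, subject)

# Pairwise overlaps, then the part shared by each query/subject pair.
ov <- findOverlaps(gr, subject)
common <- pintersect(gr[queryHits(ov)], subject[subjectHits(ov)])

start(common)  # 12 150
end(common)    # 20 160
```

`subsetByOverlaps()` returns the original, unclipped ranges, whereas the `findOverlaps()` route yields one clipped range per overlapping pair, so a query that hits several regions appears several times.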
Thanks a lot for the replies on finding a subset of features in a GRanges object. findOverlaps seems to be the way to go for me (as I'm not very much into sugar ;-)

regards,

arne