Help with sliding window analysis on GRanges object
@vince-s-buffalo-4618
Last seen 9.6 years ago
United States
Sorry to return to this older topic, but I'm curious: what's the reasoning behind allocating tiles of 1L and then using resize?

    tiles <- unlist(tileGenome(seqinfo(snps), tilewidth=1L))
    windows <- resize(tiles, 500L) # you will get a warning about trimming

Also, in general, why does tileGenome always return a list rather than a GenomicRanges object?

Vince

On Mon, Mar 10, 2014 at 8:27 AM, Stefano Iantorno <si3@sanger.ac.uk> wrote:
> Thanks, that worked beautifully. I ended up doing the following:
>
>     tileranges <- unlist(tileGenome(seqinfo(snps), tilewidth=500))
>     hits.df <- as.data.frame(findOverlaps(tileranges, snps))
>
> I can then subset tileranges and snps with hits.df$queryHits or
> hits.df$subjectHits to retrieve all the information in the original
> GRanges object.
>
> Although these are not overlapping sliding windows (they are more like
> "bins"), I think they might be good enough for my purposes.
>
> Best,
>
> - Stefano
>
> From: Michael Lawrence [mailto:lawrence.michael@gene.com]
> Sent: 09 March 2014 00:44
> To: Stefano Iantorno
> Cc: bioconductor@r-project.org
> Subject: Re: [BioC] Help with sliding window analysis on GRanges object
>
> I just realized that this will not scale well for the whole genome. So
> you might just want to summarize with the Rle utilities or take 500bp
> around each SNP to form your windows. Depends on your goal.
>
> Michael
>
> On Sat, Mar 8, 2014 at 9:38 PM, Michael Lawrence <michafla@gene.com> wrote:
>
> One way would be to generate the GRanges for the sliding windows and use
> findOverlaps to get the list of indices.
>
> Something like this:
>
>     tiles <- unlist(tileGenome(seqinfo(snps), tilewidth=1L))
>     windows <- resize(tiles, 500L) # you will get a warning about trimming
>     answer <- as.list(findOverlaps(windows, snps))
>
> Good luck. I also like Martin's answer if all you want is e.g. a count.
>
> We might want to think about an argument to tileGenome or some mechanism
> for generating a sliding tiling, in addition to the disjoint tiling.
>
> Michael
>
> On Sat, Mar 8, 2014 at 1:41 PM, Stefano Iantorno <si3@sanger.ac.uk> wrote:
>
> Hello
>
> I am trying to conduct a sliding window analysis on a GRanges object. My
> ranges are a list of 60272 single nucleotide positions representing high
> confidence SNPs stored as an IRanges object. I would like to retrieve the
> list of GRanges row IDs for each 500bp window in the genome
> (overlapping windows).
>
> All the documentation I could find on sliding window functions such as
> runsum, runmean, etc. is for Rle objects.
>
> Any idea where to start from? I can't figure out a way to pick windows
> in the IRanges object across intervals, since each interval is
> represented by a start and end position (same genomic position since
> it's a single nucleotide long).
>
> Any help will be greatly appreciated.
>
> Thanks
>
> - Stefano
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor

--
Vince Buffalo
Ross-Ibarra Lab (www.rilab.org)
Plant Sciences, UC Davis
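Editor's note: the "take 500bp around each SNP" alternative mentioned in the quoted thread might look roughly like the sketch below. It is only an illustration under stated assumptions: the toy `Seqinfo`, the SNP positions, and the 101bp window width are all made up to keep the example small, and `around` and `n_snps` are hypothetical names that do not come from the thread.

```r
library(GenomicRanges)

# Toy single-chromosome genome; the length is invented for illustration.
si <- Seqinfo(seqnames = "chr1", seqlengths = 1000L)

# Hypothetical SNPs stored as width-1 ranges.
snps <- GRanges("chr1", IRanges(c(100L, 150L, 900L), width = 1L), seqinfo = si)

# One window per SNP, centred on it, rather than one window per base.
# trim() clips any window that would run past a chromosome end.
around <- trim(suppressWarnings(resize(snps, 101L, fix = "center")))

# Number of SNPs falling in each window (each window contains its own SNP).
n_snps <- countOverlaps(around, snps)
```

This produces as many windows as there are SNPs, so it should scale with the number of SNPs rather than with genome size.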
SNP IRanges • 2.1k views
@michael-lawrence-3846
Last seen 2.4 years ago
United States
On Thu, Apr 17, 2014 at 10:27 AM, Vince S. Buffalo <vsbuffalo@gmail.com> wrote:

> Sorry to return to this older topic, but I'm curious, what's the
> reasoning behind allocating tiles of 1L then using resize?
>
>     tiles <- unlist(tileGenome(seqinfo(snps), tilewidth=1L))
>     windows <- resize(tiles, 500L) # you will get a warning about trimming

This is the way to generate sliding windows. tileGenome generates a partitioning, i.e., non-overlapping windows. Tiling with a width of 1 gives one range per base, and resizing each of those to 500 then yields a 500bp window starting at every position.

> Also, in general, why does tileGenome always return a list rather than a
> GenomicRanges object?

tileGenome can potentially generate multiple GRanges elements per tile, because by default tiles will cross between chromosomes in an effort to achieve constant tile width. Even when that feature is disabled, the result is still a GRangesList for consistency.

Also, until recent changes in devel, it wasn't possible to lapply over a GRanges, which is a typical use case when tiling.
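Editor's note: the tile-then-resize idea can be tried on a toy genome, as in the sketch below. The chromosome lengths, SNP positions, and the 5bp window width are assumptions chosen so the result is easy to inspect. The `trim()` call and the coercion with `as(..., "List")` are additions that do not appear in the thread.

```r
library(GenomicRanges)

# Toy genome; seqlengths are made up for illustration.
si <- Seqinfo(seqnames = c("chr1", "chr2"), seqlengths = c(20L, 10L))

# Hypothetical SNPs (width-1 ranges) carrying the same seqinfo.
snps <- GRanges(c("chr1", "chr1", "chr2"),
                IRanges(c(3L, 7L, 5L), width = 1L),
                seqinfo = si)

# One width-1 tile per base of the genome.
tiles <- unlist(tileGenome(seqinfo(snps), tilewidth = 1L))

# Grow every tile into a 5bp window anchored at its start, then clip the
# windows that run past the end of their chromosome.
windows <- trim(suppressWarnings(resize(tiles, 5L)))

# For each window, the indices of the SNPs it overlaps.
answer <- as(findOverlaps(windows, snps), "List")
```

Because one window is created per base, the number of windows equals the genome size, which is why this approach is unlikely to scale to a whole genome.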
Hi Vince, Michael,

On 04/17/2014 10:40 AM, Michael Lawrence wrote:
> tileGenome can potentially generate multiple GRanges elements per tile,
> because by default tiles will cross between chromosomes in an effort to
> achieve constant tile width. Even when that feature is disabled, the result
> is still a GRangesList for consistency.

To disable that feature, use 'cut.last.tile.in.chrom=TRUE'. Then it returns a GRanges. See ?tileGenome

Cheers,
H.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
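Editor's note: the difference between the two return types can be checked on a toy genome, as sketched below. The chromosome lengths and the tile width of 8 are arbitrary assumptions, and `tiles_list` and `bins` are hypothetical names.

```r
library(GenomicRanges)

# Toy genome; seqlengths are made up for illustration.
si <- Seqinfo(seqnames = c("chr1", "chr2"), seqlengths = c(20L, 10L))

# Default behaviour: a tile may span a chromosome boundary, so each tile
# is a list element that can hold more than one range.
tiles_list <- tileGenome(si, tilewidth = 8L)

# With cut.last.tile.in.chrom=TRUE the last tile of each chromosome is
# cut short instead, and the result is a plain GRanges.
bins <- tileGenome(si, tilewidth = 8L, cut.last.tile.in.chrom = TRUE)

is(tiles_list, "GRangesList")
is(bins, "GRanges")
```

With the second form there is no need to call unlist() before passing the tiles to findOverlaps or countOverlaps.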
Ok, I had assumed that even disabling that feature would generate a GRangesList.

On Thu, Apr 17, 2014 at 10:51 AM, Hervé Pagès <hpages@fhcrc.org> wrote:
> To disable that feature, use 'cut.last.tile.in.chrom=TRUE'. Then it
> returns a GRanges. See ?tileGenome
On 04/17/2014 10:40 AM, Michael Lawrence wrote:
> This is the way to generate sliding windows. The tileGenome generates a
> partitioning, i.e., non-overlapping windows.

Oh, *very* clever!

On Thu, Apr 17, 2014 at 10:51 AM, Hervé Pagès <hpages@fhcrc.org> wrote:
> To disable that feature, use 'cut.last.tile.in.chrom=TRUE'. Then it
> returns a GRanges. See ?tileGenome

Right, this makes sense too. Thanks Hervé and Michael!

Vince