Help with sliding window analysis on GRanges object
@stefano-iantorno-6441
Last seen 9.6 years ago
Hello,

I am trying to conduct a sliding window analysis on a GRanges object. My ranges are 60272 single-nucleotide positions representing high-confidence SNPs, stored as an IRanges object. I would like to retrieve the list of GRanges row IDs for each 500 bp window in the genome (overlapping windows).

All the documentation I could find on sliding window functions such as runsum, runmean, etc. is for Rle objects.

Any idea where to start? I can't figure out a way to pick windows in the IRanges object across intervals, since each interval is represented by a start and end position (the same genomic position, since each range is a single nucleotide long).

Any help will be greatly appreciated.

Thanks,
- Stefano
IRanges IRanges • 4.2k views
@martin-morgan-1513
Last seen 3 days ago
United States
Hi Stefano --

It's hard for me to figure out what you want to do. I guess you've got some SNPs

    snps = GRanges("chr1", IRanges(c(1000, 1200, 2000), width=1))

and you'd like to count the number of SNPs in a sliding window of width 500? You can easily represent your SNPs as an Rle instead of a GRanges

    > (cvg <- coverage(snps))
    RleList of length 1
    $chr1
    integer-Rle of length 2000 with 6 runs
      Lengths: 999   1 199   1 799   1
      Values :   0   1   0   1   0   1

and then calculate a running mean (or runsum), as I guess you've found

    > (r <- runmean(cvg, 500))
    RleList of length 1
    $chr1
    numeric-Rle of length 1501 with 6 runs
      Lengths:   500   200   300   200   300     1
      Values :     0 0.002 0.004 0.002     0 0.002

The toy example could be visualized as

    plot(as.numeric(r[[1]]), type="l")

or perhaps ggbio::autoplot(r).

Is that something like what you're looking for? Or would you like to take this a step further? Maybe you can construct a simple example like the one here to show what you're trying to do?

Martin
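If a count per window is more useful than a mean, the same toy example can be run through runsum instead. The following is only a sketch on Martin's three toy SNPs; the variable names (counts, n, dense_starts) are made up for illustration. With the default end rule, element i of the result describes the window covering positions i to i + 499.

```r
library(GenomicRanges)

# Toy SNPs from the example above
snps <- GRanges("chr1", IRanges(c(1000, 1200, 2000), width = 1))

# Per-base coverage: 1 at each SNP position, 0 elsewhere
cvg <- coverage(snps)

# Running sum over 500 bp: number of SNPs in the window starting at each position
counts <- runsum(cvg, 500)

# Expand to a plain integer vector (fine for a toy example, costly genome-wide)
n <- as.vector(counts$chr1)

# Start positions of the windows holding the most SNPs
max_count <- max(n)
dense_starts <- which(n == max_count)
range(dense_starts)
```

This gives counts only; it does not say which SNPs fall in each window, so the row IDs would still have to come from an overlap-based approach.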
@michael-lawrence-3846
Last seen 2.4 years ago
United States
One way would be to generate the GRanges for the sliding windows and use findOverlaps to get the list of indices.

Something like this:

    tiles <- unlist(tileGenome(seqinfo(snps), tilewidth=1L))
    windows <- resize(tiles, 500L) # you will get a warning about trimming
    answer <- as.list(findOverlaps(windows, snps))

Good luck. I also like Martin's answer if all you want is e.g. a count.

We might want to think about an argument to tileGenome or some mechanism for generating a sliding tiling, in addition to the disjoint tiling.

Michael
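For readers on a more recent Bioconductor release: GenomicRanges now provides slidingWindows(), which produces overlapping windows with a chosen step. The sketch below is not part of the original answer. It assumes a hypothetical 2000 bp chromosome for the toy SNPs and an arbitrary step of 250 bp, which keeps the number of windows far smaller than one window per base.

```r
library(GenomicRanges)

# Toy SNPs; the 2000 bp chromosome length is an assumption for illustration
snps <- GRanges("chr1", IRanges(c(1000, 1200, 2000), width = 1),
                seqlengths = c(chr1 = 2000))

# One range spanning each whole chromosome, built from the seqinfo
chroms <- GRanges(seqlevels(snps),
                  IRanges(1, unname(seqlengths(snps))),
                  seqinfo = seqinfo(snps))

# Overlapping 500 bp windows, advancing 250 bp at a time
windows <- unlist(slidingWindows(chroms, width = 500L, step = 250L))

# For each window, the row indices of the SNPs it contains
idx <- as.list(findOverlaps(windows, snps))

# Number of SNPs per window
n_snps <- countOverlaps(windows, snps)
```

Here idx[[i]] holds the row numbers in snps that fall inside windows[i], which is the per-window list of row IDs asked for in the question.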
I just realized that this will not scale well for the whole genome. So you might just want to summarize with the Rle utilities or take 500bp around each SNP to form your windows. Depends on your goal.

Michael
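The "500bp around each SNP" idea could be sketched as follows, using the toy SNPs from Martin's example. Centring the window on the SNP with fix = "center" is an assumption; the thread does not say how the window should be placed.

```r
library(GenomicRanges)

# Toy SNPs (no seqlengths set, so windows are not checked against chromosome ends)
snps <- GRanges("chr1", IRanges(c(1000, 1200, 2000), width = 1))

# One 500 bp window per SNP, centred on the SNP
windows <- resize(snps, width = 500, fix = "center")

# Row indices of all SNPs falling in each SNP-centred window
neighbours <- as.list(findOverlaps(windows, snps))

# Number of SNPs per window (each window contains at least its own SNP)
n_snps <- countOverlaps(windows, snps)
```

This produces one window per SNP rather than one per genomic position, so it scales with the number of SNPs (60272 here) and not with genome length.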
@stefano-iantorno-6441
Last seen 9.6 years ago
Thanks, that worked beautifully. I ended up doing the following:

    tileranges <- unlist(tileGenome(seqinfo(snps), tilewidth=500))
    hits.df <- as.data.frame(findOverlaps(tileranges, snps))

I can then subset tileranges with hits.df$queryHits and snps with hits.df$subjectHits to retrieve all the information in the original GRanges object. Although these are not overlapping sliding windows (they are more like "bins"), I think it might be good enough for my purposes.

Best,
- Stefano
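A small sketch of how the binned result might be used, assuming the toy SNPs from earlier in the thread and a hypothetical 2000 bp chromosome. The names bins, snps_by_bin and n_snps are illustrative and not from the original post.

```r
library(GenomicRanges)

# Toy SNPs; the chromosome length is assumed so that tileGenome() has seqlengths
snps <- GRanges("chr1", IRanges(c(1000, 1200, 2000), width = 1),
                seqlengths = c(chr1 = 2000))

# Disjoint 500 bp bins covering the genome
bins <- unlist(tileGenome(seqinfo(snps), tilewidth = 500))

# queryHits index the bins, subjectHits index the SNPs
hits <- findOverlaps(bins, snps)

# SNPs grouped by the bin they fall in (only non-empty bins appear)
snps_by_bin <- split(snps[subjectHits(hits)], queryHits(hits))

# SNP count per bin, stored as a metadata column on the bins
bins$n_snps <- countOverlaps(bins, snps)
```

Working with the Hits object directly avoids the round trip through a data frame, although as.data.frame(hits) gives the same index pairs.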