Lapply for two GRanges objects?
1
0
Entering edit mode
wrighth ▴ 260
@wrighth-3452
Last seen 9.6 years ago
Hello; I apologize if this is an obvious question or more suited for the general R list, but I have not been able to find a good solution with the Google: it is possible to use lapply or relatives to speed up overlapping of multiple GRanges objects? Specifically, I'm getting methylation values from bisulfite sequencing CpGs over specific windows in the genome and I need to calculate means across the windows, so for right now what I have been doing is basically: for(i <= length(methyl)) methlymean <- mean(subsetByOverlaps(methyl, windows[i]); but this is fairly slow. I tried something like: m <- lapply(windows, methylmeans(methyl, windows) and defining: methylmeans <- function(methyl, windows) { return(mean(subsetByOverlaps(methyl, windows)@elementMetadata$methyl)); } but this doesn't work. Mapply doesn't work since the window and methyl sizes aren't the same. Any thoughts? Is there anything inbuilt into GRanges for this kind of case? Hollis Wright, PhD Knight Cancer Center Oregon Health and Science University
Sequencing Cancer Sequencing Cancer • 1.2k views
ADD COMMENT
0
Entering edit mode
@martin-morgan-1513
Last seen 6 weeks ago
United States
On 03/28/2011 02:41 PM, Hollis Wright wrote: > Hello; I apologize if this is an obvious question or more suited for > the general R list, but I have not been able to find a good solution > with the Google: it is possible to use lapply or relatives to speed > up overlapping of multiple GRanges objects? Specifically, I'm getting > methylation values from bisulfite sequencing CpGs over specific > windows in the genome and I need to calculate means across the > windows, so for right now what I have been doing is basically: > for(i<= length(methyl)) > methlymean<- mean(subsetByOverlaps(methyl, windows[i]); > > but this is fairly slow. I tried something like: > > > m<- lapply(windows, methylmeans(methyl, windows) > > and defining: > > > methylmeans<- function(methyl, windows) > { > return(mean(subsetByOverlaps(methyl, windows)@elementMetadata$methyl)); > } > Hi Hollis -- maybe along the lines of (over all ranges in methyl and windows) olaps <- findOverlaps(methyl, windows) v <- elementMetadata(methyl)[queryHits(olaps), "methyl"] i <- subjectHits(olaps) means <- numeric(length(windows)) means[unique(i)] <- sapply(split(v, i), mean) elementMetadata(windows)[["MethylMeans"]] <- means ? Martin > but this doesn't work. Mapply doesn't work since the window and > methyl sizes aren't the same. Any thoughts? Is there anything inbuilt into GRanges for this kind of case? > > Hollis Wright, PhD > Knight Cancer Center > Oregon Health and Science University > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: M1-B861 Telephone: 206 667-2793
ADD COMMENT
0
Entering edit mode
On Mon, Mar 28, 2011 at 5:08 PM, Martin Morgan <mtmorgan@fhcrc.org> wrote: > On 03/28/2011 02:41 PM, Hollis Wright wrote: > >> Hello; I apologize if this is an obvious question or more suited for >> the general R list, but I have not been able to find a good solution >> with the Google: it is possible to use lapply or relatives to speed >> up overlapping of multiple GRanges objects? Specifically, I'm getting >> methylation values from bisulfite sequencing CpGs over specific >> windows in the genome and I need to calculate means across the >> windows, so for right now what I have been doing is basically: >> for(i<= length(methyl)) >> methlymean<- mean(subsetByOverlaps(methyl, windows[i]); >> >> but this is fairly slow. I tried something like: >> >> >> m<- lapply(windows, methylmeans(methyl, windows) >> >> and defining: >> >> >> methylmeans<- function(methyl, windows) >> { >> return(mean(subsetByOverlaps(methyl, >> windows)@elementMetadata$methyl)); >> } >> >> > Hi Hollis -- maybe along the lines of (over all ranges in methyl and > windows) > > olaps <- findOverlaps(methyl, windows) > v <- elementMetadata(methyl)[queryHits(olaps), "methyl"] > i <- subjectHits(olaps) > > means <- numeric(length(windows)) > means[unique(i)] <- sapply(split(v, i), mean) > > elementMetadata(windows)[["MethylMeans"]] <- means > > ? > > Fast way: olaps <- findOverlaps(windows, methyl) # now we know that queryHits are sorted rle <- Rle(queryHits(olaps)) part <- PartitioningByWidth(width(rle)) means <- numeric(length(windows)) means[runValue(rle)] <- viewMeans(Views(Rle(values(methyl)$methyl[subjectHits(olaps)]), part)) Haven't tested that or anything but it should get you in the right direction. And it's pretty instructive about IRanges. The PartitioningByWidth trick with Views is a good one in general for basic statistics over ragged arrays. Been working on getting that to work automatically over compressed lists. Michael Martin > > > but this doesn't work. Mapply doesn't work since the window and >> methyl >> > sizes aren't the same. Any thoughts? Is there anything inbuilt into > GRanges for this kind of case? > >> >> Hollis Wright, PhD >> Knight Cancer Center >> Oregon Health and Science University >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > > -- > Computational Biology > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 > > Location: M1-B861 > Telephone: 206 667-2793 > > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Thanks, Martin; this looks like it works and is much, much faster. I think something like this might speed up some old code of mine, too, I'll have to go back and give it a try. Hollis Wright, PhD Knight Cancer Center Oregon Health and Science University ________________________________________ From: Martin Morgan [mtmorgan@fhcrc.org] Sent: Monday, March 28, 2011 5:08 PM To: Hollis Wright Cc: bioconductor at r-project.org Subject: Re: [BioC] Lapply for two GRanges objects? On 03/28/2011 02:41 PM, Hollis Wright wrote: > Hello; I apologize if this is an obvious question or more suited for > the general R list, but I have not been able to find a good solution > with the Google: it is possible to use lapply or relatives to speed > up overlapping of multiple GRanges objects? Specifically, I'm getting > methylation values from bisulfite sequencing CpGs over specific > windows in the genome and I need to calculate means across the > windows, so for right now what I have been doing is basically: > for(i<= length(methyl)) > methlymean<- mean(subsetByOverlaps(methyl, windows[i]); > > but this is fairly slow. I tried something like: > > > m<- lapply(windows, methylmeans(methyl, windows) > > and defining: > > > methylmeans<- function(methyl, windows) > { > return(mean(subsetByOverlaps(methyl, windows)@elementMetadata$methyl)); > } > Hi Hollis -- maybe along the lines of (over all ranges in methyl and windows) olaps <- findOverlaps(methyl, windows) v <- elementMetadata(methyl)[queryHits(olaps), "methyl"] i <- subjectHits(olaps) means <- numeric(length(windows)) means[unique(i)] <- sapply(split(v, i), mean) elementMetadata(windows)[["MethylMeans"]] <- means ? Martin > but this doesn't work. Mapply doesn't work since the window and > methyl sizes aren't the same. Any thoughts? Is there anything inbuilt into GRanges for this kind of case? > > Hollis Wright, PhD > Knight Cancer Center > Oregon Health and Science University > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: M1-B861 Telephone: 206 667-2793
ADD REPLY

Login before adding your answer.

Traffic: 846 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6