loop over IRanges spaces
2
0
Entering edit mode
Yvan ▴ 60
@yvan-4034
Last seen 10.2 years ago
Hello List, I am confused about how to loop over a rangedData object. I have this rangedData RangedData with 61 rows and 1 value column across 3 spaces space ranges | score <character> <iranges> | <numeric> 1 SCAFFOLD_1 [ 8, 8] | -0.09405 and the spaces are "SCAFFOLD_1" "SCAFFOLD_10" "SCAFFOLD_100" using aggregate it is possible to apply a function to one of the space aggregate(rd["SCAFFOLD_1"]$score, start = 1:(length(rd["SCAFFOLD_1"]$score)-w+1), width = w, FUN = sum) but how can I apply the aggregate to all space without a for loop ? And it is possible to add a column sum "on the fly" or should I create a vector "sum" containing values for all spaces and the recreate a rangedData object with this extra column? Thanks for your help and patience yvan
• 1.9k views
ADD COMMENT
0
Entering edit mode
@michael-lawrence-3846
Last seen 3.0 years ago
United States
On Wed, Apr 21, 2010 at 6:07 AM, Yvan <yvan.strahm@uni.no> wrote: > Hello List, > > I am confused about how to loop over a rangedData object. > I have this rangedData > > RangedData with 61 rows and 1 value column across 3 spaces > space ranges | score > <character> <iranges> | <numeric> > 1 SCAFFOLD_1 [ 8, 8] | -0.09405 > > and the spaces are > > "SCAFFOLD_1" "SCAFFOLD_10" "SCAFFOLD_100" > > using aggregate it is possible to apply a function to one of the space > > aggregate(rd["SCAFFOLD_1"]$score, start = > 1:(length(rd["SCAFFOLD_1"]$score)-w+1), width = w, FUN = sum) > > but how can I apply the aggregate to all space without a for loop ? > > It looks like you're attempting a running window sum of the score vector. There are more efficient ways of doing this besides aggregate(). If you convert the score into an Rle, you can use runsum(). Anyway, to do this over each space individually, use lapply(). This would come out to something like: values(rd)[,"smoothScore"] <- lapply(rd, function(x) runsum(Rle(x$score), w)) Probably not exactly right, but it gets you in the right direction... Michael > And it is possible to add a column sum "on the fly" or should I create a > vector "sum" containing values for all spaces and the recreate a rangedData > object with this extra column? > > > Thanks for your help and patience > yvan > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
On 21/04/10 18:43, Michael Lawrence wrote: > > > On Wed, Apr 21, 2010 at 6:07 AM, Yvan <yvan.strahm@uni.no> <mailto:yvan.strahm@uni.no>> wrote: > > Hello List, > > I am confused about how to loop over a rangedData object. > I have this rangedData > > RangedData with 61 rows and 1 value column across 3 spaces > space ranges | score > <character> <iranges> | <numeric> > 1 SCAFFOLD_1 [ 8, 8] | -0.09405 > > and the spaces are > > "SCAFFOLD_1" "SCAFFOLD_10" "SCAFFOLD_100" > > using aggregate it is possible to apply a function to one of the space > > aggregate(rd["SCAFFOLD_1"]$score, start = > 1:(length(rd["SCAFFOLD_1"]$score)-w+1), width = w, FUN = sum) > > but how can I apply the aggregate to all space without a for loop ? > > > It looks like you're attempting a running window sum of the score > vector. There are more efficient ways of doing this besides > aggregate(). If you convert the score into an Rle, you can use runsum(). > > Anyway, to do this over each space individually, use lapply(). > > This would come out to something like: > > values(rd)[,"smoothScore"] <- lapply(rd, function(x) > runsum(Rle(x$score), w)) > > Probably not exactly right, but it gets you in the right direction... > > Michael > > Hello Michael, Thanks for the answer and the tip about runsum! I try with lapply but could not get it working right, the main problem is that the runsum is calculated on all values and not for a each specific spaces. Sorry, I should have been more precise in the problem description. The runsum should be calculated in a space specific manner, let say w=2 space score cumsum 1 space1 1 3 2 space1 2 4 3 space1 2 NA 4 space2 10 21 5 space2 11 22 6 space2 11 NA Is it possible to do it with lapply? Thanks again for your help cheers, yvan [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Hi, how about funtion rdapply (not lapply) which is for that? The code below computes the sum score for each space in the RangedData: # taken from the examples mostly: > ranges <- IRanges(c(1,2,3),c(4,5,6)) > score <- c(2L, 0L, 1L) > rd <- RangedData(ranges, score, space = c("chr1","chr2","chr1")) > rd RangedData with 3 rows and 1 value column across 2 spaces space ranges | score <character> <iranges> | <integer> 1 chr1 [1, 4] | 2 2 chr1 [3, 6] | 1 3 chr2 [2, 5] | 0 > params <- RDApplyParams(rd, function(rd) sum(score(rd))) > rdapply(params) $chr1 [1] 3 $chr2 [1] 0 Cheers Michael Am Apr 22, 2010 um 1:57 PM schrieb Yvan: > On 21/04/10 18:43, Michael Lawrence wrote: >> >> >> On Wed, Apr 21, 2010 at 6:07 AM, Yvan <yvan.strahm at="" uni.no="">> <mailto:yvan.strahm at="" uni.no="">> wrote: >> >> Hello List, >> >> I am confused about how to loop over a rangedData object. >> I have this rangedData >> >> RangedData with 61 rows and 1 value column across 3 spaces >> space ranges | score >> <character> <iranges> | <numeric> >> 1 SCAFFOLD_1 [ 8, 8] | -0.09405 >> >> and the spaces are >> >> "SCAFFOLD_1" "SCAFFOLD_10" "SCAFFOLD_100" >> >> using aggregate it is possible to apply a function to one of the space >> >> aggregate(rd["SCAFFOLD_1"]$score, start = >> 1:(length(rd["SCAFFOLD_1"]$score)-w+1), width = w, FUN = sum) >> >> but how can I apply the aggregate to all space without a for loop ? >> >> >> It looks like you're attempting a running window sum of the score >> vector. There are more efficient ways of doing this besides >> aggregate(). If you convert the score into an Rle, you can use runsum(). >> >> Anyway, to do this over each space individually, use lapply(). >> >> This would come out to something like: >> >> values(rd)[,"smoothScore"] <- lapply(rd, function(x) >> runsum(Rle(x$score), w)) >> >> Probably not exactly right, but it gets you in the right direction... >> >> Michael >> >> > Hello Michael, > > Thanks for the answer and the tip about runsum! > I try with lapply but could not get it working right, the main problem > is that the runsum is calculated on all values and not for a each > specific spaces. > Sorry, I should have been more precise in the problem description. > The runsum should be calculated in a space specific manner, let say w=2 > > space score cumsum > 1 space1 1 3 > 2 space1 2 4 > 3 space1 2 NA > 4 space2 10 21 > 5 space2 11 22 > 6 space2 11 NA > > Is it possible to do it with lapply? > Thanks again for your help > cheers, > yvan > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLY
0
Entering edit mode
On Thu, Apr 22, 2010 at 5:49 AM, Michael Dondrup <michael.dondrup@uni.no>wrote: > Hi, > how about funtion rdapply (not lapply) which is for that? > > lapply() should apply per-space as well, basically providing a short- cut for the more complicated rdapply(). > lapply(rd, function(x) sum(x$score)) $chr1 [1] 3 $chr2 [1] 0 sapply() also works: > sapply(rd, function(x) sum(x$score)) chr1 chr2 3 0 Another choice is tapply: > tapply(rd$score, space(rd), sum) chr1 chr2 3 0 Michael The code below computes the sum score for each space in the RangedData: > # taken from the examples mostly: > > ranges <- IRanges(c(1,2,3),c(4,5,6)) > > score <- c(2L, 0L, 1L) > > rd <- RangedData(ranges, score, space = c("chr1","chr2","chr1")) > > rd > RangedData with 3 rows and 1 value column across 2 spaces > space ranges | score > <character> <iranges> | <integer> > 1 chr1 [1, 4] | 2 > 2 chr1 [3, 6] | 1 > 3 chr2 [2, 5] | 0 > > params <- RDApplyParams(rd, function(rd) sum(score(rd))) > > rdapply(params) > $chr1 > [1] 3 > > $chr2 > [1] 0 > > > Cheers > Michael > > Am Apr 22, 2010 um 1:57 PM schrieb Yvan: > > > On 21/04/10 18:43, Michael Lawrence wrote: > >> > >> > >> On Wed, Apr 21, 2010 at 6:07 AM, Yvan <yvan.strahm@uni.no> >> <mailto:yvan.strahm@uni.no>> wrote: > >> > >> Hello List, > >> > >> I am confused about how to loop over a rangedData object. > >> I have this rangedData > >> > >> RangedData with 61 rows and 1 value column across 3 spaces > >> space ranges | score > >> <character> <iranges> | <numeric> > >> 1 SCAFFOLD_1 [ 8, 8] | -0.09405 > >> > >> and the spaces are > >> > >> "SCAFFOLD_1" "SCAFFOLD_10" "SCAFFOLD_100" > >> > >> using aggregate it is possible to apply a function to one of the > space > >> > >> aggregate(rd["SCAFFOLD_1"]$score, start = > >> 1:(length(rd["SCAFFOLD_1"]$score)-w+1), width = w, FUN = sum) > >> > >> but how can I apply the aggregate to all space without a for loop ? > >> > >> > >> It looks like you're attempting a running window sum of the score > >> vector. There are more efficient ways of doing this besides > >> aggregate(). If you convert the score into an Rle, you can use runsum(). > >> > >> Anyway, to do this over each space individually, use lapply(). > >> > >> This would come out to something like: > >> > >> values(rd)[,"smoothScore"] <- lapply(rd, function(x) > >> runsum(Rle(x$score), w)) > >> > >> Probably not exactly right, but it gets you in the right direction... > >> > >> Michael > >> > >> > > Hello Michael, > > > > Thanks for the answer and the tip about runsum! > > I try with lapply but could not get it working right, the main problem > > is that the runsum is calculated on all values and not for a each > > specific spaces. > > Sorry, I should have been more precise in the problem description. > > The runsum should be calculated in a space specific manner, let say w=2 > > > > space score cumsum > > 1 space1 1 3 > > 2 space1 2 4 > > 3 space1 2 NA > > 4 space2 10 21 > > 5 space2 11 22 > > 6 space2 11 NA > > > > Is it possible to do it with lapply? > > Thanks again for your help > > cheers, > > yvan > > > > [[alternative HTML version deleted]] > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@stat.math.ethz.ch > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Hello, Thank you both of you. I could could calculate the sliding window, but not as a Rle object, could not append values for the last w-1 position in the Rle object in order to take care of the size problem. So I did it like that: params <-RDApplyParams(rd,function(rd) append((diff(c(0,cumsum(rd$score)),lag=w)/w),rep(0,each=w-1),after=(le ngth(rd$score)-w+1))) But when I try to add the new values to the rangedData object I got these error. > values(rd)[,"windo"]<-rdapply(params) Error in `[<-`(`*tmp*`, , j, value = <s4 object="" of="" class="" "dataframe"="">) : ncol(x[j]) != ncol(value) In addition: Warning messages: 1: In mapply(f, ..., SIMPLIFY = FALSE) : longer argument not a multiple of length of shorter 2: In mapply(f, ..., SIMPLIFY = FALSE) : longer argument not a multiple of length of shorter But when I check the size, they are the same, here for one space > x<-rdapply(params) > length(x$SCAFFOLD_100) == length(rd["SCAFFOLD_100"]$windo) [1] TRUE Maybe params miss a parameter or the way I try to update the rd object is wrong. Anyway form the rdapply output a vector could be created and so a new rd object with the new value column. yvan On 22/04/10 15:51, Michael Lawrence wrote: > > > On Thu, Apr 22, 2010 at 5:49 AM, Michael Dondrup > <michael.dondrup@uni.no <mailto:michael.dondrup@uni.no="">> wrote: > > Hi, > how about funtion rdapply (not lapply) which is for that? > > > lapply() should apply per-space as well, basically providing a > short-cut for the more complicated rdapply(). > > > lapply(rd, function(x) sum(x$score)) > $chr1 > [1] 3 > > $chr2 > [1] 0 > > sapply() also works: > > sapply(rd, function(x) sum(x$score)) > chr1 chr2 > 3 0 > > Another choice is tapply: > > tapply(rd$score, space(rd), sum) > chr1 chr2 > 3 0 > > Michael > > The code below computes the sum score for each space in the > RangedData: > # taken from the examples mostly: > > ranges <- IRanges(c(1,2,3),c(4,5,6)) > > score <- c(2L, 0L, 1L) > > rd <- RangedData(ranges, score, space = c("chr1","chr2","chr1")) > > rd > RangedData with 3 rows and 1 value column across 2 spaces > space ranges | score > <character> <iranges> | <integer> > 1 chr1 [1, 4] | 2 > 2 chr1 [3, 6] | 1 > 3 chr2 [2, 5] | 0 > > params <- RDApplyParams(rd, function(rd) sum(score(rd))) > > rdapply(params) > $chr1 > [1] 3 > > $chr2 > [1] 0 > > > Cheers > Michael > > Am Apr 22, 2010 um 1:57 PM schrieb Yvan: > > > On 21/04/10 18:43, Michael Lawrence wrote: > >> > >> > >> On Wed, Apr 21, 2010 at 6:07 AM, Yvan <yvan.strahm@uni.no> <mailto:yvan.strahm@uni.no> > >> <mailto:yvan.strahm@uni.no <mailto:yvan.strahm@uni.no="">>> wrote: > >> > >> Hello List, > >> > >> I am confused about how to loop over a rangedData object. > >> I have this rangedData > >> > >> RangedData with 61 rows and 1 value column across 3 spaces > >> space ranges | score > >> <character> <iranges> | <numeric> > >> 1 SCAFFOLD_1 [ 8, 8] | -0.09405 > >> > >> and the spaces are > >> > >> "SCAFFOLD_1" "SCAFFOLD_10" "SCAFFOLD_100" > >> > >> using aggregate it is possible to apply a function to one of > the space > >> > >> aggregate(rd["SCAFFOLD_1"]$score, start = > >> 1:(length(rd["SCAFFOLD_1"]$score)-w+1), width = w, FUN = sum) > >> > >> but how can I apply the aggregate to all space without a for > loop ? > >> > >> > >> It looks like you're attempting a running window sum of the score > >> vector. There are more efficient ways of doing this besides > >> aggregate(). If you convert the score into an Rle, you can use > runsum(). > >> > >> Anyway, to do this over each space individually, use lapply(). > >> > >> This would come out to something like: > >> > >> values(rd)[,"smoothScore"] <- lapply(rd, function(x) > >> runsum(Rle(x$score), w)) > >> > >> Probably not exactly right, but it gets you in the right > direction... > >> > >> Michael > >> > >> > > Hello Michael, > > > > Thanks for the answer and the tip about runsum! > > I try with lapply but could not get it working right, the main > problem > > is that the runsum is calculated on all values and not for a each > > specific spaces. > > Sorry, I should have been more precise in the problem description. > > The runsum should be calculated in a space specific manner, let > say w=2 > > > > space score cumsum > > 1 space1 1 3 > > 2 space1 2 4 > > 3 space1 2 NA > > 4 space2 10 21 > > 5 space2 11 22 > > 6 space2 11 NA > > > > Is it possible to do it with lapply? > > Thanks again for your help > > cheers, > > yvan > > > > [[alternative HTML version deleted]] > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@stat.math.ethz.ch > <mailto:bioconductor@stat.math.ethz.ch> > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch <mailto:bioconductor@stat.math.ethz.ch> > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
On Fri, Apr 23, 2010 at 1:49 AM, Yvan <yvan.strahm@uni.no> wrote: > Hello, > Thank you both of you. > > I could could calculate the sliding window, but not as a Rle object, could > not append values for the last w-1 position in the Rle object in order to > take care of the size problem. > > Why not? Rle supports all the normal vector operations. And runsum or runmean will output a vector of the same size as the input, using a choice of two endrules. If you want 0's at the end, try something like: rle[(nrow(rd)-w+1):nrow(rd)] <- 0 > So I did it like that: > > params <-RDApplyParams(rd,function(rd) > append((diff(c(0,cumsum(rd$score)),lag=w)/w),rep(0,each=w-1),after=( length(rd$score)-w+1))) > > But when I try to add the new values to the rangedData object I got these > error. > > > values(rd)[,"windo"]<-rdapply(params) > Error in `[<-`(`*tmp*`, , j, value = <s4 object="" of="" class="" "dataframe"="">) : > ncol(x[j]) != ncol(value) > In addition: Warning messages: > 1: In mapply(f, ..., SIMPLIFY = FALSE) : > longer argument not a multiple of length of shorter > 2: In mapply(f, ..., SIMPLIFY = FALSE) : > longer argument not a multiple of length of shorter > > But when I check the size, they are the same, here for one space > > > x<-rdapply(params) > > length(x$SCAFFOLD_100) == length(rd["SCAFFOLD_100"]$windo) > [1] TRUE > > It may be that that type of insertion is unsupported. Why not just do something like: rd$window <- unlist(x) > Maybe params miss a parameter or the way I try to update the rd object is > wrong. Anyway form the rdapply output a vector could be created and so a new > rd object with the new value column. > > yvan > > > On 22/04/10 15:51, Michael Lawrence wrote: > > > > On Thu, Apr 22, 2010 at 5:49 AM, Michael Dondrup <michael.dondrup@uni.no>wrote: > >> Hi, >> how about funtion rdapply (not lapply) which is for that? >> >> > lapply() should apply per-space as well, basically providing a short-cut > for the more complicated rdapply(). > > > lapply(rd, function(x) sum(x$score)) > $chr1 > [1] 3 > > $chr2 > [1] 0 > > sapply() also works: > > sapply(rd, function(x) sum(x$score)) > chr1 chr2 > 3 0 > > Another choice is tapply: > > tapply(rd$score, space(rd), sum) > chr1 chr2 > 3 0 > > Michael > > The code below computes the sum score for each space in the RangedData: >> # taken from the examples mostly: >> > ranges <- IRanges(c(1,2,3),c(4,5,6)) >> > score <- c(2L, 0L, 1L) >> > rd <- RangedData(ranges, score, space = c("chr1","chr2","chr1")) >> > rd >> RangedData with 3 rows and 1 value column across 2 spaces >> space ranges | score >> <character> <iranges> | <integer> >> 1 chr1 [1, 4] | 2 >> 2 chr1 [3, 6] | 1 >> 3 chr2 [2, 5] | 0 >> > params <- RDApplyParams(rd, function(rd) sum(score(rd))) >> > rdapply(params) >> $chr1 >> [1] 3 >> >> $chr2 >> [1] 0 >> >> >> Cheers >> Michael >> >> Am Apr 22, 2010 um 1:57 PM schrieb Yvan: >> >> > On 21/04/10 18:43, Michael Lawrence wrote: >> >> >> >> >> >> On Wed, Apr 21, 2010 at 6:07 AM, Yvan <yvan.strahm@uni.no>> >> <mailto:yvan.strahm@uni.no>> wrote: >> >> >> >> Hello List, >> >> >> >> I am confused about how to loop over a rangedData object. >> >> I have this rangedData >> >> >> >> RangedData with 61 rows and 1 value column across 3 spaces >> >> space ranges | score >> >> <character> <iranges> | <numeric> >> >> 1 SCAFFOLD_1 [ 8, 8] | -0.09405 >> >> >> >> and the spaces are >> >> >> >> "SCAFFOLD_1" "SCAFFOLD_10" "SCAFFOLD_100" >> >> >> >> using aggregate it is possible to apply a function to one of the >> space >> >> >> >> aggregate(rd["SCAFFOLD_1"]$score, start = >> >> 1:(length(rd["SCAFFOLD_1"]$score)-w+1), width = w, FUN = sum) >> >> >> >> but how can I apply the aggregate to all space without a for loop ? >> >> >> >> >> >> It looks like you're attempting a running window sum of the score >> >> vector. There are more efficient ways of doing this besides >> >> aggregate(). If you convert the score into an Rle, you can use >> runsum(). >> >> >> >> Anyway, to do this over each space individually, use lapply(). >> >> >> >> This would come out to something like: >> >> >> >> values(rd)[,"smoothScore"] <- lapply(rd, function(x) >> >> runsum(Rle(x$score), w)) >> >> >> >> Probably not exactly right, but it gets you in the right direction... >> >> >> >> Michael >> >> >> >> >> > Hello Michael, >> > >> > Thanks for the answer and the tip about runsum! >> > I try with lapply but could not get it working right, the main problem >> > is that the runsum is calculated on all values and not for a each >> > specific spaces. >> > Sorry, I should have been more precise in the problem description. >> > The runsum should be calculated in a space specific manner, let say w=2 >> > >> > space score cumsum >> > 1 space1 1 3 >> > 2 space1 2 4 >> > 3 space1 2 NA >> > 4 space2 10 21 >> > 5 space2 11 22 >> > 6 space2 11 NA >> > >> > Is it possible to do it with lapply? >> > Thanks again for your help >> > cheers, >> > yvan >> > >> > [[alternative HTML version deleted]] >> > >> > _______________________________________________ >> > Bioconductor mailing list >> > Bioconductor@stat.math.ethz.ch >> > https://stat.ethz.ch/mailman/listinfo/bioconductor >> > Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Also note that it's not really necessary to loop here, as is often the case with IRanges: rd$windo <- unlist(runmean(RleList(values(rd)[,"score"]))) On Fri, Apr 23, 2010 at 6:13 AM, Michael Lawrence <michafla@gene.com> wrote: > > > On Fri, Apr 23, 2010 at 1:49 AM, Yvan <yvan.strahm@uni.no> wrote: > >> Hello, >> Thank you both of you. >> >> I could could calculate the sliding window, but not as a Rle object, could >> not append values for the last w-1 position in the Rle object in order to >> take care of the size problem. >> >> > Why not? Rle supports all the normal vector operations. And runsum or > runmean will output a vector of the same size as the input, using a choice > of two endrules. If you want 0's at the end, try something like: > > rle[(nrow(rd)-w+1):nrow(rd)] <- 0 > > > >> So I did it like that: >> >> params <-RDApplyParams(rd,function(rd) >> append((diff(c(0,cumsum(rd$score)),lag=w)/w),rep(0,each=w-1),after= (length(rd$score)-w+1))) >> >> But when I try to add the new values to the rangedData object I got these >> error. >> >> > values(rd)[,"windo"]<-rdapply(params) >> Error in `[<-`(`*tmp*`, , j, value = <s4 object="" of="" class="" "dataframe"="">) : >> ncol(x[j]) != ncol(value) >> In addition: Warning messages: >> 1: In mapply(f, ..., SIMPLIFY = FALSE) : >> longer argument not a multiple of length of shorter >> 2: In mapply(f, ..., SIMPLIFY = FALSE) : >> longer argument not a multiple of length of shorter >> >> But when I check the size, they are the same, here for one space >> >> > x<-rdapply(params) >> > length(x$SCAFFOLD_100) == length(rd["SCAFFOLD_100"]$windo) >> [1] TRUE >> >> > It may be that that type of insertion is unsupported. Why not just do > something like: > > rd$window <- unlist(x) > > >> Maybe params miss a parameter or the way I try to update the rd object is >> wrong. Anyway form the rdapply output a vector could be created and so a new >> rd object with the new value column. >> >> yvan >> >> >> On 22/04/10 15:51, Michael Lawrence wrote: >> >> >> >> On Thu, Apr 22, 2010 at 5:49 AM, Michael Dondrup <michael.dondrup@uni.no>wrote: >> >>> Hi, >>> how about funtion rdapply (not lapply) which is for that? >>> >>> >> lapply() should apply per-space as well, basically providing a short-cut >> for the more complicated rdapply(). >> >> > lapply(rd, function(x) sum(x$score)) >> $chr1 >> [1] 3 >> >> $chr2 >> [1] 0 >> >> sapply() also works: >> > sapply(rd, function(x) sum(x$score)) >> chr1 chr2 >> 3 0 >> >> Another choice is tapply: >> > tapply(rd$score, space(rd), sum) >> chr1 chr2 >> 3 0 >> >> Michael >> >> The code below computes the sum score for each space in the RangedData: >>> # taken from the examples mostly: >>> > ranges <- IRanges(c(1,2,3),c(4,5,6)) >>> > score <- c(2L, 0L, 1L) >>> > rd <- RangedData(ranges, score, space = c("chr1","chr2","chr1")) >>> > rd >>> RangedData with 3 rows and 1 value column across 2 spaces >>> space ranges | score >>> <character> <iranges> | <integer> >>> 1 chr1 [1, 4] | 2 >>> 2 chr1 [3, 6] | 1 >>> 3 chr2 [2, 5] | 0 >>> > params <- RDApplyParams(rd, function(rd) sum(score(rd))) >>> > rdapply(params) >>> $chr1 >>> [1] 3 >>> >>> $chr2 >>> [1] 0 >>> >>> >>> Cheers >>> Michael >>> >>> Am Apr 22, 2010 um 1:57 PM schrieb Yvan: >>> >>> > On 21/04/10 18:43, Michael Lawrence wrote: >>> >> >>> >> >>> >> On Wed, Apr 21, 2010 at 6:07 AM, Yvan <yvan.strahm@uni.no>>> >> <mailto:yvan.strahm@uni.no>> wrote: >>> >> >>> >> Hello List, >>> >> >>> >> I am confused about how to loop over a rangedData object. >>> >> I have this rangedData >>> >> >>> >> RangedData with 61 rows and 1 value column across 3 spaces >>> >> space ranges | score >>> >> <character> <iranges> | <numeric> >>> >> 1 SCAFFOLD_1 [ 8, 8] | -0.09405 >>> >> >>> >> and the spaces are >>> >> >>> >> "SCAFFOLD_1" "SCAFFOLD_10" "SCAFFOLD_100" >>> >> >>> >> using aggregate it is possible to apply a function to one of the >>> space >>> >> >>> >> aggregate(rd["SCAFFOLD_1"]$score, start = >>> >> 1:(length(rd["SCAFFOLD_1"]$score)-w+1), width = w, FUN = sum) >>> >> >>> >> but how can I apply the aggregate to all space without a for loop ? >>> >> >>> >> >>> >> It looks like you're attempting a running window sum of the score >>> >> vector. There are more efficient ways of doing this besides >>> >> aggregate(). If you convert the score into an Rle, you can use >>> runsum(). >>> >> >>> >> Anyway, to do this over each space individually, use lapply(). >>> >> >>> >> This would come out to something like: >>> >> >>> >> values(rd)[,"smoothScore"] <- lapply(rd, function(x) >>> >> runsum(Rle(x$score), w)) >>> >> >>> >> Probably not exactly right, but it gets you in the right direction... >>> >> >>> >> Michael >>> >> >>> >> >>> > Hello Michael, >>> > >>> > Thanks for the answer and the tip about runsum! >>> > I try with lapply but could not get it working right, the main problem >>> > is that the runsum is calculated on all values and not for a each >>> > specific spaces. >>> > Sorry, I should have been more precise in the problem description. >>> > The runsum should be calculated in a space specific manner, let say w=2 >>> > >>> > space score cumsum >>> > 1 space1 1 3 >>> > 2 space1 2 4 >>> > 3 space1 2 NA >>> > 4 space2 10 21 >>> > 5 space2 11 22 >>> > 6 space2 11 NA >>> > >>> > Is it possible to do it with lapply? >>> > Thanks again for your help >>> > cheers, >>> > yvan >>> > >>> > [[alternative HTML version deleted]] >>> > >>> > _______________________________________________ >>> > Bioconductor mailing list >>> > Bioconductor@stat.math.ethz.ch >>> > https://stat.ethz.ch/mailman/listinfo/bioconductor >>> > Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor@stat.math.ethz.ch >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> >> > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Yvan, It appears to me that you are trying to perform two conflicting activities: 1) Calculate the running sum of a metric over an annotated sequence (as evidenced by your aggregate function call) 2) Find the sum for a metric across specified intervals on the annotated sequence (as evidenced by your desire to assign the aggregated sums into an existing RangedData object) Taking a step back, I am guessing that you are trying to transform something akin to a UCSC bed file into something else that is UCSC bed file like. If you are using rtracklayer, this means your initial data are stored in a RangedData object. To create a RangedData object containing the running sum of a values column from an initial RangedData object, I recommend: 1) Creating an RleList object from the RangedData object using the coverage function. Make sure to specify the metric of interest in the weight argument to coverage. 2) Using the runsum function on the RleList object to calculate your running sums. 3) Creating a RangedData object from the RleList object in step 2 using as(<<obj>>, "RangedData") Here is an example: > # Step 1: create an RleList representation of the metric > rd <- RangedData(IRanges(start = c(5, 10, 15, 2, 4, 8), end = c(7, 14, 21, 3, 6, 9)), score = 1:6, space = rep(c("A", "B"), each = 3)) > scoreRleList <- coverage(rd, weight = "score", width = list(A = 30, B = 10)) > scoreRleList SimpleRleList of length 2 $A 'integer' Rle of length 30 with 6 runs Lengths: 4 3 2 5 7 9 Values : 0 1 0 2 3 0 $B 'integer' Rle of length 10 with 6 runs Lengths: 1 2 3 1 2 1 Values : 0 4 5 0 6 0 > # Step 2: calculate the running sums > scoreRunsum <- runsum(scoreRleList, k = 3, endrule = "constant") > scoreRunsum SimpleRleList of length 2 $A 'integer' Rle of length 30 with 15 runs Lengths: 3 1 1 1 1 1 1 1 3 1 1 5 1 1 8 Values : 0 1 2 3 2 1 2 4 6 7 8 9 6 3 0 $B 'integer' Rle of length 10 with 7 runs Lengths: 2 1 1 1 1 1 3 Values : 8 13 14 15 10 11 12 > # Step 3: Create a RangedData representation of the running sums > rdRunsum <- as(scoreRunsum, "RangedData") > rdRunsum RangedData with 22 rows and 1 value column across 2 spaces space ranges | score <character> <iranges> | <integer> 1 A [ 1, 3] | 0 2 A [ 4, 4] | 1 3 A [ 5, 5] | 2 4 A [ 6, 6] | 3 5 A [ 7, 7] | 2 6 A [ 8, 8] | 1 7 A [ 9, 9] | 2 8 A [10, 10] | 4 9 A [11, 13] | 6 ... ... ... ... ... 14 A [22, 22] | 3 15 A [23, 30] | 0 16 B [ 1, 2] | 8 17 B [ 3, 3] | 13 18 B [ 4, 4] | 14 19 B [ 5, 5] | 15 20 B [ 6, 6] | 10 21 B [ 7, 7] | 11 22 B [ 8, 10] | 12 > sessionInfo() R version 2.11.0 Patched (2010-04-24 r51820) i386-apple-darwin9.8.0 locale: [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] IRanges_1.6.1 loaded via a namespace (and not attached): [1] tools_2.11.0 On 4/23/10 6:26 AM, Michael Lawrence wrote: > Also note that it's not really necessary to loop here, as is often the case > with IRanges: > > rd$windo<- unlist(runmean(RleList(values(rd)[,"score"]))) > > On Fri, Apr 23, 2010 at 6:13 AM, Michael Lawrence<michafla at="" gene.com=""> wrote: > > >> >> On Fri, Apr 23, 2010 at 1:49 AM, Yvan<yvan.strahm at="" uni.no=""> wrote: >> >> >>> Hello, >>> Thank you both of you. >>> >>> I could could calculate the sliding window, but not as a Rle object, could >>> not append values for the last w-1 position in the Rle object in order to >>> take care of the size problem. >>> >>> >>> >> Why not? Rle supports all the normal vector operations. And runsum or >> runmean will output a vector of the same size as the input, using a choice >> of two endrules. If you want 0's at the end, try something like: >> >> rle[(nrow(rd)-w+1):nrow(rd)]<- 0 >> >> >> >> >>> So I did it like that: >>> >>> params<-RDApplyParams(rd,function(rd) >>> append((diff(c(0,cumsum(rd$score)),lag=w)/w),rep(0,each=w-1),after =(length(rd$score)-w+1))) >>> >>> But when I try to add the new values to the rangedData object I got these >>> error. >>> >>> >>>> values(rd)[,"windo"]<-rdapply(params) >>>> >>> Error in `[<-`(`*tmp*`, , j, value =<s4 object="" of="" class="" "dataframe"="">) : >>> ncol(x[j]) != ncol(value) >>> In addition: Warning messages: >>> 1: In mapply(f, ..., SIMPLIFY = FALSE) : >>> longer argument not a multiple of length of shorter >>> 2: In mapply(f, ..., SIMPLIFY = FALSE) : >>> longer argument not a multiple of length of shorter >>> >>> But when I check the size, they are the same, here for one space >>> >>> >>>> x<-rdapply(params) >>>> length(x$SCAFFOLD_100) == length(rd["SCAFFOLD_100"]$windo) >>>> >>> [1] TRUE >>> >>> >>> >> It may be that that type of insertion is unsupported. Why not just do >> something like: >> >> rd$window<- unlist(x) >> >> >> >>> Maybe params miss a parameter or the way I try to update the rd object is >>> wrong. Anyway form the rdapply output a vector could be created and so a new >>> rd object with the new value column. >>> >>> yvan >>> >>> >>> On 22/04/10 15:51, Michael Lawrence wrote: >>> >>> >>> >>> On Thu, Apr 22, 2010 at 5:49 AM, Michael Dondrup<michael.dondrup at="" uni.no="">wrote: >>> >>> >>>> Hi, >>>> how about funtion rdapply (not lapply) which is for that? >>>> >>>> >>>> >>> lapply() should apply per-space as well, basically providing a short-cut >>> for the more complicated rdapply(). >>> >>> >>>> lapply(rd, function(x) sum(x$score)) >>>> >>> $chr1 >>> [1] 3 >>> >>> $chr2 >>> [1] 0 >>> >>> sapply() also works: >>> >>>> sapply(rd, function(x) sum(x$score)) >>>> >>> chr1 chr2 >>> 3 0 >>> >>> Another choice is tapply: >>> >>>> tapply(rd$score, space(rd), sum) >>>> >>> chr1 chr2 >>> 3 0 >>> >>> Michael >>> >>> The code below computes the sum score for each space in the RangedData: >>> >>>> # taken from the examples mostly: >>>> >>>>> ranges<- IRanges(c(1,2,3),c(4,5,6)) >>>>> score<- c(2L, 0L, 1L) >>>>> rd<- RangedData(ranges, score, space = c("chr1","chr2","chr1")) >>>>> rd >>>>> >>>> RangedData with 3 rows and 1 value column across 2 spaces >>>> space ranges | score >>>> <character> <iranges> |<integer> >>>> 1 chr1 [1, 4] | 2 >>>> 2 chr1 [3, 6] | 1 >>>> 3 chr2 [2, 5] | 0 >>>> >>>>> params<- RDApplyParams(rd, function(rd) sum(score(rd))) >>>>> rdapply(params) >>>>> >>>> $chr1 >>>> [1] 3 >>>> >>>> $chr2 >>>> [1] 0 >>>> >>>> >>>> Cheers >>>> Michael >>>> >>>> Am Apr 22, 2010 um 1:57 PM schrieb Yvan: >>>> >>>> >>>>> On 21/04/10 18:43, Michael Lawrence wrote: >>>>> >>>>>> >>>>>> On Wed, Apr 21, 2010 at 6:07 AM, Yvan<yvan.strahm at="" uni.no="">>>>>> <mailto:yvan.strahm at="" uni.no="">> wrote: >>>>>> >>>>>> Hello List, >>>>>> >>>>>> I am confused about how to loop over a rangedData object. >>>>>> I have this rangedData >>>>>> >>>>>> RangedData with 61 rows and 1 value column across 3 spaces >>>>>> space ranges | score >>>>>> <character> <iranges> |<numeric> >>>>>> 1 SCAFFOLD_1 [ 8, 8] | -0.09405 >>>>>> >>>>>> and the spaces are >>>>>> >>>>>> "SCAFFOLD_1" "SCAFFOLD_10" "SCAFFOLD_100" >>>>>> >>>>>> using aggregate it is possible to apply a function to one of the >>>>>> >>>> space >>>> >>>>>> aggregate(rd["SCAFFOLD_1"]$score, start = >>>>>> 1:(length(rd["SCAFFOLD_1"]$score)-w+1), width = w, FUN = sum) >>>>>> >>>>>> but how can I apply the aggregate to all space without a for loop ? >>>>>> >>>>>> >>>>>> It looks like you're attempting a running window sum of the score >>>>>> vector. There are more efficient ways of doing this besides >>>>>> aggregate(). If you convert the score into an Rle, you can use >>>>>> >>>> runsum(). >>>> >>>>>> Anyway, to do this over each space individually, use lapply(). >>>>>> >>>>>> This would come out to something like: >>>>>> >>>>>> values(rd)[,"smoothScore"]<- lapply(rd, function(x) >>>>>> runsum(Rle(x$score), w)) >>>>>> >>>>>> Probably not exactly right, but it gets you in the right direction... >>>>>> >>>>>> Michael >>>>>> >>>>>> >>>>>> >>>>> Hello Michael, >>>>> >>>>> Thanks for the answer and the tip about runsum! >>>>> I try with lapply but could not get it working right, the main problem >>>>> is that the runsum is calculated on all values and not for a each >>>>> specific spaces. >>>>> Sorry, I should have been more precise in the problem description. >>>>> The runsum should be calculated in a space specific manner, let say w=2 >>>>> >>>>> space score cumsum >>>>> 1 space1 1 3 >>>>> 2 space1 2 4 >>>>> 3 space1 2 NA >>>>> 4 space2 10 21 >>>>> 5 space2 11 22 >>>>> 6 space2 11 NA >>>>> >>>>> Is it possible to do it with lapply? >>>>> Thanks again for your help >>>>> cheers, >>>>> yvan >>>>> >>>>> [[alternative HTML version deleted]] >>>>> >>>>> _______________________________________________ >>>>> Bioconductor mailing list >>>>> Bioconductor at stat.math.ethz.ch >>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>> Search the archives: >>>>> >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at stat.math.ethz.ch >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>>> >>> >>> >>> >> > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >
ADD REPLY
0
Entering edit mode
Hello Patrick, Sorry for the late answer and thank you very much for these precisions. Maybe I should try to explain what I am trying to do. I am starting with a gff file produced by Nimblegen's NimbleScan software for a CGH high density array. Out of the nine columns, I am interested mainly in the first (seqname) and the sixth (score) even if all columns are loaded into a gff object/matrix(correct term?). The main purpose of my project is to filtrate out some region of each spaces based on sliding window over the score values. The same methodology used to filtrate out the regions will be used in a second pass to detect the region of interest. I was hoping using IRanges to perform the first sliding window on the different spaces, and then create some views for each spaces based on the sliding window results and finally recalculate the sliding window on all the views. After these two passes, a track for the ucsc genome browser should be created and also a gff file in order to view the results in the SignalMap software from Nimblegen. Hope I was clear enough and do you think that IRanges is the correct tool? Thank you again for your time and help. Cheers, yvan On 26/04/10 21:08, Patrick Aboyoun wrote: > Yvan, > It appears to me that you are trying to perform two conflicting > activities: > > 1) Calculate the running sum of a metric over an annotated sequence > (as evidenced by your aggregate function call) > 2) Find the sum for a metric across specified intervals on the > annotated sequence (as evidenced by your desire to assign the > aggregated sums into an existing RangedData object) > > Taking a step back, I am guessing that you are trying to transform > something akin to a UCSC bed file into something else that is UCSC bed > file like. If you are using rtracklayer, this means your initial data > are stored in a RangedData object. To create a RangedData object > containing the running sum of a values column from an initial > RangedData object, I recommend: > > 1) Creating an RleList object from the RangedData object using the > coverage function. Make sure to specify the metric of interest in the > weight argument to coverage. > 2) Using the runsum function on the RleList object to calculate your > running sums. > 3) Creating a RangedData object from the RleList object in step 2 > using as(<<obj>>, "RangedData") > > Here is an example: > > > # Step 1: create an RleList representation of the metric > > rd <- RangedData(IRanges(start = c(5, 10, 15, 2, 4, 8), end = c(7, > 14, 21, 3, 6, 9)), > score = 1:6, space = rep(c("A", "B"), each = 3)) > > scoreRleList <- coverage(rd, weight = "score", width = list(A = 30, > B = 10)) > > scoreRleList > SimpleRleList of length 2 > $A > 'integer' Rle of length 30 with 6 runs > Lengths: 4 3 2 5 7 9 > Values : 0 1 0 2 3 0 > > $B > 'integer' Rle of length 10 with 6 runs > Lengths: 1 2 3 1 2 1 > Values : 0 4 5 0 6 0 > > > # Step 2: calculate the running sums > > scoreRunsum <- runsum(scoreRleList, k = 3, endrule = "constant") > > scoreRunsum > SimpleRleList of length 2 > $A > 'integer' Rle of length 30 with 15 runs > Lengths: 3 1 1 1 1 1 1 1 3 1 1 5 1 1 8 > Values : 0 1 2 3 2 1 2 4 6 7 8 9 6 3 0 > > $B > 'integer' Rle of length 10 with 7 runs > Lengths: 2 1 1 1 1 1 3 > Values : 8 13 14 15 10 11 12 > > > > # Step 3: Create a RangedData representation of the running sums > > rdRunsum <- as(scoreRunsum, "RangedData") > > rdRunsum > RangedData with 22 rows and 1 value column across 2 spaces > space ranges | score > <character> <iranges> | <integer> > 1 A [ 1, 3] | 0 > 2 A [ 4, 4] | 1 > 3 A [ 5, 5] | 2 > 4 A [ 6, 6] | 3 > 5 A [ 7, 7] | 2 > 6 A [ 8, 8] | 1 > 7 A [ 9, 9] | 2 > 8 A [10, 10] | 4 > 9 A [11, 13] | 6 > ... ... ... ... ... > 14 A [22, 22] | 3 > 15 A [23, 30] | 0 > 16 B [ 1, 2] | 8 > 17 B [ 3, 3] | 13 > 18 B [ 4, 4] | 14 > 19 B [ 5, 5] | 15 > 20 B [ 6, 6] | 10 > 21 B [ 7, 7] | 11 > 22 B [ 8, 10] | 12 > > > sessionInfo() > R version 2.11.0 Patched (2010-04-24 r51820) > i386-apple-darwin9.8.0 > > locale: > [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] IRanges_1.6.1 > > loaded via a namespace (and not attached): > [1] tools_2.11.0 > > > > On 4/23/10 6:26 AM, Michael Lawrence wrote: >> Also note that it's not really necessary to loop here, as is often >> the case >> with IRanges: >> >> rd$windo<- unlist(runmean(RleList(values(rd)[,"score"]))) >> >> On Fri, Apr 23, 2010 at 6:13 AM, Michael Lawrence<michafla at="" gene.com=""> >> wrote: >> >>> >>> On Fri, Apr 23, 2010 at 1:49 AM, Yvan<yvan.strahm at="" uni.no=""> wrote: >>> >>>> Hello, >>>> Thank you both of you. >>>> >>>> I could could calculate the sliding window, but not as a Rle >>>> object, could >>>> not append values for the last w-1 position in the Rle object in >>>> order to >>>> take care of the size problem. >>>> >>>> >>> Why not? Rle supports all the normal vector operations. And runsum or >>> runmean will output a vector of the same size as the input, using a >>> choice >>> of two endrules. If you want 0's at the end, try something like: >>> >>> rle[(nrow(rd)-w+1):nrow(rd)]<- 0 >>> >>> >>> >>>> So I did it like that: >>>> >>>> params<-RDApplyParams(rd,function(rd) >>>> append((diff(c(0,cumsum(rd$score)),lag=w)/w),rep(0,each=w-1),afte r=(length(rd$score)-w+1))) >>>> >>>> >>>> But when I try to add the new values to the rangedData object I got >>>> these >>>> error. >>>> >>>>> values(rd)[,"windo"]<-rdapply(params) >>>> Error in `[<-`(`*tmp*`, , j, value =<s4 object="" of="" class="">>>> "DataFrame">) : >>>> ncol(x[j]) != ncol(value) >>>> In addition: Warning messages: >>>> 1: In mapply(f, ..., SIMPLIFY = FALSE) : >>>> longer argument not a multiple of length of shorter >>>> 2: In mapply(f, ..., SIMPLIFY = FALSE) : >>>> longer argument not a multiple of length of shorter >>>> >>>> But when I check the size, they are the same, here for one space >>>> >>>>> x<-rdapply(params) >>>>> length(x$SCAFFOLD_100) == length(rd["SCAFFOLD_100"]$windo) >>>> [1] TRUE >>>> >>>> >>> It may be that that type of insertion is unsupported. Why not just do >>> something like: >>> >>> rd$window<- unlist(x) >>> >>> >>>> Maybe params miss a parameter or the way I try to update the rd >>>> object is >>>> wrong. Anyway form the rdapply output a vector could be created and >>>> so a new >>>> rd object with the new value column. >>>> >>>> yvan >>>> >>>> >>>> On 22/04/10 15:51, Michael Lawrence wrote: >>>> >>>> >>>> >>>> On Thu, Apr 22, 2010 at 5:49 AM, Michael >>>> Dondrup<michael.dondrup at="" uni.no="">wrote: >>>> >>>>> Hi, >>>>> how about funtion rdapply (not lapply) which is for that? >>>>> >>>>> >>>> lapply() should apply per-space as well, basically providing a >>>> short-cut >>>> for the more complicated rdapply(). >>>> >>>>> lapply(rd, function(x) sum(x$score)) >>>> $chr1 >>>> [1] 3 >>>> >>>> $chr2 >>>> [1] 0 >>>> >>>> sapply() also works: >>>>> sapply(rd, function(x) sum(x$score)) >>>> chr1 chr2 >>>> 3 0 >>>> >>>> Another choice is tapply: >>>>> tapply(rd$score, space(rd), sum) >>>> chr1 chr2 >>>> 3 0 >>>> >>>> Michael >>>> >>>> The code below computes the sum score for each space in the >>>> RangedData: >>>>> # taken from the examples mostly: >>>>>> ranges<- IRanges(c(1,2,3),c(4,5,6)) >>>>>> score<- c(2L, 0L, 1L) >>>>>> rd<- RangedData(ranges, score, space = c("chr1","chr2","chr1")) >>>>>> rd >>>>> RangedData with 3 rows and 1 value column across 2 spaces >>>>> space ranges | score >>>>> <character> <iranges> |<integer> >>>>> 1 chr1 [1, 4] | 2 >>>>> 2 chr1 [3, 6] | 1 >>>>> 3 chr2 [2, 5] | 0 >>>>>> params<- RDApplyParams(rd, function(rd) sum(score(rd))) >>>>>> rdapply(params) >>>>> $chr1 >>>>> [1] 3 >>>>> >>>>> $chr2 >>>>> [1] 0 >>>>> >>>>> >>>>> Cheers >>>>> Michael >>>>> >>>>> Am Apr 22, 2010 um 1:57 PM schrieb Yvan: >>>>> >>>>>> On 21/04/10 18:43, Michael Lawrence wrote: >>>>>>> >>>>>>> On Wed, Apr 21, 2010 at 6:07 AM, Yvan<yvan.strahm at="" uni.no="">>>>>>> <mailto:yvan.strahm at="" uni.no="">> wrote: >>>>>>> >>>>>>> Hello List, >>>>>>> >>>>>>> I am confused about how to loop over a rangedData object. >>>>>>> I have this rangedData >>>>>>> >>>>>>> RangedData with 61 rows and 1 value column across 3 spaces >>>>>>> space ranges | score >>>>>>> <character> <iranges> |<numeric> >>>>>>> 1 SCAFFOLD_1 [ 8, 8] | -0.09405 >>>>>>> >>>>>>> and the spaces are >>>>>>> >>>>>>> "SCAFFOLD_1" "SCAFFOLD_10" "SCAFFOLD_100" >>>>>>> >>>>>>> using aggregate it is possible to apply a function to one of >>>>>>> the >>>>> space >>>>>>> aggregate(rd["SCAFFOLD_1"]$score, start = >>>>>>> 1:(length(rd["SCAFFOLD_1"]$score)-w+1), width = w, FUN = sum) >>>>>>> >>>>>>> but how can I apply the aggregate to all space without a for >>>>>>> loop ? >>>>>>> >>>>>>> >>>>>>> It looks like you're attempting a running window sum of the score >>>>>>> vector. There are more efficient ways of doing this besides >>>>>>> aggregate(). If you convert the score into an Rle, you can use >>>>> runsum(). >>>>>>> Anyway, to do this over each space individually, use lapply(). >>>>>>> >>>>>>> This would come out to something like: >>>>>>> >>>>>>> values(rd)[,"smoothScore"]<- lapply(rd, function(x) >>>>>>> runsum(Rle(x$score), w)) >>>>>>> >>>>>>> Probably not exactly right, but it gets you in the right >>>>>>> direction... >>>>>>> >>>>>>> Michael >>>>>>> >>>>>>> >>>>>> Hello Michael, >>>>>> >>>>>> Thanks for the answer and the tip about runsum! >>>>>> I try with lapply but could not get it working right, the main >>>>>> problem >>>>>> is that the runsum is calculated on all values and not for a each >>>>>> specific spaces. >>>>>> Sorry, I should have been more precise in the problem description. >>>>>> The runsum should be calculated in a space specific manner, let >>>>>> say w=2 >>>>>> >>>>>> space score cumsum >>>>>> 1 space1 1 3 >>>>>> 2 space1 2 4 >>>>>> 3 space1 2 NA >>>>>> 4 space2 10 21 >>>>>> 5 space2 11 22 >>>>>> 6 space2 11 NA >>>>>> >>>>>> Is it possible to do it with lapply? >>>>>> Thanks again for your help >>>>>> cheers, >>>>>> yvan >>>>>> >>>>>> [[alternative HTML version deleted]] >>>>>> >>>>>> _______________________________________________ >>>>>> Bioconductor mailing list >>>>>> Bioconductor at stat.math.ethz.ch >>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>> Search the archives: >>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>> >>>>> _______________________________________________ >>>>> Bioconductor mailing list >>>>> Bioconductor at stat.math.ethz.ch >>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>> Search the archives: >>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>> >>>> >>>> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >
ADD REPLY
0
Entering edit mode
Yvan, I'm not sure if I follow all the details you provided. You have a GFF file contained "scored" regions. (Do these regions form a partitioning of your sequence positions or do the regions overlap? If a partitioning/region is not represented is it assumed to have a score of 0?) You would like to aggregate over neighboring regions using a moving window sum, thus creating new partitioning (with accompanying scores) of the sequence positions. (I provided code to do what I just mentioned.) From that point I am less clear on what you are trying to do. Could you provide a simple concrete example maybe involving three regions on a single sequence to illustrate what you are trying to achieve? Thanks. Patrick On 4/29/10 4:51 AM, Yvan wrote: > Hello Patrick, > > Sorry for the late answer and thank you very much for these precisions. > > Maybe I should try to explain what I am trying to do. I am starting > with a gff file produced by Nimblegen's NimbleScan software for a CGH > high density array. Out of the nine columns, I am interested mainly in > the first (seqname) and the sixth (score) even if all columns are > loaded into a gff object/matrix(correct term?). The main purpose of > my project is to filtrate out some region of each spaces based on > sliding window over the score values. The same methodology used to > filtrate out the regions will be used in a second pass to detect the > region of interest. I was hoping using IRanges to perform the first > sliding window on the different spaces, and then create some views for > each spaces based on the sliding window results and finally > recalculate the sliding window on all the views. After these two > passes, a track for the ucsc genome browser should be created and also > a gff file in order to view the results in the SignalMap software from > Nimblegen. > > Hope I was clear enough and do you think that IRanges is the correct > tool? > Thank you again for your time and help. > Cheers, > yvan > > > On 26/04/10 21:08, Patrick Aboyoun wrote: >> Yvan, >> It appears to me that you are trying to perform two conflicting >> activities: >> >> 1) Calculate the running sum of a metric over an annotated sequence >> (as evidenced by your aggregate function call) >> 2) Find the sum for a metric across specified intervals on the >> annotated sequence (as evidenced by your desire to assign the >> aggregated sums into an existing RangedData object) >> >> Taking a step back, I am guessing that you are trying to transform >> something akin to a UCSC bed file into something else that is UCSC >> bed file like. If you are using rtracklayer, this means your initial >> data are stored in a RangedData object. To create a RangedData object >> containing the running sum of a values column from an initial >> RangedData object, I recommend: >> >> 1) Creating an RleList object from the RangedData object using the >> coverage function. Make sure to specify the metric of interest in the >> weight argument to coverage. >> 2) Using the runsum function on the RleList object to calculate your >> running sums. >> 3) Creating a RangedData object from the RleList object in step 2 >> using as(<<obj>>, "RangedData") >> >> Here is an example: >> >> > # Step 1: create an RleList representation of the metric >> > rd <- RangedData(IRanges(start = c(5, 10, 15, 2, 4, 8), end = c(7, >> 14, 21, 3, 6, 9)), >> score = 1:6, space = rep(c("A", "B"), each = 3)) >> > scoreRleList <- coverage(rd, weight = "score", width = list(A = 30, >> B = 10)) >> > scoreRleList >> SimpleRleList of length 2 >> $A >> 'integer' Rle of length 30 with 6 runs >> Lengths: 4 3 2 5 7 9 >> Values : 0 1 0 2 3 0 >> >> $B >> 'integer' Rle of length 10 with 6 runs >> Lengths: 1 2 3 1 2 1 >> Values : 0 4 5 0 6 0 >> >> > # Step 2: calculate the running sums >> > scoreRunsum <- runsum(scoreRleList, k = 3, endrule = "constant") >> > scoreRunsum >> SimpleRleList of length 2 >> $A >> 'integer' Rle of length 30 with 15 runs >> Lengths: 3 1 1 1 1 1 1 1 3 1 1 5 1 1 8 >> Values : 0 1 2 3 2 1 2 4 6 7 8 9 6 3 0 >> >> $B >> 'integer' Rle of length 10 with 7 runs >> Lengths: 2 1 1 1 1 1 3 >> Values : 8 13 14 15 10 11 12 >> >> >> > # Step 3: Create a RangedData representation of the running sums >> > rdRunsum <- as(scoreRunsum, "RangedData") >> > rdRunsum >> RangedData with 22 rows and 1 value column across 2 spaces >> space ranges | score >> <character> <iranges> | <integer> >> 1 A [ 1, 3] | 0 >> 2 A [ 4, 4] | 1 >> 3 A [ 5, 5] | 2 >> 4 A [ 6, 6] | 3 >> 5 A [ 7, 7] | 2 >> 6 A [ 8, 8] | 1 >> 7 A [ 9, 9] | 2 >> 8 A [10, 10] | 4 >> 9 A [11, 13] | 6 >> ... ... ... ... ... >> 14 A [22, 22] | 3 >> 15 A [23, 30] | 0 >> 16 B [ 1, 2] | 8 >> 17 B [ 3, 3] | 13 >> 18 B [ 4, 4] | 14 >> 19 B [ 5, 5] | 15 >> 20 B [ 6, 6] | 10 >> 21 B [ 7, 7] | 11 >> 22 B [ 8, 10] | 12 >> >> > sessionInfo() >> R version 2.11.0 Patched (2010-04-24 r51820) >> i386-apple-darwin9.8.0 >> >> locale: >> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] IRanges_1.6.1 >> >> loaded via a namespace (and not attached): >> [1] tools_2.11.0 >> >> >> >> On 4/23/10 6:26 AM, Michael Lawrence wrote: >>> Also note that it's not really necessary to loop here, as is often >>> the case >>> with IRanges: >>> >>> rd$windo<- unlist(runmean(RleList(values(rd)[,"score"]))) >>> >>> On Fri, Apr 23, 2010 at 6:13 AM, Michael >>> Lawrence<michafla at="" gene.com=""> wrote: >>> >>>> >>>> On Fri, Apr 23, 2010 at 1:49 AM, Yvan<yvan.strahm at="" uni.no=""> wrote: >>>> >>>>> Hello, >>>>> Thank you both of you. >>>>> >>>>> I could could calculate the sliding window, but not as a Rle >>>>> object, could >>>>> not append values for the last w-1 position in the Rle object in >>>>> order to >>>>> take care of the size problem. >>>>> >>>>> >>>> Why not? Rle supports all the normal vector operations. And runsum or >>>> runmean will output a vector of the same size as the input, using a >>>> choice >>>> of two endrules. If you want 0's at the end, try something like: >>>> >>>> rle[(nrow(rd)-w+1):nrow(rd)]<- 0 >>>> >>>> >>>> >>>>> So I did it like that: >>>>> >>>>> params<-RDApplyParams(rd,function(rd) >>>>> append((diff(c(0,cumsum(rd$score)),lag=w)/w),rep(0,each=w-1),aft er=(length(rd$score)-w+1))) >>>>> >>>>> >>>>> But when I try to add the new values to the rangedData object I >>>>> got these >>>>> error. >>>>> >>>>>> values(rd)[,"windo"]<-rdapply(params) >>>>> Error in `[<-`(`*tmp*`, , j, value =<s4 object="" of="" class="">>>>> "DataFrame">) : >>>>> ncol(x[j]) != ncol(value) >>>>> In addition: Warning messages: >>>>> 1: In mapply(f, ..., SIMPLIFY = FALSE) : >>>>> longer argument not a multiple of length of shorter >>>>> 2: In mapply(f, ..., SIMPLIFY = FALSE) : >>>>> longer argument not a multiple of length of shorter >>>>> >>>>> But when I check the size, they are the same, here for one space >>>>> >>>>>> x<-rdapply(params) >>>>>> length(x$SCAFFOLD_100) == length(rd["SCAFFOLD_100"]$windo) >>>>> [1] TRUE >>>>> >>>>> >>>> It may be that that type of insertion is unsupported. Why not just do >>>> something like: >>>> >>>> rd$window<- unlist(x) >>>> >>>> >>>>> Maybe params miss a parameter or the way I try to update the rd >>>>> object is >>>>> wrong. Anyway form the rdapply output a vector could be created >>>>> and so a new >>>>> rd object with the new value column. >>>>> >>>>> yvan >>>>> >>>>> >>>>> On 22/04/10 15:51, Michael Lawrence wrote: >>>>> >>>>> >>>>> >>>>> On Thu, Apr 22, 2010 at 5:49 AM, Michael >>>>> Dondrup<michael.dondrup at="" uni.no="">wrote: >>>>> >>>>>> Hi, >>>>>> how about funtion rdapply (not lapply) which is for that? >>>>>> >>>>>> >>>>> lapply() should apply per-space as well, basically providing a >>>>> short-cut >>>>> for the more complicated rdapply(). >>>>> >>>>>> lapply(rd, function(x) sum(x$score)) >>>>> $chr1 >>>>> [1] 3 >>>>> >>>>> $chr2 >>>>> [1] 0 >>>>> >>>>> sapply() also works: >>>>>> sapply(rd, function(x) sum(x$score)) >>>>> chr1 chr2 >>>>> 3 0 >>>>> >>>>> Another choice is tapply: >>>>>> tapply(rd$score, space(rd), sum) >>>>> chr1 chr2 >>>>> 3 0 >>>>> >>>>> Michael >>>>> >>>>> The code below computes the sum score for each space in the >>>>> RangedData: >>>>>> # taken from the examples mostly: >>>>>>> ranges<- IRanges(c(1,2,3),c(4,5,6)) >>>>>>> score<- c(2L, 0L, 1L) >>>>>>> rd<- RangedData(ranges, score, space = c("chr1","chr2","chr1")) >>>>>>> rd >>>>>> RangedData with 3 rows and 1 value column across 2 spaces >>>>>> space ranges | score >>>>>> <character> <iranges> |<integer> >>>>>> 1 chr1 [1, 4] | 2 >>>>>> 2 chr1 [3, 6] | 1 >>>>>> 3 chr2 [2, 5] | 0 >>>>>>> params<- RDApplyParams(rd, function(rd) sum(score(rd))) >>>>>>> rdapply(params) >>>>>> $chr1 >>>>>> [1] 3 >>>>>> >>>>>> $chr2 >>>>>> [1] 0 >>>>>> >>>>>> >>>>>> Cheers >>>>>> Michael >>>>>> >>>>>> Am Apr 22, 2010 um 1:57 PM schrieb Yvan: >>>>>> >>>>>>> On 21/04/10 18:43, Michael Lawrence wrote: >>>>>>>> >>>>>>>> On Wed, Apr 21, 2010 at 6:07 AM, Yvan<yvan.strahm at="" uni.no="">>>>>>>> <mailto:yvan.strahm at="" uni.no="">> wrote: >>>>>>>> >>>>>>>> Hello List, >>>>>>>> >>>>>>>> I am confused about how to loop over a rangedData object. >>>>>>>> I have this rangedData >>>>>>>> >>>>>>>> RangedData with 61 rows and 1 value column across 3 spaces >>>>>>>> space ranges | score >>>>>>>> <character> <iranges> |<numeric> >>>>>>>> 1 SCAFFOLD_1 [ 8, 8] | -0.09405 >>>>>>>> >>>>>>>> and the spaces are >>>>>>>> >>>>>>>> "SCAFFOLD_1" "SCAFFOLD_10" "SCAFFOLD_100" >>>>>>>> >>>>>>>> using aggregate it is possible to apply a function to one >>>>>>>> of the >>>>>> space >>>>>>>> aggregate(rd["SCAFFOLD_1"]$score, start = >>>>>>>> 1:(length(rd["SCAFFOLD_1"]$score)-w+1), width = w, FUN = sum) >>>>>>>> >>>>>>>> but how can I apply the aggregate to all space without a >>>>>>>> for loop ? >>>>>>>> >>>>>>>> >>>>>>>> It looks like you're attempting a running window sum of the score >>>>>>>> vector. There are more efficient ways of doing this besides >>>>>>>> aggregate(). If you convert the score into an Rle, you can use >>>>>> runsum(). >>>>>>>> Anyway, to do this over each space individually, use lapply(). >>>>>>>> >>>>>>>> This would come out to something like: >>>>>>>> >>>>>>>> values(rd)[,"smoothScore"]<- lapply(rd, function(x) >>>>>>>> runsum(Rle(x$score), w)) >>>>>>>> >>>>>>>> Probably not exactly right, but it gets you in the right >>>>>>>> direction... >>>>>>>> >>>>>>>> Michael >>>>>>>> >>>>>>>> >>>>>>> Hello Michael, >>>>>>> >>>>>>> Thanks for the answer and the tip about runsum! >>>>>>> I try with lapply but could not get it working right, the main >>>>>>> problem >>>>>>> is that the runsum is calculated on all values and not for a each >>>>>>> specific spaces. >>>>>>> Sorry, I should have been more precise in the problem description. >>>>>>> The runsum should be calculated in a space specific manner, let >>>>>>> say w=2 >>>>>>> >>>>>>> space score cumsum >>>>>>> 1 space1 1 3 >>>>>>> 2 space1 2 4 >>>>>>> 3 space1 2 NA >>>>>>> 4 space2 10 21 >>>>>>> 5 space2 11 22 >>>>>>> 6 space2 11 NA >>>>>>> >>>>>>> Is it possible to do it with lapply? >>>>>>> Thanks again for your help >>>>>>> cheers, >>>>>>> yvan >>>>>>> >>>>>>> [[alternative HTML version deleted]] >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioconductor mailing list >>>>>>> Bioconductor at stat.math.ethz.ch >>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>>> Search the archives: >>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>>> >>>>>> _______________________________________________ >>>>>> Bioconductor mailing list >>>>>> Bioconductor at stat.math.ethz.ch >>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>> Search the archives: >>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>>> >>>>> >>>>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at stat.math.ethz.ch >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >
ADD REPLY
0
Entering edit mode
Yvan ▴ 60
@yvan-4034
Last seen 10.2 years ago
Hello List, I am confused about how to loop over a rangedData object. I have this rangedData RangedData with 61 rows and 1 value column across 3 spaces space ranges | score <character> <iranges> | <numeric> 1 SCAFFOLD_1 [ 8, 8] | -0.09405 and the spaces are "SCAFFOLD_1" "SCAFFOLD_10" "SCAFFOLD_100" using aggregate it is possible to apply a function to one of the space aggregate(rd["SCAFFOLD_1"]$score, start = 1:(length(rd["SCAFFOLD_1"]$score)-w+1), width = w, FUN = sum) but how can I apply the aggregate to all space without a for loop ? And it is possible to add a column sum "on the fly" or should I create a vector "sum" containing values for all spaces and the recreate a rangedData object with this extra column? Thanks for your help and patience yvan
ADD COMMENT

Login before adding your answer.

Traffic: 694 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6