Question: loop over IRanges spaces
9.4 years ago, Yvan wrote:
Hello,

I tried your code below, but I am not able to create the scoreRleList,
because 'weight' can only be an integer list.

    > read_gff
    function(file="gff3.txt") {
        gff <- read.delim(file,header=FALSE)
        colnames(gff) <- c("seqname", "source", "feature", "start", "end",
                           "score", "strand","frame", "comments")
        return(gff)
    }
    > library(IRanges)

    Attaching package: 'IRanges'

    The following object(s) are masked from 'package:base':

        cbind, Map, mapply, order, paste, pmax, pmax.int, pmin, pmin.int,
        rbind, rep.int, table

    > gff<-read_gff("/home/yvan/Lundalm/R/projects/coen/gff_original/50Lines.gff")
    > rd<-RangedData(IRanges(gff$start,width=1),score=gff$score,start=gff$start,space=gff$seqname)
    > scoreRleList<-coverage(rd, weight ="score",width = as.list(table(rd$space)))
    Error in .local(x, shift, width, weight, ...) :
      'weight' must be a non-empty list of integers

Can I change/force the type of 'weight'?

In a different approach, I tried to use different views on the data set.
This was motivated by the fact that the difference between the start
positions of consecutive probes could indicate either a change of scaffold
or a gap inside a scaffold.

    > first_view<-slice(diff(rd$start), lower=10, upper=200)
    > first_view
      Views on a 60-integer XInteger subject
    subject: 28 24 25 37 23 24 32 34 25 23 63 ... 24 35 24 29 31 28 19 29 27 29 37
    views:
        start end width
    [1]     1  19    19 [28 24 25 37 23 24 32 34 25 23 63 21 36 27 19 33 33 24 32]
    [2]    21  39    19 [ 31 22 34 18 36 25 ... 31 108 34 28 26 23 33]
    [3]    41  60    20 [32 28 26 20 33 25 30 30 24 ... 29 31 28 19 29 27 29 37 22]

So here the three views are the three scaffolds. But if I want to change
the subject of the view to the score:

    > second_view<-Views(subject=rd$score,start=first_view$start,end=first_view$end)
    Error in function (classes, fdef, mtable)  :
      unable to find an inherited method for function "Views", for signature "numeric"

Again, can I change the expected type of the signature in the Views function?

Thanks again for your help.
Cheers,
yvan

On 04/26/2010 09:08 PM, Patrick Aboyoun wrote:
> Yvan,
> It appears to me that you are trying to perform two conflicting
> activities:
>
> 1) Calculate the running sum of a metric over an annotated sequence
>    (as evidenced by your aggregate function call)
> 2) Find the sum for a metric across specified intervals on the
>    annotated sequence (as evidenced by your desire to assign the
>    aggregated sums into an existing RangedData object)
>
> Taking a step back, I am guessing that you are trying to transform
> something akin to a UCSC bed file into something else that is UCSC bed
> file like. If you are using rtracklayer, this means your initial data
> are stored in a RangedData object. To create a RangedData object
> containing the running sum of a values column from an initial
> RangedData object, I recommend:
>
> 1) Creating an RleList object from the RangedData object using the
>    coverage function. Make sure to specify the metric of interest in the
>    weight argument to coverage.
> 2) Using the runsum function on the RleList object to calculate your
>    running sums.
> 3) Creating a RangedData object from the RleList object in step 2
>    using as(<<obj>>, "RangedData")
>
> Here is an example:
>
> > # Step 1: create an RleList representation of the metric
> > rd <- RangedData(IRanges(start = c(5, 10, 15, 2, 4, 8), end = c(7, 14, 21, 3, 6, 9)),
>                    score = 1:6, space = rep(c("A", "B"), each = 3))
> > scoreRleList <- coverage(rd, weight = "score", width = list(A = 30, B = 10))
> > scoreRleList
> SimpleRleList of length 2
> $A
> 'integer' Rle of length 30 with 6 runs
>   Lengths: 4 3 2 5 7 9
>   Values : 0 1 0 2 3 0
>
> $B
> 'integer' Rle of length 10 with 6 runs
>   Lengths: 1 2 3 1 2 1
>   Values : 0 4 5 0 6 0
>
> > # Step 2: calculate the running sums
> > scoreRunsum <- runsum(scoreRleList, k = 3, endrule = "constant")
> > scoreRunsum
> SimpleRleList of length 2
> $A
> 'integer' Rle of length 30 with 15 runs
>   Lengths: 3 1 1 1 1 1 1 1 3 1 1 5 1 1 8
>   Values : 0 1 2 3 2 1 2 4 6 7 8 9 6 3 0
>
> $B
> 'integer' Rle of length 10 with 7 runs
>   Lengths: 2  1  1  1  1  1  3
>   Values : 8 13 14 15 10 11 12
>
> > # Step 3: Create a RangedData representation of the running sums
> > rdRunsum <- as(scoreRunsum, "RangedData")
> > rdRunsum
> RangedData with 22 rows and 1 value column across 2 spaces
>           space    ranges |     score
>     <character> <IRanges> | <integer>
> 1             A  [ 1,  3] |         0
> 2             A  [ 4,  4] |         1
> 3             A  [ 5,  5] |         2
> 4             A  [ 6,  6] |         3
> 5             A  [ 7,  7] |         2
> 6             A  [ 8,  8] |         1
> 7             A  [ 9,  9] |         2
> 8             A  [10, 10] |         4
> 9             A  [11, 13] |         6
> ...         ...       ... ...     ...
> 14            A  [22, 22] |         3
> 15            A  [23, 30] |         0
> 16            B  [ 1,  2] |         8
> 17            B  [ 3,  3] |        13
> 18            B  [ 4,  4] |        14
> 19            B  [ 5,  5] |        15
> 20            B  [ 6,  6] |        10
> 21            B  [ 7,  7] |        11
> 22            B  [ 8, 10] |        12
>
> > sessionInfo()
> R version 2.11.0 Patched (2010-04-24 r51820)
> i386-apple-darwin9.8.0
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] IRanges_1.6.1
>
> loaded via a namespace (and not attached):
> [1] tools_2.11.0
>
> On 4/23/10 6:26 AM, Michael Lawrence wrote:
>> Also note that it's not really necessary to loop here, as is often
>> the case with IRanges:
>>
>> rd$windo<- unlist(runmean(RleList(values(rd)[,"score"])))
>>
>> On Fri, Apr 23, 2010 at 6:13 AM, Michael Lawrence <michafla at gene.com> wrote:
>>
>>> On Fri, Apr 23, 2010 at 1:49 AM, Yvan <yvan.strahm at uni.no> wrote:
>>>
>>>> Hello,
>>>> Thank you, both of you.
>>>>
>>>> I could calculate the sliding window, but not as an Rle object; I
>>>> could not append values for the last w-1 positions in the Rle
>>>> object in order to take care of the size problem.
>>>
>>> Why not? Rle supports all the normal vector operations. And runsum or
>>> runmean will output a vector of the same size as the input, using a
>>> choice of two endrules. If you want 0's at the end, try something like:
>>>
>>> rle[(nrow(rd)-w+1):nrow(rd)]<- 0
>>>
>>>> So I did it like that:
>>>>
>>>> params<-RDApplyParams(rd,function(rd)
>>>>   append((diff(c(0,cumsum(rd$score)),lag=w)/w),rep(0,each=w-1),after=(length(rd$score)-w+1)))
>>>>
>>>> But when I try to add the new values to the RangedData object I get
>>>> this error:
>>>>
>>>> > values(rd)[,"windo"]<-rdapply(params)
>>>> Error in `[<-`(`*tmp*`, , j, value = <S4 object of class "DataFrame">) :
>>>>   ncol(x[j]) != ncol(value)
>>>> In addition: Warning messages:
>>>> 1: In mapply(f, ..., SIMPLIFY = FALSE) :
>>>>   longer argument not a multiple of length of shorter
>>>> 2: In mapply(f, ..., SIMPLIFY = FALSE) :
>>>>   longer argument not a multiple of length of shorter
>>>>
>>>> But when I check the sizes, they are the same; here for one space:
>>>>
>>>> > x<-rdapply(params)
>>>> > length(x$SCAFFOLD_100) == length(rd["SCAFFOLD_100"]$windo)
>>>> [1] TRUE
>>>
>>> It may be that that type of insertion is unsupported. Why not just do
>>> something like:
>>>
>>> rd$window<- unlist(x)
>>>
>>>> Maybe params is missing a parameter, or the way I try to update the
>>>> rd object is wrong. Anyway, from the rdapply output a vector could be
>>>> created, and so a new rd object with the new value column.
>>>>
>>>> yvan
>>>>
>>>> On 22/04/10 15:51, Michael Lawrence wrote:
>>>>
>>>> On Thu, Apr 22, 2010 at 5:49 AM, Michael Dondrup <michael.dondrup at uni.no> wrote:
>>>>
>>>>> Hi,
>>>>> how about the function rdapply (not lapply), which is for that?
>>>>
>>>> lapply() should apply per-space as well, basically providing a
>>>> short-cut for the more complicated rdapply().
>>>>
>>>> > lapply(rd, function(x) sum(x$score))
>>>> $chr1
>>>> [1] 3
>>>>
>>>> $chr2
>>>> [1] 0
>>>>
>>>> sapply() also works:
>>>> > sapply(rd, function(x) sum(x$score))
>>>> chr1 chr2
>>>>    3    0
>>>>
>>>> Another choice is tapply:
>>>> > tapply(rd$score, space(rd), sum)
>>>> chr1 chr2
>>>>    3    0
>>>>
>>>> Michael
>>>>
>>>>> The code below computes the sum score for each space in the
>>>>> RangedData:
>>>>> # taken from the examples mostly:
>>>>> > ranges<- IRanges(c(1,2,3),c(4,5,6))
>>>>> > score<- c(2L, 0L, 1L)
>>>>> > rd<- RangedData(ranges, score, space = c("chr1","chr2","chr1"))
>>>>> > rd
>>>>> RangedData with 3 rows and 1 value column across 2 spaces
>>>>>         space    ranges |     score
>>>>>   <character> <IRanges> | <integer>
>>>>> 1        chr1    [1, 4] |         2
>>>>> 2        chr1    [3, 6] |         1
>>>>> 3        chr2    [2, 5] |         0
>>>>> > params<- RDApplyParams(rd, function(rd) sum(score(rd)))
>>>>> > rdapply(params)
>>>>> $chr1
>>>>> [1] 3
>>>>>
>>>>> $chr2
>>>>> [1] 0
>>>>>
>>>>> Cheers
>>>>> Michael
>>>>>
>>>>> On Apr 22, 2010, at 1:57 PM, Yvan wrote:
>>>>>
>>>>>> On 21/04/10 18:43, Michael Lawrence wrote:
>>>>>>>
>>>>>>> On Wed, Apr 21, 2010 at 6:07 AM, Yvan <yvan.strahm at uni.no> wrote:
>>>>>>>
>>>>>>>     Hello List,
>>>>>>>
>>>>>>>     I am confused about how to loop over a RangedData object.
>>>>>>>     I have this RangedData
>>>>>>>
>>>>>>>     RangedData with 61 rows and 1 value column across 3 spaces
>>>>>>>             space    ranges |     score
>>>>>>>       <character> <IRanges> | <numeric>
>>>>>>>     1  SCAFFOLD_1  [ 8,  8] |  -0.09405
>>>>>>>
>>>>>>>     and the spaces are
>>>>>>>
>>>>>>>     "SCAFFOLD_1" "SCAFFOLD_10" "SCAFFOLD_100"
>>>>>>>
>>>>>>>     Using aggregate it is possible to apply a function to one of
>>>>>>>     the spaces:
>>>>>>>
>>>>>>>     aggregate(rd["SCAFFOLD_1"]$score, start =
>>>>>>>     1:(length(rd["SCAFFOLD_1"]$score)-w+1), width = w, FUN = sum)
>>>>>>>
>>>>>>>     but how can I apply the aggregate to all spaces without a for
>>>>>>>     loop?
>>>>>>>
>>>>>>> It looks like you're attempting a running window sum of the score
>>>>>>> vector. There are more efficient ways of doing this besides
>>>>>>> aggregate(). If you convert the score into an Rle, you can use
>>>>>>> runsum().
>>>>>>> Anyway, to do this over each space individually, use lapply().
>>>>>>>
>>>>>>> This would come out to something like:
>>>>>>>
>>>>>>> values(rd)[,"smoothScore"]<- lapply(rd, function(x)
>>>>>>> runsum(Rle(x$score), w))
>>>>>>>
>>>>>>> Probably not exactly right, but it gets you in the right
>>>>>>> direction...
>>>>>>>
>>>>>>> Michael
>>>>>>
>>>>>> Hello Michael,
>>>>>>
>>>>>> Thanks for the answer and the tip about runsum!
>>>>>> I tried with lapply but could not get it working right; the main
>>>>>> problem is that the runsum is calculated on all values and not for
>>>>>> each specific space.
>>>>>> Sorry, I should have been more precise in the problem description.
>>>>>> The runsum should be calculated in a space-specific manner, let's
>>>>>> say w=2:
>>>>>>
>>>>>>     space score cumsum
>>>>>> 1  space1     1      3
>>>>>> 2  space1     2      4
>>>>>> 3  space1     2     NA
>>>>>> 4  space2    10     21
>>>>>> 5  space2    11     22
>>>>>> 6  space2    11     NA
>>>>>>
>>>>>> Is it possible to do it with lapply?
>>>>>> Thanks again for your help
>>>>>> cheers,
>>>>>> yvan
>>>>>>
>>>>>> [[alternative HTML version deleted]]
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at stat.math.ethz.ch
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives:
>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
modified 9.4 years ago by Michael Lawrence • written 9.4 years ago by Yvan
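The space-specific window sum from the w=2 table in the question can be sketched in base R, without relying on the IRanges version used in the thread. This is only an illustration under stated assumptions: the data frame `df` and the helper `window_sum` are hypothetical names, the toy scores are the ones from the w=2 table, and the last w-1 positions of each space are padded with NA as in that table (the thread's own RDApplyParams attempt divides by w and pads with 0 instead).

```r
# Toy data copied from the w = 2 table in the question (hypothetical object name).
df <- data.frame(space = rep(c("space1", "space2"), each = 3),
                 score = c(1, 2, 2, 10, 11, 11))
w <- 2

# Hypothetical helper: forward-looking sum over a window of width w.
# The last w - 1 positions have no complete window and are set to NA.
window_sum <- function(x, w) {
  n <- length(x)
  if (n < w) return(rep(NA_real_, n))
  cs <- c(0, cumsum(x))
  c(cs[(w + 1):(n + 1)] - cs[1:(n - w + 1)], rep(NA_real_, w - 1))
}

# ave() applies the helper within each space and returns a vector
# aligned with the original rows, so no explicit for loop is needed.
df$winsum <- ave(df$score, df$space, FUN = function(x) window_sum(x, w))
print(df)
```

Because `ave()` splits by space before calling the helper, a window never runs across the boundary between two scaffolds, which is the problem described in the question.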
Answer: loop over IRanges spaces
9.4 years ago, Michael Lawrence (United States) wrote:
On Tue, May 11, 2010 at 2:15 AM, Yvan <yvan.strahm@uni.no> wrote:

> Hello,
>
> I tried your code below, but I am not able to create the scoreRleList,
> because 'weight' can only be an integer list.
>
> > read_gff
> function(file="gff3.txt") {
>     gff <- read.delim(file,header=FALSE)
>     colnames(gff) <- c("seqname", "source", "feature", "start", "end",
>                        "score", "strand","frame", "comments")
>     return(gff)
> }

The rtracklayer package would do the above for you.

> > library(IRanges)
>
> Attaching package: 'IRanges'
>
> The following object(s) are masked from 'package:base':
>
>     cbind, Map, mapply, order, paste, pmax, pmax.int, pmin, pmin.int,
>     rbind, rep.int, table
>
> > gff<-read_gff("/home/yvan/Lundalm/R/projects/coen/gff_original/50Lines.gff")
> > rd<-RangedData(IRanges(gff$start,width=1),score=gff$score,start=gff$start,space=gff$seqname)
> > scoreRleList<-coverage(rd, weight ="score",width = as.list(table(rd$space)))
> Error in .local(x, shift, width, weight, ...) :
>   'weight' must be a non-empty list of integers
>
> Can I change/force the type of 'weight'?

It sounds like your scores are floating point values, so they become a
numeric vector in R. Just coerce them with as.integer:

    rd$score <- as.integer(rd$score)

That might get you closer. You might want to multiply by some power of 10
first to save some precision.

> In a different approach, I tried to use different views on the data set.
> This was motivated by the fact that the difference between the start
> positions of consecutive probes could indicate either a change of
> scaffold or a gap inside a scaffold.
>
> > first_view<-slice(diff(rd$start), lower=10, upper=200)
> > first_view
>   Views on a 60-integer XInteger subject
> subject: 28 24 25 37 23 24 32 34 25 23 63 ... 24 35 24 29 31 28 19 29 27 29 37
> views:
>     start end width
> [1]     1  19    19 [28 24 25 37 23 24 32 34 25 23 63 21 36 27 19 33 33 24 32]
> [2]    21  39    19 [ 31 22 34 18 36 25 ... 31 108 34 28 26 23 33]
> [3]    41  60    20 [32 28 26 20 33 25 30 30 24 ... 29 31 28 19 29 27 29 37 22]
>
> So here the three views are the three scaffolds. But if I want to change
> the subject of the view to the score:
>
> > second_view<-Views(subject=rd$score,start=first_view$start,end=first_view$end)
> Error in function (classes, fdef, mtable)  :
>   unable to find an inherited method for function "Views", for signature "numeric"
>
> Again, can I change the expected type of the signature in the Views
> function?

That would take a lot of work, involving a new NumericView class. Much
simpler to just coerce the scores to integer as in the above.

Michael

> Thanks again for your help.
> Cheers,
> yvan
written 9.4 years ago by Michael Lawrence
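The "multiply by some power of 10 first" suggestion in the answer can be sketched in base R as follows. This is a rough illustration, not the answer's exact code: the vector `score` and the factor `scale` are hypothetical (only -0.09405 appears in the thread), and the choice of five decimal places is an assumption. Note that `as.integer()` truncates toward zero, so the sketch rounds first.

```r
# Hypothetical floating point scores; only -0.09405 is taken from the thread.
score <- c(-0.09405, 0.1234, 1.5)

# Assumed scale: keep five decimal places before converting to integer.
scale <- 1e5

# round() before as.integer(), because as.integer() alone drops the
# fractional part (e.g. as.integer(-0.09405) is 0).
score_int <- as.integer(round(score * scale))
print(score_int)

# Results computed on the integer scale can be divided by the same factor
# afterwards to get back to the original units.
score_back <- score_int / scale
print(max(abs(score_back - score)))
```

R integers are 32-bit, so scaled values beyond about 2.1e9 in absolute size become NA with a warning; the scale factor has to be chosen with the range of the scores (and of any running sums) in mind.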