IRanges: cbind not well defined for RangedData?

Michael Dondrup ▴ 550

@michael-dondrup-3849

Last seen 11.4 years ago

Hi,

here is another possible little glitch with RangedData and cbind(). I would like to propose changing or expanding the behavior of the cbind function, or adding to its documentation. The use case is as follows: assume we have some chromosomal ranges in a RangedData object. We can then iteratively compute statistics on these ranges and attach them to the DataFrame holding the extra data, e.g. count data or combined quality scores, possibly from multiple conditions.

According to the documentation of the RangedData class:

> The first mode treats the object as a contiguous "data frame" annotated with range information. The accessors start, end, and width get the corresponding fields in the ranges as atomic integer vectors, undoing the division over the spaces. The [[ and matrix-style [, extraction and subsetting functions unroll the data in the same way. [[<- does the inverse.

So I assumed I could use cbind(rd, a.value) to attach the statistics to the internal data representation. Would it be possible to make cbind return something more useful, or are there better ways to do it?

Best
Michael

Example:

    > a.value = rnorm(4)
    > rd1 = RangedData(ranges=IRanges(start=runif(4, min=1, max=10E8), width=runif(4, min=1, max=10E5), names=paste("bla",1:4)), space=1:2)
    > rd1
    RangedData with 4 rows and 0 value columns across 2 spaces
                space                 ranges |
          <character>              <iranges> |
    bla 1           1 [773679042, 774010137] |
    bla 3           1 [194819013, 195136171] |
    bla 2           2 [183105318, 183509803] |
    bla 4           2 [107730452, 107823748] |
    > obj = cbind(rd1, a.value)

I would intuitively expect the result to look exactly like this:

    > RangedData(ranges=IRanges(start=runif(4, min=1, max=10E8), width=runif(4, min=1, max=10E5), names=paste("bla",1:4)), space=1:2, a.value)
    RangedData with 4 rows and 1 value column across 2 spaces
                space                 ranges |    a.value
          <character>              <iranges> |  <numeric>
    bla 1           1 [473042533, 473820859] | -1.7956588
    bla 3           1 [ 75991383,  76022516] |  0.3588571
    bla 2           2 [475385363, 476224756] |  1.4166218
    bla 4           2 [532603052, 532902678] |  0.2324424

But what I get is quite different:

    > class(obj)
    [1] "matrix"
    > typeof(obj)
    [1] "list"
    > obj
         rd1 a.value
    [1,] ?   0.3255676
    [2,] ?   0.5913471
    [3,] ?   0.9317755
    [4,] ?   -0.8897527

    > sessionInfo()
    R version 2.10.1 (2009-12-14)
    x86_64-apple-darwin9.8.0

    locale:
    [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] IRanges_1.4.9

    loaded via a namespace (and not attached):
    [1] tools_2.10.1

Written 15.8 years ago by Michael Dondrup ▴ 550 • updated 15.8 years ago by Michael Lawrence ★ 11k

Michael Lawrence ★ 11k

@michael-lawrence-3846

Last seen 4.1 years ago

United States

On Thu, Mar 18, 2010 at 7:55 AM, Michael Dondrup wrote:

> I assume I could use cbind(rd, a.value) to attach the statistics to the internal data representation. So would it be possible to make cbind return something more useful, or are there better ways to do it?

Right now it's just using the cbind method for "ANY", because one does not exist for RangedData. To be honest, I've always just used the $<- syntax for adding the statistics. This seems like it would work well in your use case as well.

Like:

    rd$a.value <- a.value

Michael
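For readers without an IRanges installation that still ships RangedData, the `$<-` pattern can be sketched with a toy S4 class. This is only an illustration under stated assumptions: `ToyRanged` and its slots are hypothetical and are not part of IRanges, and the real RangedData method does considerably more.

```r
library(methods)

# Hypothetical toy container (not IRanges): ranges plus a data.frame of values.
setClass("ToyRanged",
         representation(start = "integer", width = "integer",
                        values = "data.frame"))

# `$<-` adds or replaces one value column and returns the modified object.
setMethod("$<-", "ToyRanged", function(x, name, value) {
    # One value per range; a length mismatch is treated as an error here.
    stopifnot(length(value) == length(x@start))
    x@values[[name]] <- value
    x
})

toy <- new("ToyRanged", start = 1:4, width = rep(10L, 4),
           values = data.frame(row.names = 1:4))
a.value <- rnorm(4)
toy$a.value <- a.value   # same syntax as rd$a.value <- a.value
toy@values
```

Because the replacement method returns the whole object, the result keeps its class, which is exactly what the cbind fallback does not do.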

Michael Lawrence ★ 11k • 15.8 years ago

I have been experimenting with S4 dispatch on ... (optional arguments) and reading the man page for dotsMethods:

    > help(dotsMethods)

Long story short, adding support for cbind-ing a vector to an S4 object would probably involve either

1) creating an S4 class union of an S4 class (e.g. RangedData) with vector, so that the existing S4 dispatch would choose the correct method, or
2) creating an S4 default method for cbind that has its own dispatch mechanism for choosing a cbind method.

I don't find either of these options appealing, and I second Michael Lawrence's suggestion of using "$<-" or "[[<-" to bind new columns to a RangedData object.

    > a.value <- rnorm(4)
    > rd1 <- RangedData(ranges=IRanges(start=runif(4, min=1, max=10E8), width=runif(4, min=1, max=10E5), names=paste("bla",1:4)), space=1:2)
    > obj <- cbind(rd1, a.value)
    > showMethods("cbind")
    Function: cbind (package IRanges)
    ...="ANY"
    ...="DataFrame"
    ...="DataFrameList"
    ...="DataTable"
    ...="numeric#RangedData"
        (inherited from: ...="ANY")

    > df1 <- unlist(values(rd1))
    > class(df1)
    [1] "DataFrame"
    attr(,"package")
    [1] "IRanges"
    > cbind(df1, a.value)
         df1 a.value
    [1,] ?   -0.6268173
    [2,] ?   2.540871
    [3,] ?   0.4137926
    [4,] ?   -0.897856
    > showMethods("cbind")
    Function: cbind (package IRanges)
    ...="ANY"
    ...="DataFrame"
    ...="DataFrame#numeric"
        (inherited from: ...="ANY")
    ...="DataFrameList"
    ...="DataTable"
    ...="numeric#RangedData"
        (inherited from: ...="ANY")

    > sessionInfo()
    R version 2.11.0 Under development (unstable) (2010-03-14 r51276)
    i386-apple-darwin9.8.0

    locale:
    [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] IRanges_1.5.64
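The "inherited from ANY" entries in the showMethods output can be reproduced in miniature. The sketch below assumes a made-up generic `bindCols` and a made-up class `ToyRanged` (neither exists in IRanges). It only shows how dispatch on `...` behaves when the arguments do not share a class, as described in `help(dotsMethods)`.

```r
library(methods)

# Hypothetical class standing in for RangedData.
setClass("ToyRanged", representation(values = "data.frame"))

# Hypothetical generic whose signature is "..." only.
setGeneric("bindCols", function(...) standardGeneric("bindCols"))

# Chosen only when every argument in ... is a ToyRanged (or a subclass).
setMethod("bindCols", "ToyRanged", function(...) "ToyRanged method")

# Chosen when the arguments share no more specific common class.
setMethod("bindCols", "ANY", function(...) "ANY fallback")

toy <- new("ToyRanged", values = data.frame(x = 1:4))

same_class  <- bindCols(toy, toy)        # all arguments are ToyRanged
mixed_class <- bindCols(toy, rnorm(4))   # ToyRanged mixed with numeric
same_class
mixed_class
```

Mixing a ToyRanged with a numeric vector lands in the "ANY" method, which mirrors why cbind(rd1, a.value) never reaches a RangedData-aware method.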

Patrick Aboyoun ★ 1.6k • 15.8 years ago

Dear Patrick and Michael,

thank you very much for your helpful support on my last two connected issues! It is somewhere in the examples in the documentation, but I must have overlooked it.

I tried it out immediately, and it works fine:

    > rd = RangedData(IRanges(start=1:4, width=10, names=paste("a",1:4)), space=1:2 )
    > rd
    > rd$a.value = rnorm(4)
    > rd
    RangedData with 4 rows and 1 value column across 2 spaces
            space    ranges |    a.value
      <character> <iranges> |  <numeric>
    1           1   [1, 10] | -0.6765515
    2           1   [3, 12] |  1.5406962
    3           2   [2, 11] | -1.2599696
    4           2   [4, 13] |  0.4971178

But then I had to reboot my computer, because I accidentally tried this on 100,000 ranges where the value was actually a list, not a vector, and then the recycling rule struck me:

    > rd$a.list = as.list(1:4)

At first everything seems fine and normal, but if you try to print it:

    > rd
    RangedData with 4 rows and 1 value column across 2 spaces
    Error in .Method(..., deparse.level = deparse.level) :
      number of rows of matrices must match (see arg 2)

or try to convert it into a data.frame:

    > as.data.frame(rd)
      space start end width names a.list.1L a.list.2L a.list.3L a.list.4L
    1     1     1  10    10   a 1         1         2         3         4
    2     1     3  12    10   a 3         1         2         3         4
    3     2     2  11    10   a 2         1         2         3         4
    4     2     4  13    10   a 4         1         2         3         4

When I tried this, R ran into memory problems.

This is just a warning to make sure you really use a vector here. Maybe something to put in the type checking, or the documentation?

Anyway, thanks a lot again
Michael
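One way to guard against the accidental list described above is to check the value before assigning it. The helper below is a sketch only: `as_atomic_column` is a hypothetical name, not an IRanges function, and it assumes that a list is acceptable only when every element is a single atomic value.

```r
# Hypothetical helper: return an atomic vector with one value per row,
# or stop with an error if that is not possible.
as_atomic_column <- function(value, n) {
    if (is.list(value)) {
        # Flatten only lists made of length-1 atomic elements.
        scalar <- vapply(value,
                         function(el) is.atomic(el) && length(el) == 1L,
                         logical(1))
        if (!all(scalar))
            stop("list cannot be flattened to one value per row")
        value <- unlist(value, use.names = FALSE)
    }
    stopifnot(is.atomic(value), length(value) == n)
    value
}

a.list  <- as.list(1:4)
a.value <- as_atomic_column(a.list, n = 4)
a.value
```

The checked vector can then be attached with rd$a.value <- a.value as before.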

Michael Dondrup ▴ 550 • 15.8 years ago

Michael,

Thanks for the report. RangedData objects have been designed to hold list objects in the values columns. You did, however, find a bug in the printing of a RangedData object when it contains a list column. I fixed the show method in both BioC 2.5 IRanges (>= 1.4.16) and BioC 2.6 IRanges (>= 1.5.66) to handle this case.

    > rd <- RangedData(IRanges(start=1:4, width=10, names=paste("a",1:4)), space=1:2 )
    > rd$a.value <- rnorm(4)
    > rd$a.list <- as.list(1:4)
    > rd
    RangedData with 4 rows and 2 value columns across 2 spaces
              space    ranges |   a.value   a.list
        <character> <iranges> | <numeric>   <list>
    a 1           1   [1, 10] | 0.5362468 ########
    a 3           1   [3, 12] | 0.5459593 ########
    a 2           2   [2, 11] | 0.4705777 ########
    a 4           2   [4, 13] | 0.4160833 ########

As you noticed, a list column in a RangedData object will result in column expansion if you convert it to a data.frame, which can lead to a large data object if the number of rows in the RangedData object is large. Since the show method prints the class of each column, the user can check that the data columns are stored correctly before any conversion to a data.frame.

    > as.data.frame(rd)
      space start end width names   a.value a.list.1L a.list.2L a.list.3L a.list.4L
    1     1     1  10    10   a 1 0.5362468         1         2         3         4
    2     1     3  12    10   a 3 0.5459593         1         2         3         4
    3     2     2  11    10   a 2 0.4705777         1         2         3         4
    4     2     4  13    10   a 4 0.4160833         1         2         3         4

Patrick
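The scale of the column expansion is easy to see with a plain data.frame, which expands a list input the same way. This is a small sketch in base R only. The row count of 200 is an arbitrary choice for illustration.

```r
# A list with one element per row becomes one column per element,
# so n rows produce n extra columns, i.e. roughly n^2 cells.
n <- 200
wide <- data.frame(x = seq_len(n), y = as.list(seq_len(n)))

dim(wide)          # rows, columns
prod(dim(wide))    # total number of cells
```

With 100,000 rows the same expansion would ask for on the order of 10^10 cells, which is consistent with the memory problems reported earlier in the thread.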

Patrick Aboyoun ★ 1.6k • 15.8 years ago

On Fri, Mar 19, 2010 at 12:59 PM, Patrick Aboyoun wrote:

> You did, however, find a bug in the printing of a RangedData object when it contains a list column. I fixed the show method in both BioC 2.5 IRanges (>= 1.4.16) and BioC 2.6 IRanges (>= 1.5.66) to handle this case.

Thanks for doing this Patrick, but what's the deal with the #'s? I mean, how about "1, 2, 3, 4" instead? That's how data.frame prints it.

> As you noticed, a list column in a RangedData object will result in column expansion if you convert it to a data.frame, which can lead to a large data object if the number of rows in a RangedData object is large.

Does this make sense? data.frame can handle list columns.

    data(mtcars)
    mtcars$a.list <- list(1:4)
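The mtcars snippet can be expanded slightly to show what actually ends up in the data.frame. This sketch works on a copy named `cars` (an arbitrary name) so that the built-in dataset is left untouched.

```r
cars <- mtcars                 # work on a copy of the built-in dataset
cars$a.list <- list(1:4)       # a length-1 list is recycled to every row

ncol(cars) - ncol(mtcars)      # one new column, not four
is.list(cars$a.list)           # the column itself is a list
length(cars$a.list)            # one list element per row
cars$a.list[[1]]               # each element holds the full vector 1:4
```

Assignment through `$<-` keeps the list as a single column, in contrast to passing the list to the data.frame() constructor.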

Michael Lawrence ★ 11k • 15.8 years ago

Michael L.,

Given that we have IntegerList objects to store lists of integers, I am not inclined to build logic for printing a list column in a DataTable. To change the current behavior, the relevant method to work on is showAsCell,list-method.

The conversion of a DataTable to a data.frame when the DataTable contains some non-atomic columns is a bit dicey. I'm not sure whether a data.frame truly supports list columns or whether this is something grandfathered in because data.frame inherits from list. For example, the data.frame constructor converts list inputs to multiple columns:

    > data.frame(x = 1:4, y = as.list(2:5))
      x y.2L y.3L y.4L y.5L
    1 1    2    3    4    5
    2 2    2    3    4    5
    3 3    2    3    4    5
    4 4    2    3    4    5

We can circumvent this behavior by decorating a list object with the necessary data.frame attributes, but I'm not sure how many methods will be able to handle a data.frame with a list column properly.

Patrick
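For comparison, base R also offers `I()` to keep a list as a single column in the data.frame constructor. Note that this is a different technique from the attribute decoration Patrick mentions. The sketch below only contrasts the two constructor calls.

```r
# Without protection the list is spread over one column per element.
expanded  <- data.frame(x = 1:4, y = as.list(2:5))

# Wrapping the list in I() keeps it as one list column.
protected <- data.frame(x = 1:4, y = I(as.list(2:5)))

ncol(expanded)
ncol(protected)
is.list(protected$y)
```

Whether downstream functions cope with such a column is a separate question, which is the concern raised above.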

Patrick Aboyoun ★ 1.6k • 15.8 years ago

I've done some testing of as.data.frame on a RangedData object and found that the existing coercion method was producing incorrect results in certain circumstances when there was a list, SimpleList, or CompressedList data column, due to vector recycling. For now, as.data.frame for a RangedData object will throw an error if it contains a list, SimpleList, or CompressedList data column. If there is demand for as.data.frame supporting list columns, we can take another look at this issue.

Thanks,
Patrick

On 3/19/10 5:14 PM, Patrick Aboyoun wrote:
> Michael L.,
> Given that we have IntegerList objects to store lists of integers, I am not inclined to build logic for printing a list column in a DataTable. To change the current behavior, the relevant method to work on is showAsCell,list-method.
>
> The conversion of a DataTable to a data.frame when the DataTable contains some non-atomic columns is a bit dicey. I'm not sure whether a data.frame truly supports list columns or whether they were grandfathered in because data.frame inherits from list. For example, the data.frame constructor converts list inputs to multiple columns:
>
> > data.frame(x = 1:4, y = as.list(2:5))
>   x y.2L y.3L y.4L y.5L
> 1 1    2    3    4    5
> 2 2    2    3    4    5
> 3 3    2    3    4    5
> 4 4    2    3    4    5
>
> We can circumvent this behavior by decorating a list object with the necessary data.frame attributes, but I'm not sure how many methods will be able to handle a data.frame with a list column properly.
>
> Patrick
>
> On 3/19/10 3:46 PM, Michael Lawrence wrote:
>> On Fri, Mar 19, 2010 at 12:59 PM, Patrick Aboyoun <paboyoun at fhcrc.org> wrote:
>>> Michael,
>>> Thanks for the report. RangedData objects have been designed to hold list objects in the values columns. You did, however, find a bug in the printing of a RangedData object when it contains a list column. I fixed the show method in both BioC 2.5 IRanges (>= 1.4.16) and BioC 2.6 IRanges (>= 1.5.66) to handle this case.
>>>
>>> > rd <- RangedData(IRanges(start=1:4, width=10, names=paste("a",1:4)), space=1:2)
>>> > rd$a.value <- rnorm(4)
>>> > rd$a.list <- as.list(1:4)
>>> > rd
>>> RangedData with 4 rows and 2 value columns across 2 spaces
>>>           space    ranges |   a.value   a.list
>>>     <character> <iranges> | <numeric>   <list>
>>> a 1           1   [1, 10] | 0.5362468 ########
>>> a 3           1   [3, 12] | 0.5459593 ########
>>> a 2           2   [2, 11] | 0.4705777 ########
>>> a 4           2   [4, 13] | 0.4160833 ########
>>
>> Thanks for doing this Patrick, but what's the deal with the #'s? I mean, how about "1, 2, 3, 4" instead? That's how data.frame prints it.
>>
>>> As you noticed, a list column in a RangedData object will result in column expansion if you convert it to a data.frame, which can lead to a large data object if the number of rows in the RangedData object is large.
>>
>> Does this make sense? data.frame can handle list columns.
>>
>> data(mtcars)
>> mtcars$a.list <- list(1:4)
>>
>>> Since the show method prints out the classes of each of the columns, the user will be able to check that their data columns are stored correctly prior to any conversion to a data.frame.
>>>
>>> > as.data.frame(rd)
>>>   space start end width names   a.value a.list.1L a.list.2L a.list.3L a.list.4L
>>> 1     1     1  10    10   a 1 0.5362468         1         2         3         4
>>> 2     1     3  12    10   a 3 0.5459593         1         2         3         4
>>> 3     2     2  11    10   a 2 0.4705777         1         2         3         4
>>> 4     2     4  13    10   a 4 0.4160833         1         2         3         4
>>>
>>> Patrick
>>>
>>> On 3/19/10 7:23 AM, Michael Dondrup wrote:
>>>> Dear Patrick and Michael,
>>>>
>>>> thank you very much for your helpful support on my last two connected issues! It is somehow in the documentation in the examples but I must have overlooked it.

15.8 years ago • Patrick Aboyoun ★ 1.6k
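The two ideas in this reply, refusing to coerce when a list column is present and "decorating" a list with data.frame attributes, can be sketched in base R. This is only an illustration under stated assumptions: `strict_data_frame()` is a hypothetical helper, not the IRanges method, and the `structure()` call is one reading of what attribute decoration might look like.

```r
# Hypothetical helper (not part of IRanges): stop instead of silently
# expanding list columns during coercion.
strict_data_frame <- function(cols) {
  is_list_col <- vapply(cols, is.list, logical(1))
  if (any(is_list_col)) {
    stop("list column(s) not supported: ",
         paste(names(cols)[is_list_col], collapse = ", "))
  }
  as.data.frame(cols)
}

# Atomic columns pass through unchanged.
ok <- strict_data_frame(list(x = 1:4, v = c(0.1, 0.2, 0.3, 0.4)))

# A list column triggers the error; tryCatch() captures the message here.
msg <- tryCatch(
  strict_data_frame(list(x = 1:4, y = as.list(2:5))),
  error = function(e) conditionMessage(e)
)

# "Decorating" a plain list with data.frame attributes keeps y as a single
# list column instead of expanding it into four columns.
decorated <- structure(
  list(x = 1:4, y = as.list(2:5)),
  class = "data.frame",
  row.names = 1:4
)
dim(decorated)          # 4 rows, 2 columns
is.list(decorated$y)    # TRUE
```

As the quoted message cautions, not every method that accepts a data.frame is guaranteed to handle such a list column, so the decorated form is best treated as something to test case by case.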


On Fri, Mar 19, 2010 at 7:28 PM, Patrick Aboyoun <paboyoun@fhcrc.org> wrote:
> I've done some testing for as.data.frame on a RangedData object and found that the existing coercion methodology was producing incorrect results in certain circumstances when there was a list, SimpleList or CompressedList data column due to vector recycling. For now, as.data.frame for a RangedData object will throw an error if it contains a list, SimpleList, or CompressedList data column. If there is demand for as.data.frame supporting list columns, we can take another look at this issue.

That's fine, but when it comes to the data.frame() constructor, all arguments are documented to be passed to as.data.frame(), which will lead to separate columns. You can use I() to get around this, i.e.

data.frame(x = 1:4, y = I(rep(list(as.list(2:5)), 4)))

This needs an extra level of nesting, due to the way as.data.frame.AsIs works.

> Thanks,
> Patrick

15.8 years ago • Michael Lawrence ★ 11k
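A short base R sketch of the I() workaround from this reply. It assumes a current R release, where wrapping a list in I() keeps it as a single column; the names `per_row` and `nested` are illustrative. The first form stores one list element per row, while the form quoted in the reply stores the complete list in every row.

```r
# One list element per row: the list has 4 elements, matching the 4 rows.
per_row <- data.frame(x = 1:4, y = I(as.list(2:5)))
dim(per_row)            # 4 rows, 2 columns
per_row$y[[2]]          # the second row holds 3

# The form from the reply: rep() builds a list of 4 elements, each of which
# is itself the full list 2..5, so every row holds all four values.
nested <- data.frame(x = 1:4, y = I(rep(list(as.list(2:5)), 4)))
dim(nested)             # 4 rows, 2 columns
length(nested$y[[1]])   # each cell is a list of length 4
```

Which form is appropriate depends on whether each row should carry its own value or a copy of the whole list; the extra nesting is only needed for the latter.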
