DataFrame after splitting and back
@laurent-gatto-5645
Last seen 1 day ago
Belgium

I have the following DataFrame

> k <- sample(3, 10, replace = TRUE)
> df <- DataFrame(k = k,
+                 x = round(rnorm(length(k)), 2),
+                 y = seq_len(length(k)),
+                 z = sample(LETTERS, length(k), replace = TRUE),
+                 ir = IRanges(seq_along(k), width = 10),
+                 r = Rle(sample(5, length(k), replace = TRUE)))
> df
DataFrame with 10 rows and 6 columns
           k         x         y           z        ir     r
   <integer> <numeric> <integer> <character> <IRanges> <Rle>
1          2      -0.8         1           E      1-10     1
2          2     -0.43         2           U      2-11     5
3          2     -0.67         3           U      3-12     4
4          1      0.58         4           L      4-13     2
5          2     -0.95         5           K      5-14     1
6          1      0.47         6           J      6-15     1
7          2      1.24         7           S      7-16     3
8          2     -1.73         8           M      8-17     5
9          1     -0.89         9           F      9-18     5
10         2      -1.3        10           D     10-19     4


that stores information about up to three subgroups (defined by column k; only groups 1 and 2 occur in this sample).

I can very efficiently group the rows into a new DataFrame by first splitting df based on k, then creating a new compressed one:

> df2 <- DataFrame(split(df, df$k))
> df2
DataFrame with 2 rows and 6 columns
              k                    x             y               z
  <IntegerList>        <NumericList> <IntegerList> <CharacterList>
1         1,1,1      0.58,0.47,-0.89         4,6,9           L,J,F
2     2,2,2,... -0.8,-0.43,-0.67,...     1,2,3,...       E,U,U,...
                  ir         r
       <IRangesList> <RleList>
1     4-13,6-15,9-18     2,1,5
2 1-10,2-11,3-12,... 1,5,4,...

Is there an easy and fast way to get back to df from df2?

@martin-morgan-1513
Last seen 7 days ago
United States

Maybe

DataFrame(lapply(df2, unsplit, df$k))

?
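
The idea behind this suggestion can be checked with base R alone: unsplit() reverses split() when it is given the same grouping vector, putting each element back in its original position. The sketch below uses an ordinary numeric vector rather than the DataFrame from the question, so it only illustrates the principle; that the List columns of df2 behave the same way under unsplit() is an assumption here. The seed and the names parts and restored are illustrative choices.

```r
# Base R sketch: split() followed by unsplit() with the same grouping
# vector restores the original element order.
set.seed(1)                            # illustrative seed, not from the post
k <- sample(3, 10, replace = TRUE)     # grouping vector, built as in the question
x <- round(rnorm(length(k)), 2)        # stand-in for a single column of df

parts <- split(x, k)                   # list with one element per group
restored <- unsplit(parts, k)          # reassemble using the original k

# The round trip should give back exactly the original vector.
stopifnot(identical(restored, x))
```

Note that the original grouping vector (df$k in the question) has to be kept around, since it is what tells unsplit() where each value belongs.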

@michael-lawrence-3846
Last seen 5 months ago
United States

In devel, expand() has a new recursive argument (TRUE by default); setting it to FALSE expands the columns in parallel, so you can now do:

expand(df2, recursive=FALSE)


as long as you don't care that the data are sorted by "k".
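
The ordering caveat can be illustrated with base R: concatenating the per-group pieces yields rows ordered by k rather than in their original order. If one column records the original row position (as y does in the question's df), sorting on it recovers that order. This is a minimal sketch with a plain data.frame standing in for the expanded result; whether the output of expand() can be reordered in exactly this way is an assumption, and the names by_group and back are illustrative.

```r
# Base R sketch: rows come back grouped by k, and a stored row index
# can be used to restore the original order.
set.seed(1)                                  # illustrative seed
k <- sample(3, 10, replace = TRUE)
d <- data.frame(k = k, y = seq_along(k))     # y records the original position

# Concatenating the per-group pieces gives rows sorted by k.
by_group <- do.call(rbind, split(d, d$k))
stopifnot(!is.unsorted(by_group$k))

# Sorting on the stored index y restores the original row order.
back <- by_group[order(by_group$y), ]
rownames(back) <- NULL
stopifnot(identical(back$k, d$k), identical(back$y, d$y))
```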