Question

DataFrame after splitting and back

Laurent Gatto 1.6k

@laurent-gatto-5645

Belgium

I have the following DataFrame

> k <- sample(3, 10, replace = TRUE)
> df <- DataFrame(k = k,
+                 x = round(rnorm(length(k)), 2),
+                 y = seq_len(length(k)),
+                 z = sample(LETTERS, length(k), replace = TRUE),
+                 ir = IRanges(seq_along(k), width = 10),
+                 r = Rle(sample(5, length(k), replace = TRUE)))
> df
DataFrame with 10 rows and 6 columns
           k         x         y           z        ir     r
   <integer> <numeric> <integer> <character> <IRanges> <Rle>
1          2      -0.8         1           E      1-10     1
2          2     -0.43         2           U      2-11     5
3          2     -0.67         3           U      3-12     4
4          1      0.58         4           L      4-13     2
5          2     -0.95         5           K      5-14     1
6          1      0.47         6           J      6-15     1
7          2      1.24         7           S      7-16     3
8          2     -1.73         8           M      8-17     5
9          1     -0.89         9           F      9-18     5
10         2      -1.3        10           D     10-19     4

that stores information about up to three subgroups (defined by column k; in this particular draw only groups 1 and 2 occur).

I can very efficiently group the rows into a new DataFrame by first splitting df based on k, then creating a new compressed one with one row per group:

> df2 <- DataFrame(split(df, df$k))
> df2
DataFrame with 2 rows and 6 columns
              k                    x             y               z
  <IntegerList>        <NumericList> <IntegerList> <CharacterList>
1         1,1,1      0.58,0.47,-0.89         4,6,9           L,J,F
2     2,2,2,... -0.8,-0.43,-0.67,...     1,2,3,...       E,U,U,...
                  ir         r
       <IRangesList> <RleList>
1     4-13,6-15,9-18     2,1,5
2 1-10,2-11,3-12,... 1,5,4,...

Is there an easy and fast way to get back to df from df2?

DataFrame S4Vectors • 1.3k views

updated 6.2 years ago by Michael Lawrence ★ 11k • written 6.2 years ago by Laurent Gatto 1.6k

Martin Morgan · Accepted Answer · 2019-04-24

Martin Morgan 25k

@martin-morgan-1513

United States

Maybe

DataFrame(lapply(df2, unsplit, df$k))

?
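
A quick way to check this round trip, sketched on a reduced version of the DataFrame from the question (only the k, x and y columns). The seed and the name df3 are illustrative additions, not part of the original post:

```r
library(S4Vectors)
library(IRanges)  # provides the list classes (IntegerList, NumericList, ...) produced by split()

set.seed(42)  # arbitrary seed, only for reproducibility
k <- sample(3, 10, replace = TRUE)
df <- DataFrame(k = k,
                x = round(rnorm(length(k)), 2),
                y = seq_along(k))

# Split by k and compress into one row per group, as in the question
df2 <- DataFrame(split(df, df$k))

# Unsplit every list column using the original grouping vector
df3 <- DataFrame(lapply(df2, unsplit, df$k))

# Compare each column with the original
all(df3$k == df$k)
all(df3$x == df$x)
all(df3$y == df$y)
```

Note that this relies on the original df$k still being available: df2$k holds the same labels, but grouped, so it no longer records the original row order.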

updated 6.2 years ago by Michael Lawrence ★ 11k • written 6.2 years ago by Martin Morgan 25k

score 2 · Accepted Answer · 2019-04-24

Michael Lawrence ★ 11k

@michael-lawrence-3846

United States

In devel, expand() has a new recursive argument (default TRUE). Setting it to FALSE expands the list columns in parallel, so you can now do:

expand(df2, recursive=FALSE)

as long as you don't mind that the result is sorted by "k" rather than in the original row order.
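
If the original order matters and the table carries a row index, as y does in the question, one option is to reorder after expanding. A minimal sketch, assuming an S4Vectors version whose expand() has the recursive argument; the reduced columns, the seed and the name df4 are illustrative:

```r
library(S4Vectors)
library(IRanges)  # provides the list classes produced by split()

set.seed(42)  # arbitrary seed, only for reproducibility
k <- sample(3, 10, replace = TRUE)
df <- DataFrame(k = k,
                x = round(rnorm(length(k)), 2),
                y = seq_along(k))
df2 <- DataFrame(split(df, df$k))

# Expand all list columns in parallel; rows come back grouped by k
df4 <- expand(df2, recursive = FALSE)

# y holds the original row positions, so ordering on it restores the input order
df4 <- df4[order(df4$y), ]
all(df4$y == df$y)
```

Without such an index column, the original order cannot be recovered from df2 alone, which is why the unsplit() approach needs the original df$k.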

written 6.2 years ago by Michael Lawrence ★ 11k