Question: Correct way to split and unsplit a DataFrame

Michael Steinbaugh wrote:

I'm having trouble splitting and unsplitting a `DataFrame` using the methods defined in IRanges. Here's an attempt at a minimal reprex.

```
library(IRanges)
df <- DataFrame(
    a = seq_len(4L),
    b = as.factor(rep(c("b", "a"), each = 2L)),
    row.names = LETTERS[seq_len(4L)]
)
print(df)
```

```
DataFrame with 4 rows and 2 columns
          a        b
  <integer> <factor>
A         1        b
B         2        b
C         3        a
D         4        a
```

```
split <- split(x = df, f = df[["b"]])
print(split)
```

```
SplitDataFrameList of length 2
$a
DataFrame with 2 rows and 2 columns
          a        b
  <integer> <factor>
C         3        a
D         4        a

$b
DataFrame with 2 rows and 2 columns
          a        b
  <integer> <factor>
A         1        b
B         2        b
```

This is all good and lets me manipulate the `DataFrame` by a grouping `factor`, similar to the approach in dplyr with `group_by()`. However, I'm having trouble coercing the split back to a standard `DataFrame` via `unsplit()`.
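For comparison, base R's `unsplit()` does handle a plain `data.frame` that was split by a factor, restoring both the original row order and the row names. A minimal sketch, using a hypothetical `df0` that mirrors the reprex above:

```r
# Plain data.frame with the same shape as the DataFrame in the reprex
df0 <- data.frame(
    a = seq_len(4L),
    b = as.factor(rep(c("b", "a"), each = 2L)),
    row.names = LETTERS[seq_len(4L)]
)

# Split by the grouping factor, then reassemble in the original row order
groups <- split(df0, df0[["b"]])
restored0 <- unsplit(groups, df0[["b"]])
print(restored0)
```

This is the behaviour one would hope for from the `DataFrame`/`SplitDataFrameList` pair.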

`unlist()` will coerce back to a `DataFrame`, but it returns the rows in group order (C, D, A, B) instead of the original order, because we're not keeping track of our factor grouping:

```
unlist(split, use.names = FALSE)
```

```
DataFrame with 4 rows and 2 columns
          a        b
  <integer> <factor>
C         3        a
D         4        a
A         1        b
B         2        b
```
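Until `unsplit()` works here, one workaround is to record where each row came from before flattening. The sketch below is not an IRanges function; it assumes the grouping factor has no `NA` values (base `split()` drops those rows), and the names `grouped`, `idx`, `flat` and `restored` are hypothetical. Splitting the row positions with the same factor gives the original position of each flattened row, so `order(idx)` is the permutation that puts them back.

```r
library(IRanges)  # also attaches S4Vectors, which defines DataFrame

df <- DataFrame(
    a = seq_len(4L),
    b = as.factor(rep(c("b", "a"), each = 2L)),
    row.names = LETTERS[seq_len(4L)]
)
f <- df[["b"]]
grouped <- split(x = df, f = f)

# Split the row positions with the same factor: idx[i] is the original
# row number of row i in the flattened result (assumes no NA in f)
idx <- unlist(split(seq_len(nrow(df)), f), use.names = FALSE)

# Flatten the groups, then undo the grouping with the inverse permutation
flat <- unlist(grouped, use.names = FALSE)
restored <- flat[order(idx), , drop = FALSE]
print(restored)
```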

Neither one of these approaches with `unsplit()` seems to work:

```
unsplit(split, f = df[["b"]])
```

```
## Error in unsplit(split, f = df[["b"]]) :
## Length of 'unlist(value)' must equal length of 'f'
```

```
unsplit(split, f = split[, "b"])
```

```
## Error in `splitAsList<-`(`*tmp*`, f, drop = drop, value = value) :
## Length of 'value' must equal the length of a split on 'f'
```
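A possible explanation for the first error, which is an assumption based on the error message rather than a reading of the method source: `length()` of a `DataFrame` counts columns, not rows, so a check of `length(unlist(value))` against `length(f)` would compare 2 with 4 for this split. A quick way to see the mismatch (`grouped` and `flat` are hypothetical names):

```r
library(IRanges)  # also attaches S4Vectors, which defines DataFrame

df <- DataFrame(
    a = seq_len(4L),
    b = as.factor(rep(c("b", "a"), each = 2L)),
    row.names = LETTERS[seq_len(4L)]
)
grouped <- split(x = df, f = df[["b"]])
flat <- unlist(grouped, use.names = FALSE)

# length() of a DataFrame counts columns (as for a data.frame),
# so it differs from both the row count and length(f)
length(flat)       # 2
nrow(flat)         # 4
length(df[["b"]])  # 4
```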

See the related S4 method definition:

```
getMethod(
    f = "unsplit",
    signature = "List",
    where = asNamespace("IRanges")
)
```

Modified 23 days ago by Michael Lawrence • written 24 days ago by Michael Steinbaugh

The `stack()` function also gets close, but it doesn't unsplit back to the original `DataFrame` unmodified.