Question: Correct way to split and unsplit a DataFrame
Michael Steinbaugh (Harvard) wrote, 24 days ago:

I'm having trouble splitting and unsplitting a DataFrame, using the methods defined in IRanges. Here's an attempt at a minimal reprex.

library(IRanges)
df <- DataFrame(
    a = seq_len(4L),
    b = as.factor(rep(c("b", "a"), each = 2L)),
    row.names = LETTERS[seq_len(4L)]
)
print(df)
## DataFrame with 4 rows and 2 columns
##           a        b
##   <integer> <factor>
## A         1        b
## B         2        b
## C         3        a
## D         4        a
split <- split(x = df, f = df[["b"]])
print(split)
## SplitDataFrameList of length 2
## $a
## DataFrame with 2 rows and 2 columns
##           a        b
##   <integer> <factor>
## C         3        a
## D         4        a
##
## $b
## DataFrame with 2 rows and 2 columns
##           a        b
##   <integer> <factor>
## A         1        b
## B         2        b

This works and lets me manipulate the DataFrame by a grouping factor, similar to group_by() in dplyr. However, I can't coerce the split back to a standard DataFrame with unsplit().

unlist() does coerce back to a DataFrame. However, the rows come back grouped by factor level (C, D, A, B rather than A, B, C, D) because the original row order isn't tracked:

unlist(split, use.names = FALSE)
## DataFrame with 4 rows and 2 columns
##           a        b
##   <integer> <factor>
## C         3        a
## D         4        a
## A         1        b
## B         2        b
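The row names survive unlist(), so one way to undo the regrouping is to index the flattened result by the original row names. This is a minimal sketch. It assumes the original DataFrame's row names are unique and still available:

```r
library(IRanges)  # also attaches S4Vectors, which provides DataFrame

df <- DataFrame(
    a = seq_len(4L),
    b = as.factor(rep(c("b", "a"), each = 2L)),
    row.names = LETTERS[seq_len(4L)]
)
split <- split(x = df, f = df[["b"]])

# unlist() concatenates the groups in factor-level order ("a", then "b").
# Indexing by the original row names puts the rows back in their old order.
# Assumes the row names are unique.
flat <- unlist(split, use.names = FALSE)
restored <- flat[rownames(df), , drop = FALSE]
print(restored)
```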

Neither of these unsplit() calls works:

unsplit(split, f = df[["b"]])
## Error in unsplit(split, f = df[["b"]]) : 
##   Length of 'unlist(value)' must equal length of 'f'
unsplit(split, f = split[, "b"])
## Error in `splitAsList<-`(`*tmp*`, f, drop = drop, value = value) : 
##   Length of 'value' must equal the length of a split on 'f'
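A workaround that doesn't depend on row names is to split the row positions with the same factor and use them to undo the reordering. The helper unsplit_df() below is hypothetical and not part of IRanges. It assumes `value` is exactly the result of split(x, f) with the same `f`, so the groups appear in the same level order:

```r
library(IRanges)

# Hypothetical helper (not part of IRanges): reverses split(x, f) for a
# DataFrame, assuming 'value' came from split(x, f) with this same 'f'.
unsplit_df <- function(value, f) {
    # Split the original row positions by the same factor. Their
    # concatenation lines up row-for-row with unlist(value).
    idx <- unlist(split(seq_along(f), f), use.names = FALSE)
    flat <- unlist(value, use.names = FALSE)
    stopifnot(nrow(flat) == length(f))
    # order(idx) gives, for each original position, its row in 'flat'.
    flat[order(idx), , drop = FALSE]
}

df <- DataFrame(
    a = seq_len(4L),
    b = as.factor(rep(c("b", "a"), each = 2L)),
    row.names = LETTERS[seq_len(4L)]
)
split <- split(x = df, f = df[["b"]])
restored <- unsplit_df(split, df[["b"]])
print(restored)
```

Tracking positions instead of names means the same idea works when the DataFrame has no row names at all.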

The relevant S4 method definition can be inspected with:

getMethod(
    f = "unsplit",
    signature = "List",
    where = asNamespace("IRanges")
)
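The first error message hints at a likely cause. Suppose the List method compares length(unlist(value)) with length(f), as the message suggests. A DataFrame then fails that check, because a DataFrame behaves like a list of columns: its length() is its number of columns, not rows. The sketch below only shows that mismatch and does not reproduce the method itself:

```r
library(IRanges)

df <- DataFrame(
    a = seq_len(4L),
    b = as.factor(rep(c("b", "a"), each = 2L)),
    row.names = LETTERS[seq_len(4L)]
)
f <- df[["b"]]

# A DataFrame is a list of columns, so length() counts columns.
length(df)  # 2
nrow(df)    # 4
length(f)   # 4
```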

Michael Steinbaugh added, 24 days ago:

The stack() function also gets close. However, it adds an index column and leaves the rows grouped by factor level, so it doesn't return the original DataFrame unmodified:

help(topic = "SplitDataFrameList", package = "IRanges")
stack(x = split, index.var = ".idx")
## DataFrame with 4 rows and 3 columns
##    .idx         a        b
##   <Rle> <integer> <factor>
## C     a         3        a
## D     a         4        a
## A     b         1        b
## B     b         2        b
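The stack() result can be brought back to the original shape by dropping the index column and reordering by row names. This is a minimal sketch. It assumes unique row names and that stack() keeps them, as in the output shown:

```r
library(IRanges)

df <- DataFrame(
    a = seq_len(4L),
    b = as.factor(rep(c("b", "a"), each = 2L)),
    row.names = LETTERS[seq_len(4L)]
)
split <- split(x = df, f = df[["b"]])

stacked <- stack(split, index.var = ".idx")
# Keep every column except the ".idx" column that stack() added.
keep <- setdiff(colnames(stacked), ".idx")
# Reorder rows by the original row names and drop the index column.
restored <- stacked[rownames(df), keep, drop = FALSE]
print(restored)
```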
Answer: Correct way to split and unsplit a DataFrame
Michael Lawrence (United States) wrote, 23 days ago:

Thanks, this is fixed in IRanges version 2.18.2, which has not been released yet.


Michael Steinbaugh replied, 21 days ago:

Perfect, thanks Michael!