Data frame to a nested list in R
bbhatt • 0

Hi, I have a question about converting a data frame to a nested list in R. I have a data frame, shown in the screenshot below, in which column 1 contains the Ensembl IDs and column 2 contains the miRNAs.

[Screenshot: data frame with redundant Ensembl IDs in column 1 and miRNAs in column 2]

Now, I want to convert this data frame to a nested list such that the redundancy in column 1 (i.e., the Ensembl IDs) is removed and the list looks something like this:

[Screenshot: nested list without redundancy]

It would be really great if you could provide any suggestions or recommendations for accomplishing this in R. Thanks very much!


Quite simply, you can use the nest() function from tidyr. This creates a list column that holds, for each ID, a tibble of the data associated with it.

Sometimes it's handy to have that list column in a different form, such as an array or a JSON string (e.g. for storing in a database). Alternatively, you may want the data as a plain list rather than a tibble.

Here are some examples:

library(tidyverse)
library(jsonlite)
#> 
#> Attaching package: 'jsonlite'
#> The following object is masked from 'package:purrr':
#> 
#>     flatten

tb = tibble(A = c("a", "a", "a", "b", "b", "b", "b", "c"), B = letters[1:8])

tb
#> # A tibble: 8 x 2
#>   A     B    
#>   <chr> <chr>
#> 1 a     a    
#> 2 a     b    
#> 3 a     c    
#> 4 b     d    
#> 5 b     e    
#> 6 b     f    
#> 7 b     g    
#> 8 c     h

tb_nested = tb %>% nest(B = B)

tb_nested %>% pull(B)
#> [[1]]
#> # A tibble: 3 x 1
#>   B    
#>   <chr>
#> 1 a    
#> 2 b    
#> 3 c    
#> 
#> [[2]]
#> # A tibble: 4 x 1
#>   B    
#>   <chr>
#> 1 d    
#> 2 e    
#> 3 f    
#> 4 g    
#> 
#> [[3]]
#> # A tibble: 1 x 1
#>   B    
#>   <chr>
#> 1 h

tb_nested_json =
  tb_nested %>%
  rowwise() %>%
  mutate(B = toJSON(B))

tb_nested_json
#> # A tibble: 3 x 2
#> # Rowwise: 
#>   A     B                                        
#>   <chr> <json>                                   
#> 1 a     [{"B":"a"},{"B":"b"},{"B":"c"}]          
#> 2 b     [{"B":"d"},{"B":"e"},{"B":"f"},{"B":"g"}]
#> 3 c     [{"B":"h"}]

tb_nested_list =
  tb_nested %>%
  rowwise() %>%
  mutate(B = list(pull(B, B)))

tb_nested_list %>% pull(B)
#> [[1]]
#> [1] "a" "b" "c"
#> 
#> [[2]]
#> [1] "d" "e" "f" "g"
#> 
#> [[3]]
#> [1] "h"

tb_nested_list %>%
  filter(A == "a") %>%
  pull(B)
#> [[1]]
#> [1] "a" "b" "c"

tb_nested_list %>%
  rowwise() %>%
  mutate(B = toJSON(B))
#> # A tibble: 3 x 2
#> # Rowwise: 
#>   A     B                
#>   <chr> <json>           
#> 1 a     ["a","b","c"]    
#> 2 b     ["d","e","f","g"]
#> 3 c     ["h"]

tb_nested_string =
  tb_nested %>%
  rowwise() %>%
  mutate(B = paste0(c_across(B)[[1]], collapse="|"))

tb_nested_string
#> # A tibble: 3 x 2
#> # Rowwise: 
#>   A     B      
#>   <chr> <chr>  
#> 1 a     a|b|c  
#> 2 b     d|e|f|g
#> 3 c     h

Created on 2021-12-12 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)
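
If the end goal is a plain named list keyed by ID rather than a tibble with a list column, one possible route (a sketch, not part of the reprex above) is to summarise each group into a vector and then call tibble's deframe(), which turns a two-column data frame into a named list. The name `named_list` is made up for this illustration.

```r
library(dplyr)
library(tibble)

tb <- tibble(A = c("a", "a", "a", "b", "b", "b", "b", "c"), B = letters[1:8])

# Collapse B into one character vector per value of A (a list column),
# then use column A as the names and column B as the values of a list.
named_list <- tb %>%
  group_by(A) %>%
  summarise(B = list(B), .groups = "drop") %>%
  deframe()

# Elements can then be looked up by ID
named_list[["b"]]
```

This should give the same shape as the base R split() approach, so which one to use is mostly a matter of whether the rest of the pipeline is already tidyverse-based.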

Hi Ariel, thanks very much for responding. I really appreciate it. Have a good one!

Mike Smith ★ 6.6k

Ariel's answer looks great and is very comprehensive. If you want to stick to base R and data.frames, you can use the split() function, e.g.

df = data.frame(A = c("a", "a", "a", "b", "b", "b", "b", "c"), B = letters[1:8])
split(df$B, df$A)
#> $a
#> [1] "a" "b" "c"
#> 
#> $b
#> [1] "d" "e" "f" "g"
#> 
#> $c
#> [1] "h"
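
To connect this back to the original question, here is a minimal sketch of the same split() call on a data frame shaped like the one described (IDs in column 1, miRNAs in column 2). The column names and all values are hypothetical placeholders, since the screenshots are not reproduced here; unique() is added on the assumption that repeated miRNAs within one ID should also be dropped.

```r
# Hypothetical data: the IDs and miRNA names below are placeholders,
# not values from the original post.
df <- data.frame(
  ensembl_id = c("ENSG_A", "ENSG_A", "ENSG_A", "ENSG_B", "ENSG_B"),
  mirna      = c("miR-1", "miR-2", "miR-1", "miR-3", "miR-4"),
  stringsAsFactors = FALSE
)

# One list element per ID; each element is the vector of its miRNAs
by_id <- split(df$mirna, df$ensembl_id)

# Drop miRNAs that are repeated within the same ID
by_id <- lapply(by_id, unique)

# Look up the miRNAs for a single ID
by_id[["ENSG_A"]]

# stack() converts the named list back to a two-column data frame
stack(by_id)
```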

Well, the OP did ask for a nested list, so YOUR answer is probably the better one :)


It worked fine. Thanks! :)


Hi Mike, thanks very much for responding. I really appreciate your help. Have a good one!
