Hi,
What is the proper way to read in a DataFrame from a text file that has CharacterList columns? With the code below, I can see that write.table() writes the CharacterList column to the text file as c() calls. I'm guessing there is an argument or a function that lets you read this information back in as a list column, but I can't find it.
Thank you,
Leonardo
> library('S4Vectors')
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:parallel’:
    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply, parCapply, parLapply, parLapplyLB, parRapply, parSapply,
    parSapplyLB
The following objects are masked from ‘package:stats’:
    IQR, mad, xtabs
The following objects are masked from ‘package:base’:
    anyDuplicated, append, as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply,
    lengths, Map, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort, table,
    tapply, union, unique, unsplit, which, which.max, which.min
Attaching package: ‘S4Vectors’
The following objects are masked from ‘package:base’:
    colMeans, colSums, expand.grid, rowMeans, rowSums
> library('GenomicRanges')
Loading required package: IRanges
Loading required package: GenomeInfoDb
There were 12 warnings (use warnings() to see them)
> df <- DataFrame(x = 1:5, y = CharacterList(lapply(1:5, function(i) {
+     letters[seq_len(i)]}
+ )))
> 
> write.table(df, file = 'test.tsv', sep = '\t', row.names = FALSE, quote = FALSE)
> system('head test.tsv')
x    y
1    a
2    c("a", "b")
3    c("a", "b", "c")
4    c("a", "b", "c", "d")
5    c("a", "b", "c", "d", "e")
> 
> df2 <- read.table('test.tsv', header = TRUE, sep = '\t', stringsAsFactors = FALSE)
> df2
  x                y
1 1                a
2 2          c(a, b)
3 3       c(a, b, c)
4 4    c(a, b, c, d)
5 5 c(a, b, c, d, e)
> 
> options(width = 120)
> devtools::session_info()
Session info -----------------------------------------------------------------------------------------------------------
 setting  value                                 
 version  R version 3.3.0 RC (2016-05-01 r70572)
 system   x86_64, darwin13.4.0                  
 ui       AQUA                                  
 language (EN)                                  
 collate  en_US.UTF-8                           
 tz       America/New_York                      
 date     2016-06-16                            
Packages ---------------------------------------------------------------------------------------------------------------
 package       * version date       source        
 BiocGenerics  * 0.19.1  2016-06-11 Bioconductor  
 devtools        1.11.1  2016-04-21 CRAN (R 3.3.0)
 digest          0.6.9   2016-01-08 CRAN (R 3.3.0)
 GenomeInfoDb  * 1.9.1   2016-05-13 Bioconductor  
 GenomicRanges * 1.25.4  2016-06-10 Bioconductor  
 IRanges       * 2.7.6   2016-06-10 Bioconductor  
 memoise         1.0.0   2016-01-29 CRAN (R 3.3.0)
 S4Vectors     * 0.11.4  2016-06-11 Bioconductor  
 withr           1.0.1   2016-02-04 CRAN (R 3.3.0)
 XVector         0.13.0  2016-05-05 Bioconductor  
 zlibbioc        1.19.0  2016-05-05 Bioconductor  
## Doesn't work to simply use DataFrame
> DataFrame(df2)
DataFrame with 5 rows and 2 columns
          x                y
  <integer>      <character>
1         1                a
2         2          c(a, b)
3         3       c(a, b, c)
4         4    c(a, b, c, d)
5         5 c(a, b, c, d, e)
Thanks for the info, Michael. If I need to read these files back in, I'll use `strsplit()`.
Best,
Leonardo
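
For anyone finding this thread later, here is a rough sketch of the `strsplit()` approach mentioned above. It is not an official reader, and `parse_c_column()` is a hypothetical helper name. It assumes the strings look like the `df2$y` values shown earlier (quotes already stripped by `read.table()`) and that no element contains a comma or a parenthesis.

```r
# Stand-in for df2$y as returned by read.table() in the post above.
y_raw <- c("a", "c(a, b)", "c(a, b, c)")

# Hypothetical helper: turn strings such as "c(a, b)" into character vectors.
# Assumes individual elements never contain commas or parentheses.
parse_c_column <- function(x) {
  # Remove the leading "c(" and the trailing ")" where present.
  inner <- gsub("^c\\(|\\)$", "", x)
  # Split on commas, dropping any whitespace that follows them.
  strsplit(inner, ",\\s*")
}

y_list <- parse_c_column(y_raw)
```

With IRanges attached, `CharacterList(y_list)` should turn the result back into a list column, the same way `CharacterList()` was called on a list when `df` was first built.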