Hi,
What is the proper way to read a DataFrame with CharacterList columns back in from a text file? With the code below, I can see that write.table() writes each element of the CharacterList column as a deparsed c() call. I'm guessing there's a simple argument change, or a function, that lets you read this information back, but I haven't found it.
Thank you,
Leonardo
> library('S4Vectors')
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:parallel’:
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply, parCapply, parLapply, parLapplyLB, parRapply, parSapply,
parSapplyLB
The following objects are masked from ‘package:stats’:
IQR, mad, xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, append, as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply,
lengths, Map, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort, table,
tapply, union, unique, unsplit, which, which.max, which.min
Attaching package: ‘S4Vectors’
The following objects are masked from ‘package:base’:
colMeans, colSums, expand.grid, rowMeans, rowSums
> library('GenomicRanges')
Loading required package: IRanges
Loading required package: GenomeInfoDb
There were 12 warnings (use warnings() to see them)
> df <- DataFrame(x = 1:5, y = CharacterList(lapply(1:5, function(i) {
+ letters[seq_len(i)]}
+ )))
>
> write.table(df, file = 'test.tsv', sep = '\t', row.names = FALSE, quote = FALSE)
> system('head test.tsv')
x y
1 a
2 c("a", "b")
3 c("a", "b", "c")
4 c("a", "b", "c", "d")
5 c("a", "b", "c", "d", "e")
>
> df2 <- read.table('test.tsv', header = TRUE, sep = '\t', stringsAsFactors = FALSE)
> df2
x y
1 1 a
2 2 c(a, b)
3 3 c(a, b, c)
4 4 c(a, b, c, d)
5 5 c(a, b, c, d, e)
>
> options(width = 120)
> devtools::session_info()
Session info -----------------------------------------------------------------------------------------------------------
setting value
version R version 3.3.0 RC (2016-05-01 r70572)
system x86_64, darwin13.4.0
ui AQUA
language (EN)
collate en_US.UTF-8
tz America/New_York
date 2016-06-16
Packages ---------------------------------------------------------------------------------------------------------------
package * version date source
BiocGenerics * 0.19.1 2016-06-11 Bioconductor
devtools 1.11.1 2016-04-21 CRAN (R 3.3.0)
digest 0.6.9 2016-01-08 CRAN (R 3.3.0)
GenomeInfoDb * 1.9.1 2016-05-13 Bioconductor
GenomicRanges * 1.25.4 2016-06-10 Bioconductor
IRanges * 2.7.6 2016-06-10 Bioconductor
memoise 1.0.0 2016-01-29 CRAN (R 3.3.0)
S4Vectors * 0.11.4 2016-06-11 Bioconductor
withr 1.0.1 2016-02-04 CRAN (R 3.3.0)
XVector 0.13.0 2016-05-05 Bioconductor
zlibbioc 1.19.0 2016-05-05 Bioconductor
## Simply wrapping the result in DataFrame() doesn't restore the CharacterList
> DataFrame(df2)
DataFrame with 5 rows and 2 columns
x y
<integer> <character>
1 1 a
2 2 c(a, b)
3 3 c(a, b, c)
4 4 c(a, b, c, d)
5 5 c(a, b, c, d, e)
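Because read.table() hands back the y column as plain strings (and, as the transcript shows, strips the inner double quotes), one option for files already written this way is to unwrap the `c(...)` text by hand and split on the ", " separator that deparse() produces. This is a minimal sketch, not an official S4Vectors/IRanges reader. It assumes no element itself contains ", " or parentheses. The helper name `parse_deparsed_column()` is hypothetical, and the CharacterList() conversion only runs if IRanges is installed.

```r
# Hypothetical helper: turn strings such as "a" or "c(a, b)" (as returned by
# read.table() in the transcript) into a list of character vectors.
# Assumes no element contains ", " or parentheses. If the file was read with
# quote = "", the double quotes are still present and are removed here too.
parse_deparsed_column <- function(y) {
  y <- gsub('"', "", y, fixed = TRUE)   # drop any remaining double quotes
  y <- sub("^c\\((.*)\\)$", "\\1", y)   # unwrap "c(...)"; length-1 values are untouched
  strsplit(y, ", ", fixed = TRUE)       # split on deparse()'s element separator
}

# Simulate the y column as read back by read.table() above
y_read <- c("a", "c(a, b)", "c(a, b, c)", "c(a, b, c, d)", "c(a, b, c, d, e)")
y_list <- parse_deparsed_column(y_read)
str(y_list)

# Rebuild a CharacterList when IRanges is available
if (requireNamespace("IRanges", quietly = TRUE)) {
  y_cl <- IRanges::CharacterList(y_list)
  print(y_cl)
}
```

With S4Vectors and IRanges loaded, the same helper could be applied to the DataFrame directly, e.g. `df3 <- DataFrame(df2); df3$y <- CharacterList(parse_deparsed_column(df3$y))`.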

Thanks for the info, Michael. If I need to read these files, I'll use `strsplit()`.
Best,
Leonardo
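Along the lines of the strsplit() approach, a variant that avoids parsing deparsed c() calls altogether is to collapse each element with an explicit delimiter before writing and split on that same delimiter after reading. This is a base-R sketch under stated assumptions: the delimiter (a comma here) is assumed never to occur inside the values, and a plain list stands in for the CharacterList so the example runs without Bioconductor. It is not presented as a built-in S4Vectors serialization method.

```r
# Sketch: round-trip a list-of-character column through a TSV file.
# Assumption: "," never appears inside the values themselves.
y_list <- lapply(1:5, function(i) letters[seq_len(i)])  # stands in for the CharacterList
df <- data.frame(x = 1:5,
                 y = vapply(y_list, paste, character(1), collapse = ","),
                 stringsAsFactors = FALSE)

path <- tempfile(fileext = ".tsv")
write.table(df, file = path, sep = "\t", row.names = FALSE, quote = FALSE)

# Read back with y forced to character, then split on the same delimiter
df2 <- read.table(path, header = TRUE, sep = "\t", quote = "",
                  colClasses = c("integer", "character"))
y2 <- strsplit(df2$y, ",", fixed = TRUE)

identical(y_list, y2)  # TRUE as long as the delimiter never occurs in the values

# With IRanges installed, the list can be turned back into a CharacterList
if (requireNamespace("IRanges", quietly = TRUE)) {
  print(IRanges::CharacterList(y2))
}
```

Choosing the delimiter explicitly keeps the file readable by other tools as well, at the cost of having to pick a character that is guaranteed not to appear in the data.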