Dear flowCore team,
I have recently started using flowCore (and other packages) to analyse flow cytometry data.
I have a collection of eight FCS files of 80-180 MB each, and I can easily load them into R using read.FCS.
However, to practice with the packages initially, I wanted to limit the number of events read. After reading the read.FCS help page, I tried the which.lines parameter to restrict what was read in. I expected this to make reading the files faster, but the opposite was true.
ff <- read.FCS( my.fcs.file, transformation=FALSE)
takes 4 to 8 seconds per file.
ff <- read.FCS( my.fcs.file, transformation=FALSE, which.lines=1:100000)
ff <- read.FCS( my.fcs.file, transformation=FALSE, which.lines=100000)
were much slower (neither had finished after 2 minutes for the first file).
So, effectively, reading in the full files and then sub-selecting rows with either
ff <- ff[1:100000,]
or
ff <- ff[sample.int(nrow(ff),10000),]
is much faster (though, obviously, it has higher memory requirements)!
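In case it is useful, this is the workaround I am using, wrapped in a small helper function (just a sketch of my approach; the function name read_fcs_subset and its arguments are my own, not part of flowCore):

```r
library(flowCore)

# Read the full FCS file, then keep a random subset of n_events rows.
# Uses more memory than which.lines would, but in practice runs much
# faster (a few seconds instead of minutes on my files).
read_fcs_subset <- function(path, n_events = 1e5, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)            # reproducible subsample
  ff <- read.FCS(path, transformation = FALSE)  # full read: 4-8 s per file
  n <- min(n_events, nrow(ff))                  # don't ask for more rows than exist
  ff[sample.int(nrow(ff), n), ]                 # row subsetting keeps parameters/keywords
}

# ff <- read_fcs_subset(my.fcs.file, n_events = 1e5, seed = 1)
```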
Is this normal? Am I missing something? Should I just stick to my workaround?
Thanks in advance for your help!
My R session info is:

> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
en_NZ.UTF-8/en_NZ.UTF-8/en_NZ.UTF-8/C/en_NZ.UTF-8/en_NZ.UTF-8

attached base packages:
stats graphics grDevices utils datasets methods base

other attached packages:
flowType_2.4.0 BH_1.54.0-4 Rcpp_0.11.3 flowCore_1.32.1

loaded via a namespace (and not attached):
Biobase_2.26.0 BiocGenerics_0.12.0 clue_0.3-48 cluster_1.15.3 coda_0.16-1 corpcor_1.6.7
DEoptimR_1.0-2 feature_1.2.10 flowClust_3.4.0 flowMeans_1.18.0 flowMerge_2.14.0 flowViz_1.30.0
graph_1.44.0 grid_3.1.1 hexbin_1.27.0 IDPmisc_1.1.17 KernSmooth_2.23-13 ks_1.9.2
lattice_0.20-29 latticeExtra_0.6-26 MASS_7.3-35 MCMCpack_1.3-3 misc3d_0.8-4 mvtnorm_1.0-0
parallel_3.1.1 pcaPP_1.9-60 RColorBrewer_1.0-5 rgl_0.93.1098 Rgraphviz_2.10.0 robustbase_0.91-1
rrcov_1.3-4 sfsmisc_1.0-26 stats4_3.1.1 tools_3.1.1