Question: How to identify multiple datasets in an FCS file with flowCore
Asked 14 months ago by vinko.tosevski0:

Dear all,

I suspect that each CyTOF FCS file contains multiple datasets (presumably a randomized and a non-randomized expression matrix), and I am wondering how to identify them. How can I confirm they are indeed there and, if so, extract them? I tried importing the FCS file with flowCore::read.FCS(), specifying dataset = 1 and dataset = 2, but the resulting flowFrames look the same. Is the syntax correct? Intuitively, this would suggest there are no multiple datasets, but how can I confirm this, given that other observations suggest they are there? For instance, reading the file into FlowJo and re-exporting it creates a fully functional file less than half the size of the original.

I welcome any feedback I can get on this.

Thanks and best,

Vinko


Hi,

I looked at flowCore's code (the IO.R file). Datasets are identified using the $NEXTDATA keyword. If it is missing or 0, there is only one dataset in the FCS file. So load the TEXT segment of the FCS file using read.FCSheader() and look for the $NEXTDATA keyword with grep. To go further, take a look at the code in IO.R.

HTH
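A minimal sketch of the check described above, assuming (as the thread indicates) that read.FCSheader() returns a list with one named character vector of TEXT keywords per file. next_data_offset() and has_next_dataset() are hypothetical helpers, not part of flowCore, and "sample.fcs" is a placeholder path. An exact, case-insensitive match on the keyword name stands in for grep here.

```r
# Hypothetical helper: return the $NEXTDATA offset from a named character
# vector of TEXT keywords (e.g. one element of read.FCSheader()'s result)
next_data_offset <- function(hdr) {
  # FCS keyword names are case-insensitive, so compare in upper case
  v <- hdr[toupper(names(hdr)) == "$NEXTDATA"]
  if (length(v) == 0) return(NA_real_)  # keyword absent
  as.numeric(trimws(v[[1]]))
}

# TRUE only if the keyword is present and points at a further dataset (non-zero)
has_next_dataset <- function(hdr) {
  off <- next_data_offset(hdr)
  !is.na(off) && off != 0
}

file.name <- "sample.fcs"  # hypothetical path; replace with your own file
if (requireNamespace("flowCore", quietly = TRUE) && file.exists(file.name)) {
  hd <- flowCore::read.FCSheader(files = file.name)
  # one keyword vector per file; inspect the first (and only) one
  cat("$NEXTDATA:", next_data_offset(hd[[1]]), "\n")
  cat("Further dataset present:", has_next_dataset(hd[[1]]), "\n")
}
```

A value of 0 (or no keyword at all) is treated as "single dataset", which matches the reading of the keyword given above.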


dd[[1]][["$NEXTDATA"]] should report 0 for the single-dataset case (with dd being the list returned by read.FCSheader()).

written 14 months ago by Jiang, Mike

Dear both, thank you for your help. I can confirm the files in question have a value of "0" for the $NEXTDATA keyword, implying they are single-dataset files. However, can you help me understand the following outcome:

> file.size(file)
[1] 420292272
> write.FCS(ff, "temp.fcs")
[1] "temp.fcs"
> file.size("temp.fcs")
[1] 138205436
[1] TRUE

How come simply rewriting the same FCS file produces a file roughly 280 MB smaller (about a third of the original size)? I am not very experienced with the intricacies of the FCS file format, but I would like to understand this better. These are CyTOF files shared over the network, so size matters.

Thanks,

Vinko

Maybe you could scan and/or share the headers of both files. I suspect something like the number of bits per data point.

hd = flowCore::read.FCSheader(files = file.name)  # TEXT-segment keywords of the file
write.csv(hd[[1]], "before.csv")                  # one row per keyword, for inspection
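To follow up on the bits-per-data-point suspicion, a minimal sketch that compares the size-relevant keywords of the original and rewritten files, assuming fixed-width data (no $DATATYPE A with variable $PnB). It pulls $DATATYPE, $PAR, $TOT, $PnB and $BEGINDATA/$ENDDATA, and computes the expected DATA segment size as $TOT * sum($PnB) / 8 bytes. data_size_summary() is a hypothetical helper, not a flowCore function, and "original.fcs" is a placeholder path.

```r
# Hypothetical helper: summarise the keywords that determine the DATA segment
# size, from a named character vector of TEXT keywords
data_size_summary <- function(hdr) {
  # look up one keyword case-insensitively; NA if absent
  kw <- function(k) {
    v <- hdr[toupper(names(hdr)) == k]
    if (length(v) == 0) NA_character_ else trimws(v[[1]])
  }
  n.par <- as.integer(kw("$PAR"))
  if (is.na(n.par)) stop("no $PAR keyword in header")
  n.tot <- as.numeric(kw("$TOT"))
  # bits per value for each parameter ($P1B, $P2B, ...)
  bits <- vapply(seq_len(n.par),
                 function(i) as.numeric(kw(paste0("$P", i, "B"))),
                 numeric(1))
  begin <- as.numeric(kw("$BEGINDATA"))
  end <- as.numeric(kw("$ENDDATA"))
  list(
    datatype = kw("$DATATYPE"),
    events = n.tot,
    parameters = n.par,
    bits.per.value = unique(bits),
    expected.bytes = n.tot * sum(bits) / 8,  # size implied by $TOT and $PnB
    declared.bytes = end - begin + 1         # size implied by the data offsets
  )
}

original <- "original.fcs"  # hypothetical path of the file as received
rewritten <- "temp.fcs"     # file written by write.FCS(ff, "temp.fcs")
paths <- c(original, rewritten)
if (requireNamespace("flowCore", quietly = TRUE) && all(file.exists(paths))) {
  hds <- flowCore::read.FCSheader(files = paths)
  for (i in seq_along(hds)) {
    cat("\n==", paths[i], "==\n")
    str(data_size_summary(hds[[i]]))
    cat("file size:", file.size(paths[i]), "bytes\n")
  }
}
```

If $DATATYPE or $PnB differ between the two files, that alone changes the data size. If they match and expected.bytes agrees with declared.bytes for both, the extra bytes in the original lie outside the DATA segment (TEXT, ANALYSIS or other segments), which the keyword dump in before.csv can help pin down.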