Dear all,
I suspect CyTOF FCS files each contain multiple datasets (presumably a randomized and a non-randomized expression matrix), and I am wondering how to identify them. How can I confirm they are indeed in there and, if so, extract them? I tried importing the FCS file with flowCore::read.FCS(), specifying dataset = 1 and dataset = 2, but the resulting flowFrame looks the same. Is the syntax correct? Intuitively, this would suggest there are no multiple datasets, but how can I confirm this, given that other procedures suggest they are there (for instance, reading the file into FlowJo and re-exporting it creates a fully functional file less than half the size of the initial one)?
I welcome any feedback I can get on this.
Thanks and best,
Vinko
Hi,
I looked at flowCore's code (the IO.R file). Datasets are identified using the $NEXTDATA keyword. If there is none, there is only one dataset in the FCS file. So load the TEXT segment of the FCS file using read.FCSheader() and look for a $NEXTDATA keyword using grep. To go further, take a look at the code in IO.R.
HTH
dd <- read.FCSheader(file)
dd[[1]][["$NEXTDATA"]]
This should report 0 for the single-dataset case.
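The check described above can be wrapped in a small helper. This is only a sketch: `next_data_offset()` and the path `example.fcs` are hypothetical names introduced here for illustration, and it assumes the header arrives as the named keyword collection that `read.FCSheader()` returns for each file. A missing keyword is treated the same as an offset of 0.

```r
# Hypothetical helper (not part of flowCore): return the $NEXTDATA byte offset
# from a set of FCS TEXT-segment keywords, treating a missing keyword as 0.
next_data_offset <- function(keywords) {
  value <- as.list(keywords)[["$NEXTDATA"]]  # NULL if the keyword is absent
  if (is.null(value)) {
    return(0)
  }
  # Keyword values are stored as text and may be padded with spaces.
  as.numeric(trimws(value))
}

# Usage sketch: only runs if flowCore is installed and the file exists.
fcs_path <- "example.fcs"  # hypothetical path, replace with a real file
if (requireNamespace("flowCore", quietly = TRUE) && file.exists(fcs_path)) {
  header <- flowCore::read.FCSheader(fcs_path)[[1]]
  if (next_data_offset(header) > 0) {
    message("Header points to a further dataset in this file.")
  } else {
    message("No further dataset is declared ($NEXTDATA is 0 or absent).")
  }
}
```

An offset greater than 0 only says that the header declares another dataset; it does not say what that dataset contains.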
Dear both,
thank you for your help. I can confirm the files in question have a "0" value for the $NEXTDATA keyword, implying they are single-dataset files. However, can you help me understand the following outcome:
How come simply rewriting the same FCS file results in a 300 MB smaller file (25% of the initial size)? I am not that experienced with the intricacies of the FCS file format, but I would like to understand this better. These are CyTOF files shared over the network, so size matters.
Thanks,
Vinko
Maybe you could inspect and/or share the headers of both files. I suspect something like the number of bits per data point.
Compare the $PnB keywords of both files to see whether the size difference is caused by a difference in bit width.
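One way to make that comparison concrete is to pull the $PnB values out of each header and estimate the size of the DATA segment as events times total bits per event. This is a rough sketch, not a diagnosis: `pnb_bits()`, `estimated_data_bytes()` and the two file names are hypothetical, and the estimate assumes a binary $DATATYPE (I, F or D), where $PnB is a bit count, and ignores the TEXT and any ANALYSIS segment.

```r
# Hypothetical helper: extract the $PnB (bits per parameter) keywords
# from a named set of FCS header keywords as a named numeric vector.
pnb_bits <- function(keywords) {
  kw <- unlist(keywords)
  is_pnb <- grepl("^\\$P[0-9]+B$", names(kw))
  bits <- as.numeric(trimws(kw[is_pnb]))
  names(bits) <- names(kw)[is_pnb]
  bits
}

# Hypothetical helper: approximate DATA segment size in bytes as
# $TOT (number of events) times the summed bit widths, divided by 8.
estimated_data_bytes <- function(keywords) {
  n_events <- as.numeric(trimws(as.list(keywords)[["$TOT"]]))
  n_events * sum(pnb_bits(keywords)) / 8
}

# Usage sketch: only runs if flowCore is installed and both files exist.
files <- c("original.fcs", "reexported.fcs")  # hypothetical paths
if (requireNamespace("flowCore", quietly = TRUE) && all(file.exists(files))) {
  headers <- flowCore::read.FCSheader(files)
  for (i in seq_along(files)) {
    kw <- headers[[i]]
    cat(files[i], "\n")
    cat("  distinct $PnB values:", unique(pnb_bits(kw)), "\n")
    cat("  $DATATYPE:", as.list(kw)[["$DATATYPE"]], "\n")
    cat("  estimated data bytes:", estimated_data_bytes(kw), "\n")
    cat("  actual file bytes:   ", file.size(files[i]), "\n")
  }
}
```

If the estimate is close to the actual size for one file but far below it for the other, the difference lies somewhere other than bit width, and the byte offsets in the two headers would be the next thing to compare.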