How to identify multiple datasets in FCS file through flowCore
@vinkotosevski-12266

Dear all,

I suspect CyTOF FCS files each contain multiple datasets (presumably a randomized and a non-randomized expression matrix), and I am wondering how to identify them. How can I confirm they are indeed in there and, if so, extract them? I tried importing the FCS file with flowCore::read.FCS(), specifying dataset = 1 and dataset = 2, but the resulting flowFrame looks the same in both cases. Is the syntax correct? Intuitively, this would suggest there are no multiple datasets, but how can I confirm this, given that other procedures suggest they are there? For instance, reading the file into FlowJo and re-exporting it creates a fully functional file less than half the size of the initial one.

I welcome any feedback I can get on this.

Thanks and best,

Vinko

 

flowcore flow cytometry dataset • 2.1k views

Hi,

I looked at flowCore's code (the IO.R file). Additional datasets are located through the $NEXTDATA keyword. If it is absent or set to 0, there is only one dataset in the FCS file. So load the TEXT segment of the FCS file using read.FCSheader() and look for a $NEXTDATA keyword using grep. To go further, take a look at the code in IO.R.
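
The check described above could be sketched roughly as follows. This is only an illustration: has_next_dataset() is a hypothetical helper (not part of flowCore), and the two keyword vectors are made up so the snippet runs without a file. It assumes the input is a named character vector of keywords, which is what each element of the list returned by read.FCSheader() looks like.

```r
# Hypothetical helper (not part of flowCore): takes the keywords of one file
# as a named character vector and reports whether $NEXTDATA points to a
# further dataset, i.e. holds a byte offset greater than 0.
has_next_dataset <- function(keywords) {
  hits <- grep("NEXTDATA", names(keywords), fixed = TRUE)
  if (length(hits) == 0) return(FALSE)   # keyword absent: treat as single dataset
  offset <- suppressWarnings(as.numeric(trimws(keywords[hits[1]])))
  !is.na(offset) && offset > 0
}

# Made-up keyword vectors, for illustration only
single <- c("$TOT" = "1000", "$NEXTDATA" = "0")
multi  <- c("$TOT" = "1000", "$NEXTDATA" = "58000")

has_next_dataset(single)
has_next_dataset(multi)

# With a real file (requires flowCore):
# hd <- flowCore::read.FCSheader("path/to/file.fcs")
# has_next_dataset(hd[[1]])
```

Treating a missing keyword the same as a value of 0 is a choice made for this sketch.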

HTH


dd <- read.FCSheader(file)

dd[[1]][["$NEXTDATA"]] should report "0" in the single-dataset case.


Dear both,

Thank you for your help. I can confirm the files in question have a value of "0" for the $NEXTDATA keyword, implying they are single-dataset files. However, can you help me understand the following outcome:

> file.size(file)
[1] 420292272
> ff <- read.FCS(file)
> write.FCS(ff, "temp.fcs")
[1] "temp.fcs"
> file.size("temp.fcs")
[1] 138205436
> identical(dim(ff), dim(read.FCS("temp.fcs")))
[1] TRUE

How come simply rewriting the same FCS file results in a file that is about 280 MB smaller (roughly a third of the initial size)? I am not that experienced with the intricacies of the FCS file format, but I would like to understand this better. These are CyTOF files shared over the network, so size matters.

Thanks,

Vinko

 


Maybe you could inspect and/or share the headers of both files. I suspect something like the number of bits per data point.

hd = flowCore::read.FCSheader(files = file.name)
write.csv(hd[[1]], "before.csv")

Compare the $PnB keywords of both files to see whether the size difference is caused by a difference in bit width.
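
To make the bit-width comparison concrete, here is a minimal sketch that estimates the size of the DATA segment from the keywords alone, as events times the summed bits per parameter. expected_data_bytes() is a hypothetical helper, and the two keyword vectors are invented (they are not taken from the files discussed here). It assumes the standard $TOT, $PAR and $PnB keywords are present in the named character vector returned by read.FCSheader().

```r
# Hypothetical helper (not part of flowCore): expected size in bytes of the
# DATA segment, computed from the keywords of one file.
expected_data_bytes <- function(keywords) {
  n_events <- as.numeric(keywords[["$TOT"]])   # number of events
  n_par    <- as.integer(keywords[["$PAR"]])   # number of parameters
  bits     <- as.numeric(keywords[paste0("$P", seq_len(n_par), "B")])
  n_events * sum(bits) / 8                     # bits per event -> bytes
}

# Made-up headers: same events and channels, different bits per value
before <- c("$TOT" = "100000", "$PAR" = "3",
            "$P1B" = "64", "$P2B" = "64", "$P3B" = "64")
after  <- c("$TOT" = "100000", "$PAR" = "3",
            "$P1B" = "32", "$P2B" = "32", "$P3B" = "32")

expected_data_bytes(before)
expected_data_bytes(after)

# With real files (requires flowCore):
# hd <- flowCore::read.FCSheader(c("original.fcs", "temp.fcs"))
# sapply(hd, expected_data_bytes)
```

If the estimates for the original and the rewritten file differ by about as much as the file sizes do, bit width is the likely explanation. If they do not, the difference probably lies in other segments of the file.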
