Question: How to identify multiple datasets in an FCS file through flowCore
Asked 19 months ago by vinko.tosevski:

Dear all,

I suspect CyTOF FCS files contain multiple datasets each (presumably a randomized and a non-randomized expression matrix), and I am wondering how to identify them. How can I confirm they are indeed in there and, if so, extract them? I tried importing the FCS file with flowCore::read.FCS(), specifying dataset = 1 and dataset = 2, but the resulting flowFrame looks the same. Is the syntax correct? Intuitively, this would suggest there are no multiple datasets, but how can I confirm this, given that other procedures suggest they are there (for instance, reading the file into FlowJo and re-exporting it creates a fully functional file less than half the size of the initial one)?

I welcome any feedback I can get on this.

Thanks and best,

Vinko

 


Hi,

I looked at flowCore's code (the IO.R file). Datasets are identified using the $NEXTDATA keyword. If it is absent, there is only one dataset in the FCS file. So load the TEXT segment of the FCS file using read.FCSheader() and look for a $NEXTDATA keyword using grep. To go further, take a look at the code in IO.R.

HTH

written 19 months ago by SamGG

dd <- read.FCSheader(file)

dd[[1]][["$NEXTDATA"]] should report "0" for the single-dataset case.

written 19 months ago by Jiang, Mike
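
The check described in the two replies above can be sketched as a small helper. This is only an illustration: the function name has_next_dataset and the example keyword vectors are made up, and the helper assumes it receives one named character vector of keywords, such as one element of the list returned by read.FCSheader().

```r
# Hypothetical helper: TRUE when the header points to a further dataset.
# `kw` is a named character vector of FCS keywords (one element of the
# list returned by flowCore::read.FCSheader()).
has_next_dataset <- function(kw) {
  value <- kw["$NEXTDATA"]  # single bracket: NA instead of an error if absent
  !is.na(value) && as.numeric(trimws(value)) != 0
}

# Made-up headers, for illustration only
single <- c("$TOT" = "1000", "$NEXTDATA" = "0")
multi  <- c("$TOT" = "1000", "$NEXTDATA" = "52348")

has_next_dataset(single)  # FALSE
has_next_dataset(multi)   # TRUE

# With a real file (placeholder path):
# library(flowCore)
# hdr <- read.FCSheader("sample.fcs")
# has_next_dataset(hdr[[1]])
```

A missing $NEXTDATA keyword is treated here the same way as a value of 0, which matches the reading of IO.R given above.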

Dear both,

thank you for your help. I can confirm the files in question have a value of "0" for the $NEXTDATA keyword, implying they are single-dataset files. However, can you help me understand the following outcome:

> file.size(file)
[1] 420292272
> ff <- read.FCS(file)
> write.FCS(ff, "temp.fcs")
[1] "temp.fcs"
> file.size("temp.fcs")
[1] 138205436
> identical(dim(ff), dim(read.FCS("temp.fcs")))
[1] TRUE

How come simply re-writing the same FCS file results in a file about 280 MB smaller (roughly a third of the initial size)? I am not that experienced with the intricacies of the FCS file format, but I would like to understand this better. These are CyTOF files shared over the network, so size matters.

Thanks,

Vinko

 

written 18 months ago by vinko.tosevski

Maybe you could inspect and/or share the headers of both files. I suspect something like the number of bits per data point.

hd = flowCore::read.FCSheader(files = file.name)
write.csv(hd[[1]], "before.csv")

written 18 months ago by SamGG
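
Rather than comparing two CSV dumps by eye, the two headers could also be compared directly in R. The following is a minimal sketch under assumptions: diff_keywords is a hypothetical helper name, and the two keyword vectors are made-up values, not taken from the files discussed in this thread.

```r
# Hypothetical helper: list keywords whose values differ between two headers.
# `a` and `b` are named character vectors of FCS keywords, such as
# read.FCSheader(path)[[1]].
diff_keywords <- function(a, b) {
  keys <- union(names(a), names(b))
  va <- unname(trimws(a[keys]))  # NA where the keyword is absent
  vb <- unname(trimws(b[keys]))
  changed <- is.na(va) != is.na(vb) |
    (!is.na(va) & !is.na(vb) & va != vb)
  data.frame(keyword = keys, before = va, after = vb,
             stringsAsFactors = FALSE)[changed, ]
}

# Made-up headers, for illustration only
before <- c("$TOT" = "1000", "$PAR" = "2", "$P1B" = "64", "$P2B" = "64",
            "$DATATYPE" = "D")
after  <- c("$TOT" = "1000", "$PAR" = "2", "$P1B" = "32", "$P2B" = "32",
            "$DATATYPE" = "F")

d <- diff_keywords(before, after)
print(d)

# With real files (placeholder paths):
# library(flowCore)
# diff_keywords(read.FCSheader("original.fcs")[[1]],
#               read.FCSheader("temp.fcs")[[1]])
```

Offset keywords such as $BEGINDATA and $ENDDATA will usually differ between an original and a re-written file, so those rows are expected and can be ignored when looking for the cause of a size change.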

Compare the $PnB keywords of both files to see whether the size difference is caused by a difference in bit width.

written 18 months ago by Jiang, Mike
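
To see how much of a file's size the bit width accounts for, the expected size of the DATA segment can be estimated from the header alone: for binary list-mode data, each event occupies the sum of all $PnB values in bits, so the segment takes $TOT * sum($PnB) / 8 bytes. A minimal sketch, assuming numeric $PnB values (not the "*" used for ASCII data); the helper name data_segment_bytes and the example headers are hypothetical:

```r
# Hypothetical helper: bytes needed for the DATA segment according to the
# header keywords. `kw` is a named character vector of FCS keywords, such as
# read.FCSheader(path)[[1]].
data_segment_bytes <- function(kw) {
  num <- function(key) as.numeric(trimws(kw[[key]]))
  n_par <- num("$PAR")
  # Bit width of each parameter: $P1B, $P2B, ...
  bits <- vapply(sprintf("$P%dB", seq_len(n_par)), num, numeric(1))
  num("$TOT") * sum(bits) / 8
}

# Made-up headers, for illustration only: same events, different bit widths
wide   <- c("$TOT" = "1000", "$PAR" = "3",
            "$P1B" = "64", "$P2B" = "64", "$P3B" = "64")
narrow <- c("$TOT" = "1000", "$PAR" = "3",
            "$P1B" = "32", "$P2B" = "32", "$P3B" = "32")

data_segment_bytes(wide)    # 24000
data_segment_bytes(narrow)  # 12000
```

If the estimate for the original file is far below file.size(), the extra bytes are not explained by bit width and must sit elsewhere in the file, for example in the TEXT segment or in other segments.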