Calculate F measure to compare different gating approaches for a specific cell population
@stefankollinggithub-23271
Germany

Hi there,

This is likely a beginner's question, but I failed to find an answer to it here.

I would like to calculate the F measure to compare cell populations derived via different gating approaches (e.g. manual vs automated; or manually gated by two different people), similar to what was done in the FlowCAP challenge (https://www.nature.com/articles/nmeth.2365).

Maybe there is an easy solution, but my initial idea was to intersect the flowFrames of a manually gated and an automatically gated cell population in order to determine my true positive and false negative subsets. That should be all I need to calculate the F measure, as I already know the rest. I could not find out how to do this easily, i.e. how to create an intersection of two flowFrames in one step. Do I have to create a filter list containing all gate filters of the gating tree leading to my automatically gated population of interest (e.g. root->debris->singlets->lympho->cd3) and then apply this filter list to the manually gated population (e.g. cd3) to get the intersecting subset?

Any help with this is appreciated.

Cheers, Stefan

Hi Stefan,

You should also read (at least) these two articles: https://onlinelibrary.wiley.com/doi/full/10.1002/cyto.a.23030 https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1738-8

There is a lot of R code in Lukas's repository, including code for computing the F-measure: https://github.com/lmweber/cytometry-clustering-comparison

Best, Samuel

Jake Wagner ▴ 280
@jake-wagner-19995

Hi Stefan,

Your life will be much easier if you use GatingSet objects (from flowWorkspace) for this rather than working just with filters applied to flowFrames. The workflow there would be to:

1) Build a GatingSet using manual gating approaches.

This can be done in flowWorkspace itself using the gs_pop_add method to add manually defined geometric gates, but if you're doing the manual gating in FlowJo you can import the gating from FlowJo directly into a GatingSet using flowjo_to_gatingset from the CytoML package.

2) Build a GatingSet using automated gating approaches.

Similarly, here you can use automated methods to determine geometric gates and then add them using gs_pop_add, but the better and more scalable approach would be to use the openCyto package and register_plugins so that you can apply your automated method directly to a GatingSet.

3) Get logical vectors of membership in the gated subpopulations you want to compare. GatingSets can represent multiple samples, while GatingHierarchy objects represent single samples. I mention this because the method you are looking for is gh_pop_get_indices (where gh stands for GatingHierarchy). It returns a logical vector with one entry per event for the given gate (TRUE if the event is within the gated subpopulation, FALSE otherwise). The population names do not even need to match between the two gating strategies. You can just pick the populations you want to compare and grab their membership indices.
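As a rough sketch of step 3, the helper below pulls the two membership vectors for one sample. The function name get_membership_pair, its arguments, and the population paths in the usage note are hypothetical; only gh_pop_get_indices comes from flowWorkspace. The accessor is passed in as an argument so the helper can also be tried with a stand-in function when flowWorkspace is not installed.

```r
# Sketch only: the default accessor needs the Bioconductor package
# flowWorkspace. Function and argument names are hypothetical.
get_membership_pair <- function(gh_manual, gh_auto, manual_pop, auto_pop,
                                get_idx = flowWorkspace::gh_pop_get_indices) {
  # One logical value per event: TRUE if the event falls inside the gate.
  manual <- get_idx(gh_manual, manual_pop)
  auto   <- get_idx(gh_auto, auto_pop)
  # An event-wise comparison assumes both hierarchies hold the same
  # events in the same order, i.e. they were built from the same FCS file.
  stopifnot(length(manual) == length(auto))
  list(manual = manual, auto = auto)
}
```

Assuming two GatingSets named gs_manual and gs_auto built from the same files, a call might look like get_membership_pair(gs_manual[[1]], gs_auto[[1]], "cd3", "cd3"), with the population paths replaced by whatever your gating trees actually use.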

Once you have those logical vectors of membership in the manual gate or automatic gate, you can get the counts you seek:

# TP = sum(manual & auto)
# FN = sum(manual & !auto)
# FP = sum(!manual & auto)
# TN = sum(!manual & !auto)
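From these counts, the F measure for a single pair of populations follows as the harmonic mean of precision and recall. The snippet below is a minimal base R sketch using toy vectors; the function name f_measure is made up for illustration, and it uses the standard F1 definition for one population rather than the aggregation across populations used in FlowCAP.

```r
# Toy membership vectors standing in for the per-event logical vectors
# of the manual and the automated gate (same events, same order).
manual <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)
auto   <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE)

# Hypothetical helper: the manual gate is treated as the reference.
f_measure <- function(manual, auto) {
  stopifnot(is.logical(manual), is.logical(auto),
            length(manual) == length(auto))
  tp <- sum(manual & auto)
  fn <- sum(manual & !auto)
  fp <- sum(!manual & auto)
  # Return NA instead of dividing by zero when a gate is empty.
  precision <- if (tp + fp > 0) tp / (tp + fp) else NA_real_
  recall    <- if (tp + fn > 0) tp / (tp + fn) else NA_real_
  f <- if (is.na(precision) || is.na(recall) || precision + recall == 0) {
    NA_real_
  } else {
    2 * precision * recall / (precision + recall)
  }
  c(TP = tp, FN = fn, FP = fp,
    precision = precision, recall = recall, F = f)
}

res <- f_measure(manual, auto)
print(res)
```

TN is not needed for the F measure, which is why the helper does not compute it.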

A little closer to your original approach, you could also apply both strategies (manual and automatic) in the same GatingSet, use booleanFilter to build subpopulations from logical combinations of the two gates, and then let the GatingSet compute the counts. However, the approach using the logical vectors from gh_pop_get_indices is probably a little simpler.

Hi Jake,

Fantastic, your detailed answer has made my life so much easier. I had already done steps 1 & 2, but then I was stuck because I could not find the function I needed, i.e. gh_pop_get_indices. That's why I started thinking about a workaround using the flowFrames, but I didn't want to believe that others are doing it like this. Following your explanation works like a charm, thank you loads!

In general, the counts produced by gh_pop_get_indices seem to differ by ~1% from the counts I get in FlowJo, but I suppose this is to be expected?

Thanks again, Stefan

Hi Stefan,

Great to hear it helped. To your question about the difference in counts: yes, there will generally be a slight difference. This is partly due to small differences in the underlying representation of the gates, which in turn lead to different in/out calls for events near gate boundaries. If the differences in counts look significant, however, please bring it to our attention either here or in a GitHub issue (flowWorkspace would probably be the best place) and we'd happily look into it.

Best, Jake