Question

Calculate F measure to compare different gating approaches for a specific cell population

0


stefankolling+github • 0

@stefankollinggithub-23271

Last seen 4.3 years ago

Germany

Hi there,

This is likely a beginner's question, but I failed to find an answer to it here.

I would like to calculate the F measure to compare cell populations derived via different gating approaches (e.g. manual vs automated; or manually gated by two different people), similar to what was done in the FlowCAP challenge (https://www.nature.com/articles/nmeth.2365).

Maybe there is an easy solution, but my initial idea was to intersect the flowFrames of a specific cell population gated manually and gated automatically in order to determine my true positive and false negative subsets - this should be all I need to calculate the F measure, as I already know the rest. I could not find out how to achieve this easily, i.e. how to create an intersection of two flowFrames in one step. Do I have to create a filter list containing all gate filters of the gating tree leading to my automatically gated population of interest (e.g. root->debris->singlets->lympho->cd3) and then apply this filter list to the manually gated population (e.g. cd3) to get the intersecting subset?

Any help with this is appreciated.

Cheers, Stefan

flowWorkspace flowCore openCyto flowFrame F-measure • 1.2k views

ADD COMMENT • link updated 4.3 years ago by Jake Wagner ▴ 310 • written 4.3 years ago by stefankolling+github • 0

1


Hi Stefan,

You should also read (at least) those two articles: https://onlinelibrary.wiley.com/doi/full/10.1002/cyto.a.23030 https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1738-8

There is a lot of R code in Lukas's repository, including, of course, code for computing the F-measure: https://github.com/lmweber/cytometry-clustering-comparison

Best, Samuel

ADD REPLY • link 4.3 years ago SamGG ▴ 350

0


Moved comment to answer

ADD REPLY • link 4.3 years ago Jake Wagner ▴ 310

score 2 · Accepted Answer · 2020-04-07

Hi Stefan,

Your life will be much easier if you use GatingSet objects (from flowWorkspace) for this rather than working just with filters applied to flowFrames. The workflow there would be to:

1) Build a GatingSet using manual gating approaches.

This can be done in flowWorkspace itself using the gs_pop_add method to add manually defined geometric gates, but if you're doing the manual gating in FlowJo, you can import the gating from FlowJo directly into a GatingSet using flowjo_to_gatingset from the CytoML package.

2) Build a GatingSet using automated gating approaches.

Similarly, here you can use automated methods to determine geometric gates and then add them using gs_pop_add, but the better and more scalable approach would be to use the openCyto package and register_plugins so that you can apply your automated method directly to a GatingSet.

3) Get logical vectors of membership in the gated subpopulations you want to compare. GatingSets can represent multiple samples, while GatingHierarchy objects are single samples. I say this because the method you would be looking for is gh_pop_get_indices (where gh stands for GatingHierarchy). It returns a logical vector of membership for each event for the given gate (TRUE if within the gated subpopulation, FALSE otherwise). There is no need to have population names aligned between the two gating strategies. You can just pick the populations you want to compare and grab their membership indices.
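As a rough sketch of step 3, the helper below pulls the two membership vectors for one sample. The helper name, its arguments and the node names are made up for illustration; it assumes both GatingHierarchy objects were built from the same FCS file, so that the events line up one to one. The get_indices argument is only there so the helper can be tried with a stand-in function when no real GatingSet is at hand.

```r
# Sketch: per-event membership for one sample under two gating strategies.
# gh_manual / gh_auto are assumed to be GatingHierarchy objects for the same
# FCS file (e.g. gs_manual[[1]] and gs_auto[[1]]); node names are placeholders.
get_membership_pair <- function(gh_manual, gh_auto, manual_node, auto_node,
                                get_indices = flowWorkspace::gh_pop_get_indices) {
  manual <- get_indices(gh_manual, manual_node)  # TRUE if event is in the manual gate
  auto   <- get_indices(gh_auto, auto_node)      # TRUE if event is in the automated gate
  if (length(manual) != length(auto)) {
    stop("Event counts differ; both gatings must cover the same events.")
  }
  list(manual = manual, auto = auto)
}
```

With real data, a call might look like get_membership_pair(gs_manual[[1]], gs_auto[[1]], "cd3", "cd3"), where gs_manual and gs_auto are the GatingSets from steps 1 and 2.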

Once you have those logical vectors of membership in the manual gate or automatic gate, you can get the counts you seek:

#TP = sum(manual & auto)

#FN = sum(manual & !auto)

#FP = sum(!manual & auto)

#TN = sum(!manual & !auto)
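From these counts the F measure follows directly. Below is a minimal base R sketch, assuming the manual gate is treated as the reference and using the usual definitions precision = TP / (TP + FP), recall = TP / (TP + FN) and F = 2 * precision * recall / (precision + recall), which simplifies to 2 * TP / (2 * TP + FP + FN). The function name f_measure and the toy vectors are made up for illustration.

```r
# Sketch: precision, recall and F measure from two logical membership vectors.
# 'manual' is treated as the reference (truth), 'auto' as the prediction.
f_measure <- function(manual, auto) {
  stopifnot(is.logical(manual), is.logical(auto),
            length(manual) == length(auto))
  tp <- sum(manual & auto)
  fn <- sum(manual & !auto)
  fp <- sum(!manual & auto)
  # Return NA when a ratio is undefined (e.g. an empty gate).
  precision <- if (tp + fp > 0) tp / (tp + fp) else NA_real_
  recall    <- if (tp + fn > 0) tp / (tp + fn) else NA_real_
  f         <- if (2 * tp + fp + fn > 0) 2 * tp / (2 * tp + fp + fn) else NA_real_
  c(TP = tp, FN = fn, FP = fp, precision = precision, recall = recall, F = f)
}

# Toy example with six events.
manual <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
auto   <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
result <- f_measure(manual, auto)
print(result)
```

Note that TN does not enter the F measure, which is useful for rare populations where almost every event is a true negative.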

A little closer to your original approach, you could also apply both strategies (manual and automatic) in the same GatingSet, use booleanFilter to build subpopulations based on logical combinations, then let the GatingSet compute the counts, but the approach using the boolean vectors from gh_pop_get_indices is probably a little simpler.