Question

flowBin: error when calculating medianFIDist

kelly.joanne.andrews • 0

I am trying to run flowBin on my multi-tube flow set and got an error message when calculating medianFIDist.

tube.combined <- flowBin(tube.list=X.sample@tube.set,
                         bin.pars=X.sample@bin.pars,
                         control.tubes=X.sample@control.tubes,
                         expr.method='medianFIDist',
                         scale.expr=TRUE)

Applying flowBin to Unnamed Flow Expr Set
Quantile normalising binning parameters across tubes
Binning using kmeans
Filtering sparse bins.
78 bins removed, containing a total of 69991 or 34 % of events (averaged across tubes).
Calculating medianFIDist for all populations.
Error in if (res < 1) res <- 1 : missing value where TRUE/FALSE needed
In addition: Warning message:
Quick-TRANSfer stage steps exceeded maximum (= 14099950)

The traceback and some googling into the function (https://rdrr.io/bioc/flowBin/src/R/getBinExpr.R) told me that medianFIDist assumes the data have been log transformed.

This assumption is false. All my FCS parameters are still linear. I did not perform a transformation on them. It seems to me that this is a foolish assumption to make. What if I run the function on data that has been transformed with a biexponentialTransformation, would it still fail? What can be done to fix this?

#Distance function: difference of MFIs
medianFIDist <- function(test, control)
{
    if(is.null(control))
      stop('NULL control frame -- no control tubes specified?')
    #Note: assumes log transform has been applied, so relinearises before subtracting
    res <- median(10^test) - median(10^control)
    if(res < 1) res <- 1
    log(res,10)
}

Thanks

Kelly

updated 7.7 years ago by koneill • written 7.7 years ago by kelly.joanne.andrews

score 0 · Answer 1 · 2017-10-11

Hi Kelly!

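Thanks for digging deep into the code base to find this. Yes, having linear data is likely to be the problem -- the function raises 10 to the power of each value, and R turns very large numbers into "Inf", so the subtraction can end up as NaN and the comparison in the if statement fails.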
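A minimal illustration of that failure in base R, using made-up linear-scale intensities (the values are hypothetical, not taken from Kelly's data):

```r
# Hypothetical linear-scale intensities; the numbers are illustrative only.
test    <- c(500, 1200, 30000)
control <- c(400, 900, 2000)

# 10^x exceeds the largest double (about 1.8e308) once x is above ~308,
# so every element here becomes Inf.
relinearised <- 10^test
print(relinearised)

# Same arithmetic as medianFIDist: Inf - Inf is NaN.
res <- median(10^test) - median(10^control)
print(res)

# NaN < 1 evaluates to NA, which if() rejects with
# "missing value where TRUE/FALSE needed".
print(res < 1)
```

If only one of the two medians overflows, the result is Inf or -Inf rather than NaN, so the error will not necessarily show up for every population.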

The assumption that there is a log10 transformation may be a bit of a leap, but this is the most common case in flow data (or, rather, that some form of logarithmic transformation is present). It should work fairly well on biexponentially transformed data, as the MFIs are likely to fall within the logarithmic part of the biexponential scale.

One solution, if you feel strongly about not transforming your data, would be to write a new function that takes the simple difference between MFIs. Feel free to add that if you like -- I'd be happy to take a patch or pull request. It would basically mean copy/pasting one of the existing functions, altering the switch statement, and adding a case to the body of getBinExpr to call that function.
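A rough sketch of what such a function might look like. The name medianFIDistLinear is hypothetical and not part of flowBin, and the wiring into the switch statement in getBinExpr is not shown:

```r
# Hypothetical helper, not part of flowBin: plain difference of medians
# for data that are still on the linear scale.
medianFIDistLinear <- function(test, control)
{
    if(is.null(control))
      stop('NULL control frame -- no control tubes specified?')
    # No re-linearising step: the inputs are assumed to be linear already.
    median(test) - median(control)
}

# Illustrative values only.
linear_diff <- medianFIDistLinear(c(500, 1200, 30000), c(400, 900, 2000))
print(linear_diff)
```

Unlike medianFIDist, this sketch does not floor the result or put it back on a log scale, so it can return zero or negative values when the control is as bright as the test tube.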

Another solution would be to use the proportionPositive method.
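To show the general idea behind a proportion-positive measure (this is a conceptual sketch, not flowBin's actual code; the function name propPosSketch and the 0.98 quantile are assumptions made for illustration):

```r
# Conceptual sketch only: fraction of test events above a threshold
# taken from the control tube. The name and the default quantile are
# illustrative assumptions, not taken from the flowBin source.
propPosSketch <- function(test, control, control.quantile = 0.98)
{
    if(is.null(control))
      stop('NULL control frame -- no control tubes specified?')
    threshold <- quantile(control, control.quantile, names = FALSE)
    mean(test > threshold)
}

# Illustrative values only.
control_events <- 1:100
test_events    <- c(50, 99, 100, 120)
prop_pos <- propPosSketch(test_events, control_events)
print(prop_pos)
```

Because a measure of this kind only asks whether each event is above or below a threshold, it is largely unaffected by whether the data are linear or log transformed.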

Some other comments: In your case, I would also inspect the actual data -- if 34% of events are being removed due to sparse bins, you may have too many bins, or your binning parameters may have drift between tubes beyond what can be corrected by the quantile normalisation.

I would also be wary of that warning from the k-means clustering:

https://stackoverflow.com/questions/21382681/kmeans-quick-transfer-stage-steps-exceeded-maximum
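For reference, that warning comes from the default Hartigan-Wong algorithm in base R's kmeans(). The sketch below, on simulated data, shows the two usual knobs: raising iter.max and switching to another algorithm. Whether and how flowBin lets you pass these options through is not covered here.

```r
# Simulated data, purely for illustration.
set.seed(1)
x <- matrix(rnorm(2000), ncol = 2)

# Default is algorithm = "Hartigan-Wong" with iter.max = 10.
# A higher iteration cap and a different algorithm avoid the
# Quick-TRANSfer stage that the warning refers to.
fit <- kmeans(x, centers = 5, iter.max = 100, algorithm = "MacQueen")

# Cluster sizes, to check that no cluster is nearly empty.
print(table(fit$cluster))
```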

To be honest, medianFIDist was something I used for a short while during my PhD, before moving over to proportionPositive exclusively. Apart from anything else, proportionPositive is a lot easier to interpret in the final results.

score 0 · Answer 2 · 2017-10-13

You're welcome!

Re: transformations, in general before further analysis you should be doing some preprocessing, including compensation (if that hasn't been applied), some kind of logarithmic transformation (logicle/biexponential is the preferred/FlowJo option), normalisation (optional), gating for debris on FSC/SSC, gating for doublets (usually on FSC-A vs FSC-H), and gating for live cells if there is a live cell marker. See this section of the Wikipedia article/review paper I wrote a few years ago. flowCore should help to provide functionality for a lot of these, and I believe there are examples in the vignette.

As for subtracting the control tube MFI, I think the main issue is to ensure that the subtraction is on the linear scale, since subtraction in log scale is actually division. I don't think it actually matters much which base is used for the log transformation, as long as the data is turned back into linear.
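A small worked example of that point, with made-up MFI values:

```r
# Illustrative MFIs on the linear scale.
test_mfi    <- 1000
control_mfi <- 100

# Subtracting on the log scale gives the log of the ratio, not the difference.
log_diff <- log10(test_mfi) - log10(control_mfi)
print(log_diff)              # equals log10(1000 / 100)

# Re-linearise first, then subtract. The base does not matter as long as
# the inverse matches the transform that was applied.
diff_base10 <- 10^log10(test_mfi) - 10^log10(control_mfi)
diff_base_e <- exp(log(test_mfi)) - exp(log(control_mfi))
print(c(diff_base10, diff_base_e))
```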

Re: log transforming flow data, yes, this is an issue. What I used to do was to set all data points <1 to be equal to 1. This effectively truncates any negative values to zero on the log scale. The "correct" way is just to use the logicle (biexponential) transformation. There is a function for that in flowCore.
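The truncation approach described above can be sketched in base R as follows (values are illustrative; this is the quick workaround, not the logicle transformation):

```r
# Illustrative linear-scale values, including negative and sub-1 events.
x <- c(-50, 0.2, 1, 100, 10000)

# Set everything below 1 to 1 so that log10() is defined for all events.
x_floored <- pmax(x, 1)

# Floored events all land on 0 on the log scale.
x_logged <- log10(x_floored)
print(x_logged)
```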

But yes, I'd recommend going with proportionPositive -- a lot of clinicians use that in their reporting (e.g. gate out the blast population based on SSC/CD45, then calculate what percent are CD34+).

And re: downsampling, yep, flowBin is actually a kind of downsampling itself, so it shouldn't need any downsampling prior to running. For making sense of its results / running it on larger data sets, I would run flowType on the flowBin data afterwards, as I did in this paper: https://www.ncbi.nlm.nih.gov/pubmed/25600947