Question: flowBin: error when calculating medianFIDist
13 days ago by
kelly.joanne.andrews0 wrote:

I am trying to run flowBin on my multi-tube flow set and got an error message when it tried to calculate medianFIDist.

tube.combined <- flowBin(tube.list=X.sample@tube.set,
                         bin.pars=X.sample@bin.pars,
                         control.tubes=X.sample@control.tubes,
                         expr.method='medianFIDist',
                         scale.expr=TRUE)

 

Applying flowBin to Unnamed Flow Expr Set
Quantile normalising binning parameters across tubes
Binning using kmeans
Filtering sparse bins.
78 bins removed, containing a total of 69991 or 34 % of events (averaged across tubes).
Calculating medianFIDist for all populations.
Error in if (res < 1) res <- 1 : missing value where TRUE/FALSE needed

In addition: Warning message:
Quick-TRANSfer stage steps exceeded maximum (= 14099950)

 

The traceback and some googling into the function (https://rdrr.io/bioc/flowBin/src/R/getBinExpr.R) told me that medianFIDist assumes the data has been log transformed.

This assumption is false for my data. All my FCS parameters are still linear; I did not perform a transformation on them. It seems to me that this is a foolish assumption to make. If I run the function on data that has been transformed with a biexponential transformation, would it still fail? What can be done to fix this?

 

#Distance function: difference of MFIs
medianFIDist <- function(test, control)
{
    if(is.null(control))
      stop('NULL control frame -- no control tubes specified?')
    #Note: assumes log transform has been applied, so relinearises before subtracting
    res <- median(10^test) - median(10^control)
    if(res < 1) res <- 1
    log(res,10)
}

Thanks

Kelly

modified 10 days ago by koneill30 • written 13 days ago by kelly.joanne.andrews0
12 days ago by
koneill30
Canada
koneill30 wrote:

Hi Kelly!

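Thanks for digging deep into the code base to find this. Yes, having linear data is likely to be the problem -- with linear values, 10^x overflows to "Inf" in R, so the difference of the two medians becomes Inf - Inf = NaN, and the comparison res < 1 then returns NA, which is what triggers the error.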
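
To make the failure concrete, here is a minimal base-R sketch that mirrors the arithmetic in medianFIDist. The intensity values are made up for illustration and are not taken from Kelly's data.

```r
# Hypothetical linear-scale intensities (no log transform applied)
test    <- c(1200, 3500, 8000)
control <- c(900, 1100, 1500)

# 10^x overflows a double once x exceeds roughly 308, giving Inf
med_test    <- median(10^test)     # Inf
med_control <- median(10^control)  # Inf

res <- med_test - med_control      # Inf - Inf is NaN
cmp <- res < 1                     # comparing NaN yields NA

# if (cmp) ... would now stop with
# "missing value where TRUE/FALSE needed"
```

Any input above about 308 is enough to trigger this, which is well below typical linear flow intensities.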


The assumption that a log10 transformation has been applied may be a bit of a leap, but this is the most common case in flow data (or, rather, that some form of logarithmic transformation is present). It should work fairly well on biexponentially transformed data, as the MFIs are likely to fall within the logarithmic part of the biexponential scale.

One solution, if you feel strongly about not transforming your data, would be to write a new function that takes the simple difference between MFIs. Feel free to add that if you like -- I'd be happy to take a patch or pull request. It would basically mean copy/pasting one of the existing functions, altering the switch statement, and adding a case to the body of getBinExpr to call that function.
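
As a rough sketch of what such a function might look like: the name medianFIDistLinear is hypothetical and is not part of flowBin, and the wiring into the switch statement in getBinExpr is not shown because that code does not appear in this thread.

```r
# Hypothetical helper, not part of flowBin: difference of medians for
# data that is already on the linear scale.
medianFIDistLinear <- function(test, control) {
  if (is.null(control))
    stop("NULL control frame -- no control tubes specified?")
  # No 10^x step here: the inputs are assumed to be linear already
  median(test) - median(control)
}

# Made-up intensities: medians are 3500 and 1100
medianFIDistLinear(c(1200, 3500, 8000), c(900, 1100, 1500))  # 2400
```

Unlike medianFIDist, this sketch returns the raw difference and does not floor it at 1 or take a log afterwards; whether that is appropriate depends on how the result is used downstream.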

Another solution would be to use the proportionPositive method.

Some other comments: In your case, I would also inspect the actual data -- if 34% of events are being removed due to sparse bins, you may have too many bins, or your binning parameters may have drift between tubes beyond what can be corrected by the quantile normalisation. 

I would also be wary of that warning from the k-means clustering: 

https://stackoverflow.com/questions/21382681/kmeans-quick-transfer-stage-steps-exceeded-maximum

 

To be honest, medianFIDist was something I was using for a short while during my PhD, before moving over to using proportionPositive exclusively. Apart from anything else, proportionPositive is a lot easier to interpret in the final results.

 

written 12 days ago by koneill30

I'm not too worried about the other warnings just yet. I have a set of FCS files that I have been using to experiment with various flow analysis packages in R. I downloaded the flowBin package because I think it will be useful for the set of analyses that I would like to run, and I like the fact that it takes control data into account. I was walking through the workflow published for flowBin on Bioconductor and was actually a little surprised when the code started running, as the files are rather large and have not been downsampled yet. Luckily it was the end of the day, so I let the code run to see what would happen, and you can see the warnings I got above when I returned the next day.

My concern with the package as written is that there is no mention of it being necessary to transform the data. FCS files are exported from the cytometer as linear. It is true that most people would apply a transformation as part of their workflow, but the documentation does not mention installing flowCore and applying a log transformation before running flowBin. However, it's actually not so easy to apply a log10 transformation to flow data, as data points equal to 0 are converted to -Inf and negative data points to NaN, so other data manipulations are necessary before performing a log transformation. In addition, flowCore offers many different transformation options, so while the data may be transformed, it may not necessarily be in log10. Therefore, the assumption that the data is in log10 is specific to the aml.sample dataset, and not generally applicable.

I'll let you decide what you think is best in terms of patching the code. I don't know much about patch/pull requests. I'm pretty new at this. If you think proportionPositive is the better method and would be more flexible in terms of data input and easier to interpret in data output then I think that is definitely the way to go.

Thanks for your help

Kelly

written 10 days ago by kelly.joanne.andrews0
10 days ago by
koneill30
Canada
koneill30 wrote:

You're welcome!

Re: transformations, in general before further analysis you should be doing some preprocessing, including compensation (if that hasn't been applied), some kind of logarithmic transformation (logicle/biexponential is the preferred/FlowJo option), normalisation (optional), gating for debris on FSC/SSC, gating for doublets (usually on FSC-A vs FSC-H), and gating for live cells if there is a live cell marker. See this section of the Wikipedia article/review paper I wrote a few years ago. flowCore should help to provide functionality for a lot of these, and I believe there are examples in the vignette.

As for subtracting the control tube MFI, I think the main issue is to ensure that the subtraction is on the linear scale, since subtraction in log scale is actually division. I don't think it actually matters much which base is used for the log transformation, as long as the data is turned back into linear. 
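
Both points can be checked with a couple of made-up MFIs in base R. This is only an illustration of the arithmetic, not flowBin code; note that the base does have to match between the log transform and the step that turns the data back into linear.

```r
# Made-up linear MFIs for a test tube and a control tube
test_mfi    <- 5000
control_mfi <- 1000

# Subtracting on the log scale gives the log of the ratio, not the difference
log_diff  <- log10(test_mfi) - log10(control_mfi)
log_ratio <- log10(test_mfi / control_mfi)
all.equal(log_diff, log_ratio)   # TRUE

# Relinearising first recovers the true difference, whatever the base,
# provided the same base is used to undo the log
diff_base10 <- 10^log10(test_mfi) - 10^log10(control_mfi)
diff_base_e <- exp(log(test_mfi)) - exp(log(control_mfi))
all.equal(diff_base10, 4000)     # TRUE
all.equal(diff_base_e, 4000)     # TRUE
```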

Re: log transforming flow data, yes, this is an issue. What I used to do was to set all data points <1 to be equal to 1. This effectively truncates any negative values to zero on the log scale. The "correct" way is just to use the logicle (biexponential) transformation. There is a function for that in flowCore.
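
A minimal base-R sketch of the flooring approach described above, using made-up intensities. It works on a plain numeric vector; applying it to the channels of a real flowFrame is left out here.

```r
# Made-up linear intensities, including zero and negative values
# such as can appear after compensation
x <- c(-50, 0, 0.5, 1, 250, 10000)

# A plain log10 gives NaN for the negative value and -Inf for zero
raw_log <- suppressWarnings(log10(x))

# Floor everything below 1 at 1 before taking the log
x_floored   <- pmax(x, 1)
log_floored <- log10(x_floored)   # first four values become 0
```

The cost of this approach is that every value below 1 collapses onto a single point at 0, which is one reason the logicle transformation is usually preferred.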

But yes, I'd recommend going with proportionPositive -- a lot of clinicians use that in their reporting (e.g. gate out the blast population based on SSC/CD45, then calculate what percent are CD34+).
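
For intuition, a proportion-positive measure can be sketched in base R as the fraction of test events above a cutoff taken from the control. The helper name propPositive and the 98th-percentile cutoff are assumptions made for this illustration; flowBin's own proportionPositive implementation may choose its cutoff differently.

```r
# Hypothetical helper for illustration only, not flowBin's implementation.
# The cutoff quantile is an assumed default.
propPositive <- function(test, control, cutoff_quantile = 0.98) {
  # Threshold above which an event counts as positive
  cutoff <- quantile(control, cutoff_quantile, names = FALSE)
  # Fraction of test events above the threshold
  mean(test > cutoff)
}

# Made-up values: the control cutoff is 9.82, and 3 of 5 test events exceed it
control <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
test    <- c(5, 20, 30, 8, 40)
propPositive(test, control)  # 0.6
```

Because the result only depends on how events rank against the cutoff, it is unaffected by any monotonic transformation applied identically to test and control.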

And re: downsampling, yep, flowBin is actually a kind of downsampling itself, so it shouldn't need any downsampling prior to running. For making sense of its results / running it on larger data sets, I would run flowType on the flowBin data afterwards, as I did in this paper: https://www.ncbi.nlm.nih.gov/pubmed/25600947

 

written 10 days ago by koneill30