flowCore: inverse logicle transformation of flow cytometry data
Josef Spidlen
Hi Nishant, Chao-Jen, Pyne, et al.,

I thought I would add a few notes on the Logicle transformation and its implementation in Bioconductor/flowCore. Please do not take this email the wrong way. I am definitely not trying to complain about flowCore's implementation, just trying to shed some light on the Logicle issues. Implementing Logicle is tricky, both because the transformation is quite complicated and because the original documentation targets readers with a substantial mathematical background. Nishant, Florian, Byron and others did a great job when they opened this can of worms and implemented the transformation in flowCore.

1) I remember that there were minor issues with the Logicle transformation, especially when applied to very low and negative values. I believe flowCore's implementation was not actually a monotone function around 0, which probably relates to a typo in the Parks et al. Logicle manuscript. Nishant, we discussed this by email around January 27, 2009. Recently, I talked to Dave Parks and Wayne Moore, and they told me that flowCore's implementation of Logicle is still broken (no further details; I am not sure whether this statement is valid or whether it is related to the original issue). Having said that, please note that these issues are minor and would not really affect typical users running flowCore/Logicle as part of their analysis pipeline.

2) Quoting Chao-Jen Wong:

> Since the logicle transformation is an one-to-one and onto function,
> it is possible to implement an inverse function. It is, however, not
> straightforward...

I believe Nishant has solved this by now, but note that the inverse should actually be easier than Logicle itself. This is because the Logicle transformation is defined as logicle(x) = root(S(y) - x), i.e., Logicle is defined as the inverse of S, where S is a "simple" function. So you can simply use S as the inverse of Logicle.
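To illustrate point 2, here is a minimal Python sketch of why the inverse is the easy direction. It assumes only that S is continuous and strictly increasing. The S used here, S(y) = e^y - e^(-y), is a hypothetical stand-in (the biexponential form a*e^(b*y) - c*e^(-d*y) + f with a = b = c = d = 1 and f = 0), not the real Logicle S; it is chosen because its inverse has the closed form asinh(x/2), which makes the root finder easy to check. The forward transform needs a root finder (plain bisection here), while the inverse is a single evaluation of S.

```python
import math
from typing import Callable


def solve_root(S: Callable[[float], float], x: float,
               lo: float = -50.0, hi: float = 50.0,
               tol: float = 1e-12, max_iter: int = 200) -> float:
    """Return y such that S(y) == x, i.e. root(S(y) - x), by bisection.

    Assumes S is continuous and strictly increasing on [lo, hi].
    """
    if not (S(lo) <= x <= S(hi)):
        raise ValueError("x lies outside [S(lo), S(hi)]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if S(mid) < x:
            lo = mid  # the root lies to the right of mid
        else:
            hi = mid  # the root lies at or to the left of mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def S(y: float) -> float:
    """Hypothetical 'simple' function: a toy biexponential e^y - e^-y."""
    return math.exp(y) - math.exp(-y)


def forward(x: float) -> float:
    """Forward transform: defined implicitly, so it needs root finding."""
    return solve_root(S, x)


def inverse(y: float) -> float:
    """Inverse transform: evaluate S directly, no iteration needed."""
    return S(y)


y = forward(1000.0)
print(y, inverse(y))
```

Every call to forward() runs a fresh bisection, whereas inverse() costs two exponentials; this asymmetry is what makes the inverse the easier half to implement.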
3) Recently, there have been some developments related to Logicle that flowCore may want to consider, support, or implement:

- Since 2005, Stanford has held a patent on the Logicle implementation. However, early this year, Stanford decided to stop collecting royalties on it, and it became free for anyone to use.
- As a result, Logicle has been incorporated into the latest version of Gating-ML (an ISAC standard for describing gates and data transformations in XML). This was done in collaboration with the authors of the transformation, who decided to tweak the transformation a little:

a) In the original manuscript, Parks et al. show two different parameterizations of the Logicle function, based on the natural logarithm (base e) and the decadic logarithm (base 10), respectively. The recent conclusion is that the decadic (base 10) version is better (i.e., easier) for the end user and should be used. Essentially, the transformation function is the same as long as you adjust the parameters accordingly. In the manuscript, the parameters are lower case for the natural logarithm parameterization and upper case for the base 10 parameterization. The two affected parameters are 'm' and 'w', where m = M*ln(10) and w = W*ln(10).** The upper-case (i.e., decadic) version is better for the end user since M and W are expressed in ordinary decades, i.e., base 10 log units; M is the total plot width and W is the linearization width. For example, if the user wants a 4.5-decade plot, they use M = 4.5 rather than having to use m = 10.36. The implementation in flowCore seems to be based on the natural logarithm, but its parameterization is mixed ('w' seems to be in natural logarithm units, while 'd' seems to be in decades). flowCore could switch to the decadic implementation, harmonize the parameterization, and maybe use the same constants (i.e., parameter names) as in the paper.
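The relationship between the two parameterizations in a) is a plain change of units. The short Python sketch below shows the conversion and reproduces the 4.5-decade example; the helper names are hypothetical and are not part of flowCore.

```python
import math

LN10 = math.log(10.0)  # ln(10), roughly 2.302585


def decades_to_natural(value_in_decades: float) -> float:
    """Upper-case (decadic) M or W -> lower-case (natural-log) m or w."""
    return value_in_decades * LN10


def natural_to_decades(value_natural: float) -> float:
    """Lower-case (natural-log) m or w -> upper-case (decadic) M or W."""
    return value_natural / LN10


M = 4.5                    # a 4.5-decade plot
m = decades_to_natural(M)  # the same plot width in natural-log units
print(f"M = {M} decades  ->  m = {m:.2f}")
```

flowCore's d <- d * log(10) is the same operation as decades_to_natural applied to d.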
** I believe that flowCore calls 'd' what is called 'M' in the manuscript, and 'r' what is called 'T' in the manuscript (Parks et al., Cytometry 69A: 541-551; 2006).

b) The authors of Logicle added one additional parameter to the Logicle function: 'A', the additional negative display range in asymptotic decades (usually 0 or a negative value). Setting it to 0 produces the "original" Logicle. In cases where low data values are dominated by statistical variation but the values are constrained to be non-negative (as seen in peak-detected flow cytometry data), a Logicle plot with A = -W would include data zero and be near-linear at low data values, thereby avoiding problems associated with log scales at the low end.

4) If you decide to adjust the implementation of Logicle in flowCore, a consistent description with (hopefully) all the necessary details is included in the latest Gating-ML specification, which can be downloaded from http://flowcyt.sourceforge.net/gating/latest.full.zip.

5) The latest Gating-ML specification also includes compliance tests, which cover the Logicle transformation. These may help you adjust and debug the implementation of Logicle (as well as its inverse function) in flowCore.

6) I have some Java code that implements the updated Logicle. Specifically, I have a Logicle(T, M, W, A) class that lets you create and apply the Logicle transformation, and some code that calculates default values for T, M, W, and A based on the contents of an FCS file. Please let me know if you would like me to share these. I am not suggesting that you reuse the implementation directly, since it is quite naive and relatively slow (it runs a simple bisection root-finding algorithm every time you call it); however, it may help clarify potential ambiguities related to the function.
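For readers who want to see what an S function with the parameter A looks like, here is a minimal Python sketch that turns decadic parameters (T, W, M, A) into the coefficients of the biexponential a*e^(b*y) - c*e^(-d*y) + f, reflected around data zero. It follows, as far as I can tell, the formulation in the Gating-ML 2.0 specification and the later Moore and Parks update of the Logicle scale; treat the exact formulas and the parameter-range checks as assumptions to verify against the specification. The class name LogicleParams is hypothetical. The display scale runs so that data value T maps to scale value 1, and, like the Java code described above, the sketch uses bisection, here to solve for the internal constant d. The forward Logicle would then be obtained by solving S(y) = x numerically.

```python
import math


class LogicleParams:
    """Hypothetical sketch: biexponential coefficients of the Logicle S
    function for decadic parameters T (top of scale), W (linearization
    width, decades), M (display width, decades) and A (additional
    negative decades). Scale value 1 corresponds to data value T."""

    def __init__(self, T: float, W: float, M: float, A: float = 0.0):
        # Assumed valid ranges: T > 0, M > 0, 0 <= W <= M/2, -W <= A <= M - 2W.
        if T <= 0 or M <= 0 or not 0 <= W <= M / 2 or not -W <= A <= M - 2 * W:
            raise ValueError("Logicle parameters out of range")
        self.T, self.W, self.M, self.A = T, W, M, A
        self.w = W / (M + A)            # W rescaled to the display scale
        self.x2 = A / (M + A)           # lies w below x1
        self.x1 = self.x2 + self.w      # scale value of data zero
        self.x0 = self.x2 + 2 * self.w  # lies w above x1
        self.b = (M + A) * math.log(10)
        self.d = self._solve_d(self.b, self.w)
        c_a = math.exp(self.x0 * (self.b + self.d))
        mf_a = math.exp(self.b * self.x1) - c_a / math.exp(self.d * self.x1)
        self.a = T / ((math.exp(self.b) - mf_a) - c_a / math.exp(self.d))
        self.c = c_a * self.a
        self.f = -mf_a * self.a

    @staticmethod
    def _solve_d(b: float, w: float, iters: int = 200) -> float:
        """Bisection for d in (0, b] with 2*(ln d - ln b) + w*(b + d) = 0."""
        if w == 0:
            return b  # no linearization width: d equals b
        lo, hi = 0.0, b
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if 2 * (math.log(mid) - math.log(b)) + w * (b + mid) < 0:
                lo = mid  # residual is increasing in d: root is to the right
            else:
                hi = mid
        return 0.5 * (lo + hi)

    def S(self, y: float) -> float:
        """Data value at scale position y (this is the inverse Logicle)."""
        if y < self.x1:
            return -self.S(2 * self.x1 - y)  # odd reflection around data zero
        return (self.a * math.exp(self.b * y)
                - self.c * math.exp(-self.d * y) + self.f)


p = LogicleParams(T=262144, W=0.5, M=4.5)          # A = 0: "original" Logicle
print(p.S(1.0))                                    # top of scale maps back to T
print(p.S(p.x1))                                   # data zero, up to rounding
q = LogicleParams(T=262144, W=0.5, M=4.5, A=-0.5)  # A = -W
print(q.x1)                                        # data zero at scale 0
```

The constant d is what makes the curvature of S vanish at data zero (x1), which keeps the scale smooth and monotone through zero. With A = -W, x1 = 0, so data zero lands at the bottom of the display, matching the description in b).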
The crucial part is the updated S function, which now includes the parameter 'A' and works with the decadic parameterization (see Gating-ML for details). However, since flowCore's internal implementation seems to be based on a "bi-exponential"-like parameterization, i.e., a*e^(b*x) - c*e^(-d*x) + f, converting it correctly may involve some effort.

7) A minor note on flowCore's defaults for the logicle transformation:

r = 262144: This works for data from BD's newer instruments, since their range is 2^18; however, many other FCS files have a different maximum (e.g., 10^4), for which 262144 is not a good choice. One option would be to make r = NULL the default and derive it from the data the transformation is applied to.

d = 5: Parks et al. suggest 4.5, but this does not really make a difference. More importantly, there seems to be a typo in the function's documentation, which says that d is the breadth of the display in natural logarithm units. The code includes d <- d * log(10), so the documentation should probably say that d is the breadth of the display in decades (i.e., decadic logarithm units). Also, Nishant, shouldn't the if (w > d) stop(...) check in the logicleTransform function go after d <- d * log(10)?

w = 0: This does not perform very well if you have low and negative values to look at. Alternatively, you could make w = NULL the default and derive the actual value from the data set. A recommended way to specify W for particular data is to select a value z approximating the most negative data value that must be included and calculate W as W = (M - log(T/abs(z)))/2. Setting z at the fifth percentile of the events that are below zero will yield an appropriate display in most cases.

Please let me know if I could do anything to help or clarify things further.

Cheers,
Josef

--
Josef Spidlen, Ph.D.
Research Associate, Terry Fox Laboratory, BC Cancer Agency 675 West 10th Avenue, V5Z 1L3 Vancouver, BC, Canada Tel: +1 (604) 675-8000 x 7755
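The data-driven defaults suggested in point 7 can be sketched in a few lines of Python. This is a hypothetical helper, not flowCore code, and it rests on stated assumptions: T defaults to the data maximum (the r = NULL idea); "log" in the W formula is taken as base 10 because M and W are in decades; "the fifth percentile of the events below zero" is read as the 5th percentile of the negative values; and the fallbacks (W = 0 when there are no negative events, clamping W to [0, M/2]) are illustrative choices.

```python
import math
from typing import Optional, Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100), linear interpolation between sorted values."""
    s = sorted(values)
    pos = (q / 100.0) * (len(s) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def estimate_T_and_W(data: Sequence[float], M: float = 4.5,
                     T: Optional[float] = None,
                     q: float = 5.0) -> Tuple[float, float]:
    """Hypothetical data-driven defaults for Logicle T and W."""
    if T is None:
        T = max(data)  # like r = NULL: take the top of scale from the data
    if T <= 0:
        raise ValueError("T must be positive")
    negatives = [v for v in data if v < 0]
    if not negatives:
        return T, 0.0  # assumption: no negative events, no extra linearization
    z = percentile(negatives, q)  # 5th percentile of the negative events
    W = (M - math.log10(T / abs(z))) / 2.0  # log taken as base 10
    return T, min(max(W, 0.0), M / 2.0)  # assumption: keep 0 <= W <= M/2


data = [-100.0, -100.0, 5.0, 1000.0, 262144.0]
T, W = estimate_T_and_W(data)
print(T, round(W, 3))
```

Whatever values are estimated this way would need to be stored alongside the transformed data, since the inverse transform needs exactly the same T, W, M and A.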
@nishant-gopalakrishnan-3253
Hi Josef,

Thanks a lot for the detailed email and for the information on the additional parameter introduced in the logicle transformation.

1) Regarding the problems with low and negative values: the transformation of negative values was implemented incorrectly in earlier versions of flowCore. After the detailed email from Parks about the typo in the manuscript and the discussion we had in January 2009, I modified the transformation to correct this issue. I do remember comparing the results generated by flowCore's transformation with those generated by your Java implementation. Now that you mention there are still issues with flowCore's logicle implementation, it would be great if you could provide more information about what exactly is incorrect so that the issue can be corrected.

Have the results/software used for the Gating-ML unit tests for the logicle transformation been verified by Parks and Moore, so that everyone has a gold standard confirmed by the original authors of the transformation? Also, what is the level of compliance with the standard's definition of this transformation among the currently available flow cytometry software?

2) The inverse of the logicle was easier to implement with a direct call to the biexponential function, and it is available in flowCore (1.11.22).

3) Future updates will be made to flowCore to:
a. Add the new parameter "A" defined for the transform. Adding this parameter to the biexponential function should not be difficult, as it simply gets added to the old parameter w wherever w appears in the new definition of the transformation.
b. Move all the inputs to the decade scale, since that has been established as the standard, and update the man pages accordingly.

4) Default values
a. Thank you for the suggestions about providing an option for data-driven selection of the transformation parameters. I will look into providing this option.
However, this option might cause problems for users who want to use an inverse transform to get back to the original scale.
b. You are correct regarding the man page; it should have said that d is the breadth of the display in decades.

Thanks,
Nishant

-----Original Message-----
From: bioconductor-bounces@stat.math.ethz.ch On Behalf Of Josef Spidlen
Sent: Wednesday, October 07, 2009 4:59 PM
To: BioConductor
Subject: Re: [BioC] flowCore: inverse logicle transformation of flow cytometry data
Hi Nishant,

> Now that you mention there are issues with the implementation of flowCores
> logicle transform it would be great if you could provide more information
> regarding what exactly is incorrect here so that the issue can be corrected.

I believe there is a minor issue related to the parameterization, and maybe to the use of natural vs. decadic logarithms in the code. When you use logicle with, say, d = 5 and r = the range of your data, the maximum of your data after the transformation should be d, i.e., 5. Your implementation seems to stretch it to 5*ln(10). Looking at the actual values, I wasn't able to match them to my implementation even when I set the range to 5*ln(10). So there may be issues with my implementation, your implementation, or both of them :-)... or maybe just with matching the parameters. Anyway, I'll get back to you offline with more details, and I should get my implementation validated before blaming you :-)

> Has the results/software used for the Gating ML standards unit tests for
> the logicle transformation been verified by Parks and Moore to be correct
> so that everyone has a gold standard that has been verified to be correct
> from the original authors of the transformation?

Not yet, but that is the plan and they have agreed to do so. I gave them all the necessary bits on September 22, and we have a call with them scheduled for October 27. These are busy people, but I hope to hear back from them by then.

> Also what is the level of compliance to the standards definition for this
> transformation amongst the currently available flow cytometry software.

I guess zero. But some vendors are working on it and trying to refactor their own code. At this point, third-party software usually implements some kind of logicle transformation that does something reasonable to the data but is not fully compliant.
My understanding is that they usually try to set the parameters automatically based on the data, but it is not even easy for them to understand their own code. This is partly because external programmers implemented some of the functionality in their software, and partly because of additional optimization and 'magic' applied to the data to make it nicer and faster for end users (e.g., combining the transformation with compensation, or applying some smoothing to avoid instrumentation-related artifacts such as the fence effect at low values). Some of these software vendors are actually going to meet with the authors of the transformation during the CDW workshop in Asilomar next week to sort this out. Also, Gating-ML 2.0 (the version that includes Logicle) has not been released yet. I believe the Logicle transformation in it is more or less settled by now (this took about 8 months to accomplish), but some other features will likely be added before Gating-ML 2.0 is released (e.g., BD has requested that quad gates be added).

> 3) Future updates will be made to flowCore to
> a. Add the new parameter "A" defined for the transform.
> b. Move all the inputs to the decade scale...

That sounds great. Maybe once you have that, I'll also have my implementation validated (and fixed, if needed), and then we can properly compare it with the updated implementation in flowCore. This should help us resolve all outstanding issues, and if the two match, it would provide additional validation for both of us.

Thanks again for doing this!

Cheers,
Josef
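Josef's first observation suggests a simple diagnostic when comparing two implementations: after transforming data whose maximum equals T (flowCore's r), the top transformed value should be M (flowCore's d) decades, not M*ln(10). The Python helper below is a hypothetical sketch of that check; it assumes A = 0, output expressed in decades, and an arbitrary tolerance.

```python
import math


def diagnose_top_of_scale(transformed_max: float, M: float,
                          tol: float = 1e-6) -> str:
    """Hypothetical check: with decadic parameters and A = 0, data at the
    top of the range (T) should map to M decades after the transformation."""
    if math.isclose(transformed_max, M, abs_tol=tol):
        return "ok: top of scale is M decades"
    if math.isclose(transformed_max, M * math.log(10), abs_tol=tol):
        return "stretched by ln(10): natural-log units leaked into the output"
    return "mismatch: parameters probably do not correspond"


print(diagnose_top_of_scale(5.0, 5.0))
print(diagnose_top_of_scale(5.0 * math.log(10), 5.0))  # 5*ln(10), about 11.51
```

A check like this only catches the unit mix-up; agreement at interior points still has to be confirmed against reference values such as the Gating-ML compliance tests.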