Hi, I am struggling with updating/replacing keywords in an ncdfFlowSet. My code sets the specified keyword to NA (as wanted), but it also removes all other keywords (not wanted).
I am grateful for any suggestions.
library(ncdfFlow)
data(GvHD)
nc <- ncdfFlowSet(GvHD[1:2])
keyword(nc,list('$FIL','$DATE'))
keyword(nc)<-list('$FIL'=NA)
keyword(nc,list('$FIL','$DATE'))#All other keywords are removed...
I have tried to specify the keyword, but the code doesn't work:
keyword(nc, keyword=list('$FIL'))<-NA#Doesn't work
Thank you, this works, but it takes an extremely long time with my ncdfFlowSet of 2400 frames.
See also the discussion from 20 months ago: https://support.bioconductor.org/p/117490/ Is there a way to change selected keywords on multiple flowFrames in an ncdfFlowSet (or cytoset)? Time is also an issue, it seems. I actually need to set 4 keywords in every frame to 'NA'.
With regards,
Hi again Anders. Sorry for the delay. We are just now reworking the flowWorkspace::cf_keyword_ methods to be significantly more efficient (which will then make the corresponding cytoset operations more efficient as well). See https://github.com/RGLab/flowWorkspace/pull/351. In particular, as you can see on that PR, setting individual keywords should see an approximately 50-60x speedup. I'll keep you posted as soon as those changes are merged to GitHub and Bioconductor, and I will try to get the cytoset versions (that loop through cytoframes at the C++ level instead of the R level) up quickly as well.
Hi Jake, Thank you for great work and help! I am looking forward to updates; the 'cytoverse' packages have been superb in an ongoing research project. The reason for my interest in replacing keywords is that in studies on patient data, sensitive data may be stored in (several) keywords and sample names, and should be removed.
With regards,
Anders
Hi again Anders. The changes have been made and should be available from both GitHub and Bioconductor now. This depends on cytolib changes as well, so you'll need to reinstall cytolib as well as flowCore, flowWorkspace, and CytoML which compile against it.
A few notes:
1) This is for cytoframe and cytoset as opposed to ncdfFlowSet, which is no longer in active development (because cytoset effectively supersedes it for HDF5 backend storage and also supports TileDB).
2) There are cytoframe, cytoset, GatingHierarchy, and GatingSet variants of all of these direct keyword manipulation methods. See the help doc using ?cf_keyword_set for details. keyword<- is still supported, but will do full replacement, which is why you were getting that dramatic slowdown. cf_keyword_set calls cytolib-level methods to selectively replace keywords in the cytoframe at the C++ level.
3) Unfortunately, due to the time crunch of beating the Bioconductor deadline for API changes, as well as some code re-organizing issues, the process of iterating over cytoframes is still done at the R level. This can be remedied soon, but you should still see a significant speed improvement from the partial keyword replacement. Let me know if that's not the case.
4) In cytolib, keywords at the cytoframe level are stored as strings. If you assign NA to a keyword, it will be stored as the string "NA" (probably not what you want). If you want to remove/empty a keyword using partial replacement as opposed to the full replacement of keyword<-, you can use cf_keyword_remove (or cs_keyword_remove, etc).
I hope that helps. Let me know if you run into any more issues and thanks for reporting this.
-Jake
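For readers following along, here is a minimal sketch of the partial replacement described in the notes above. It assumes a flowWorkspace version that already ships cs_keyword_set, and it assumes the argument order (object, keys, values); please confirm both with ?cf_keyword_set. The placeholder string "removed" is only an illustration.

```r
library(flowCore)       # GvHD example data
library(flowWorkspace)  # cytoset and the cs_/cf_keyword_* helpers

data(GvHD)
cs <- flowSet_to_cytoset(GvHD[1:2])      # small cytoset for testing

n_before <- length(keyword(cs[[1]]))     # number of keywords before editing

# Partial replacement: only the listed keys are touched, in every cytoframe.
# Assumed argument order: (cytoset, keys, values).
cs_keyword_set(cs, c("$FIL", "$DATE"), c("removed", "removed"))

kw <- keyword(cs[[1]])
kw[c("$FIL", "$DATE")]                   # the two edited keywords
length(kw) == n_before                   # all other keywords should still be there
```

Values are passed as strings on purpose, since note 4 says that NA would end up stored as the string "NA".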
Excellent!
Thanks again for all help and these excellent FC-tools! Keyword-methods are working as expected now, using cytoset-methods.
One more question: When I save a cytoset in a temp folder, it ends up as files similar to the file structure I know from saving ncdfFlow objects (2-3 files, one very big .nc file), whereas when I try to save the cytoset object (with >2000 cytoframes) in my own folder using a path argument, it ends up as >2000 files with '.fcs.h5' or '.fcs.pb' file extensions, and one '.gs' file. The documentation on save_cytoset is not too descriptive, yet. Should I include any other arguments to control the folder (path, '\c:\user\...' or 'M:/...etc.') and file structure/extension of 'save_cytoset'?
Anders
cytoset stores data sample-wise, which is why you see many files. save_cytoset borrows the underlying code of save_gs, so you see the extra pb and gs files. But you shouldn't need to care too much about the format, load_cytoset should take care of the details when you need to load it back from disk.
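A minimal round-trip sketch of the save/load step described above, assuming save_cytoset(cs, path = ...) and load_cytoset(path) as in recent flowWorkspace versions. The folder name cs_demo is hypothetical, and the sketch assumes that folder does not already exist.

```r
library(flowCore)
library(flowWorkspace)

data(GvHD)
cs <- flowSet_to_cytoset(GvHD[1:2])

# Hypothetical output folder inside the R session's temp directory.
out_dir <- file.path(tempdir(), "cs_demo")

save_cytoset(cs, path = out_dir)   # writes per-sample files plus bookkeeping files
list.files(out_dir)                # inspect what was written

cs2 <- load_cytoset(out_dir)       # reads the folder back as a cytoset
sampleNames(cs2)
```

The point of the sketch is that the folder is treated as one unit: it is passed whole to load_cytoset, so the individual file names and extensions do not need to be handled by hand.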
Hi Jake,
is it possible with these keyword functions to change certain keywords in all cytoframes within a cytoset to different values for each cytoframe? As far as my testing and understanding of these functions goes, I could only change a certain key in all cytoframes of a cytoset to the same value. Maybe the following example explains what I want to achieve:
The cs_keyword_set command leads to the following error:
Is there a way using the cytoverse to do such a batch editing of keywords for cytoframes within a cytoset?
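One possible approach, sketched here under assumptions and not confirmed anywhere in this thread, is to loop over the cytoframes at the R level and call cf_keyword_set on each one with its own value. The sketch assumes that cs[[name]] returns a cytoframe that shares storage with the cytoset, so that the edit is visible from the cytoset afterwards. The patient_ values are made up for illustration.

```r
library(flowCore)
library(flowWorkspace)

data(GvHD)
cs <- flowSet_to_cytoset(GvHD[1:2])

# Hypothetical per-sample values, named by sample name.
sn_all  <- sampleNames(cs)
new_ids <- setNames(paste0("patient_", seq_along(sn_all)), sn_all)

for (sn in names(new_ids)) {
  cf <- cs[[sn]]                             # cytoframe for this sample
  cf_keyword_set(cf, "$FIL", new_ids[[sn]])  # a different value for each frame
}

# Check the result, one value per sample.
sapply(sn_all, function(sn) keyword(cs[[sn]])[["$FIL"]])
```

Since the iteration happens in R, this will probably not be faster than cs_keyword_set for very large cytosets, but it keeps the partial replacement, so the other keywords are left alone.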