flowSet/ncdfFlowSet: replacing keywords doesn't work as expected
@anderstondell-9441

Hi, I am struggling to update/replace keywords in an ncdfFlowSet: my code removes all the other keywords as well (not wanted), while setting the specified keyword to NA (as wanted).

I am grateful for any suggestions.

library(ncdfFlow)
data(GvHD)
nc <- ncdfFlowSet(GvHD[1:2])
keyword(nc,list('$FIL','$DATE'))

keyword(nc) <- list('$FIL' = NA)
keyword(nc, list('$FIL', '$DATE'))  # All other keywords are removed...

I have tried to specify the keyword, but the code doesn't work:

keyword(nc, keyword = list('$FIL')) <- NA  # Doesn't work

Jake Wagner
@jake-wagner-19995

Hello. Sorry for the confusion with keyword replacement. There were a few recent changes for the sake of consistency across flowCore, flowWorkspace, and ncdfFlow. You can read some of the discussion here, and the change was mentioned in a NEWS update. Basically, keyword replacement now works like other assignment in R: it completely replaces its target. That leaves you two main options:

1) You pull all of the keywords, change the ones you want, then reassign them all

2) You index into the keyword you want to replace and replace it directly

Approach 1:

library(ncdfFlow)
data(GvHD)
nc <- ncdfFlowSet(GvHD[1:2])
keyword(nc,list('$FIL','$DATE'))

all_keys <- keyword(nc[[1]])
all_keys[['$FIL']] <- NA
keyword(nc[[1]]) <- all_keys
keyword(nc, list('$FIL', '$DATE'))

Approach 2 (probably a little easier, with cleaner syntax):

library(ncdfFlow)
data(GvHD)
nc <- ncdfFlowSet(GvHD[1:2])
keyword(nc, list('$FIL', '$DATE'))

keyword(nc[[1]])['$FIL'] <- NA
keyword(nc, list('$FIL', '$DATE'))
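The "completely replacing its target" behaviour is ordinary R assignment semantics and can be seen without ncdfFlow at all. The sketch below is a base-R analogy only: it assumes a frame's keywords are held as a named list (which is what keyword() returns for a single frame), and the keyword values are made up for illustration.

```r
# Base-R analogy only; the keyword values below are hypothetical.
kw <- list("$FIL" = "sample.fcs", "$DATE" = "01-JAN-2020", "$TOT" = "1000")

# Full replacement (the list-level analogue of keyword(nc) <- list('$FIL' = NA)):
# assigning a new list to the whole object discards the old contents,
# so $DATE and $TOT disappear.
full <- kw
full <- list("$FIL" = NA)

# Element replacement (approach 2): only the indexed entry changes,
# and every other keyword is kept.
partial <- kw
partial["$FIL"] <- NA

print(names(full))     # only "$FIL" is left
print(names(partial))  # "$FIL", "$DATE" and "$TOT" are all still there
```

Approach 1 gives the same end result as approach 2; it just does the element replacement on a copy and then writes the whole list back.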



Thank you, this works, but it takes an extremely long time with my ncdfFlowSet of 2400 frames.

See also the discussion from 20 months ago: https://support.bioconductor.org/p/117490/ Is there a way to change selected keywords on multiple flowFrames in an ncdfFlowSet (or cytoset)? Time also seems to be an issue. I actually need to set 4 keywords in every frame to 'NA'.

With regards,


Hi again Anders. Sorry for the delay. We are just now reworking the flowWorkspace::cf_keyword_ methods to be significantly more efficient (which will also make the corresponding cytoset operations more efficient). See https://github.com/RGLab/flowWorkspace/pull/351. In particular, as you can see on that PR, setting individual keywords should see an approximately 50-60x speedup.

I'll keep you posted as soon as those changes are merged on GitHub and Bioconductor, and I will try to get cytoset versions (which loop through cytoframes at the C++ level instead of the R level) up quickly as well.


Hi Jake, thank you for the great work and help! I am looking forward to the updates; the 'cytoverse' packages have been superb in an ongoing research project. My interest in replacing keywords comes from studies on patient data, where sensitive information may be stored in (several) keywords and sample names and has to be removed.

With regards,

Anders


Hi again Anders. The changes have been made and should now be available from both GitHub and Bioconductor. They also depend on cytolib changes, so you'll need to reinstall cytolib as well as flowCore, flowWorkspace, and CytoML, which compile against it.

A few notes:

1) This is for cytoframe and cytoset as opposed to ncdfFlowSet, which is no longer in active development (because cytoset effectively supersedes it for HDF5 backend storage and also supports TileDB).

2) There are cytoframe, cytoset, GatingHierarchy, and GatingSet variants of all of these direct keyword manipulation methods. See the help page (?cf_keyword_set) for details. keyword<- is still supported, but it does a full replacement, which is why you were seeing that dramatic slowdown. cf_keyword_set calls cytolib-level methods to selectively replace keywords in the cytoframe at the C++ level.

3) Unfortunately, because of the time crunch of meeting the Bioconductor deadline for API changes, as well as some code-reorganization issues, iterating over cytoframes is still done at the R level. This can be remedied soon, but you should still see a significant speed improvement from the partial keyword replacement. Let me know if that's not the case.

4) In cytolib, keywords at the cytoframe level are stored as strings. If you assign NA to a keyword, it will be stored as the string "NA" (probably not what you want). If you want to remove a keyword using partial replacement, as opposed to the full replacement done by keyword<-, you can use cf_keyword_remove (or cs_keyword_remove, etc.).
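The difference in note 4 between blanking a keyword and removing it, which is what scrubbing sensitive keywords comes down to, can be illustrated in plain R on a list of per-frame keyword lists. This is an in-memory analogy, not flowWorkspace code: drop_keywords() is a hypothetical helper, the frames and values are made up, and paste() merely stands in for "storing a value as a string". On real data the cytoframe/cytoset keyword methods mentioned above do the selective removal at the C++ level instead.

```r
# Hypothetical helper (not part of flowWorkspace): drop the given keyword
# names from every frame's keyword list, leaving all other keywords intact.
drop_keywords <- function(kw_by_frame, keys) {
  lapply(kw_by_frame, function(kw) kw[setdiff(names(kw), keys)])
}

# Two made-up frames with hypothetical keyword values
frames <- list(
  f1 = list("$FIL" = "a.fcs", "$DATE" = "01-JAN-2020", "$TOT" = "1000"),
  f2 = list("$FIL" = "b.fcs", "$DATE" = "02-JAN-2020", "$TOT" = "2000")
)

scrubbed <- drop_keywords(frames, c("$FIL", "$DATE"))
str(scrubbed)  # each frame keeps only $TOT

# Analogy for note 4: once a value is turned into a string, NA becomes the
# literal text "NA" rather than a missing value, so the key still exists.
stored <- paste(NA)
```

Removing the key leaves nothing behind to leak, whereas "setting to NA" in string-typed storage still leaves a keyword present with the text "NA".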

I hope that helps. Let me know if you run into any more issues, and thanks for reporting this.

-Jake


Excellent!

Thanks again for all the help and these excellent FC tools! The keyword methods now work as expected, using the cytoset methods.

One more question: when I save a cytoset in a temp folder, I get files similar to the file structure I know from saving ncdfFlow objects (2-3 files, one very big .nc file), whereas when I save the cytoset object (with >2000 cytoframes) to my own folder using the path argument, I get >2000 files with '.fcs.h5' or '.fcs.pb' extensions, plus one '.gs' file. The documentation for save_cytoset is not very descriptive yet. Should I pass any other arguments to control the folder (path, '\c:\user\...' or 'M:/...etc.') and the file structure/extensions produced by save_cytoset?

tmp <- tempfile()
save_cytoset(cs, tmp)


Anders


A cytoset stores data sample-wise, which is why you see many files. save_cytoset reuses the underlying code of save_gs, so you also see the extra .pb and .gs files. But you shouldn't need to worry much about the format: load_cytoset should take care of the details when you need to load it back from disk.