I'm trying to perform set operations (union, intersect, setdiff) on DNAStringSets, but doing so strips off the names. How can I do set operations while keeping the names intact?
Hi,
The implementation of the set operations for XStringSet objects is a relic from prehistoric times. A better (and more generic) implementation is:
    setMethod("union", c("Vector", "Vector"),
        function(x, y) unique(c(x, y))
    )
    setMethod("intersect", c("Vector", "Vector"),
        function(x, y) unique(x[x %in% y])
    )
    setMethod("setdiff", c("Vector", "Vector"),
        function(x, y) unique(x[!(x %in% y)])
    )
These methods don't coerce to a character vector internally (so they are more efficient), and they propagate the names and metadata columns of the first argument (x).
Note that right now, if you define the above methods (by copy/pasting the above code into your session), the more specific methods for XStringSet objects will get in the way: dispatch will still pick the methods for XStringSet objects. So for now, to work around this, you would need to replace the occurrences of Vector with XStringSet. I'm in the process of adding the above methods to the S4Vectors package (where they belong) and removing the old methods for XStringSet objects from the Biostrings package. I'll let you know when I'm done.
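As a rough sketch of that workaround, here are the same three definitions with Vector replaced by XStringSet, followed by a quick check on two small DNAStringSets. The objects x and y and their names are made up for illustration, and this assumes the Biostrings package is installed. It is only meant to show the idea and should not be needed once the generic methods are available in S4Vectors.

```r
library(Biostrings)

# Same definitions as above, but registered for XStringSet so that they
# take precedence over the old XStringSet-specific methods.
setMethod("union", c("XStringSet", "XStringSet"),
    function(x, y) unique(c(x, y))
)
setMethod("intersect", c("XStringSet", "XStringSet"),
    function(x, y) unique(x[x %in% y])
)
setMethod("setdiff", c("XStringSet", "XStringSet"),
    function(x, y) unique(x[!(x %in% y)])
)

# Hypothetical example data: two named sets sharing the sequence "GGCC".
x <- DNAStringSet(c(a = "ACGT", b = "GGCC"))
y <- DNAStringSet(c(c = "GGCC", d = "TTAA"))

# Inspect which names survive each operation.
names(union(x, y))
names(intersect(x, y))
names(setdiff(x, y))
```

For sequences present in both sets, the name that survives should be the one from x, since x comes first in c(x, y) and unique() keeps the first occurrence.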
Cheers,
H.
Looking at the source... the SetOperation method does a unique on the arguments, turns them into character vectors, then performs the set operation on those vectors. But as.character strips attributes. I can see the use of this in some cases, but it appears inconsistent with the rest of the class, as no other part of the class seems to assume or enforce that sequences are unique or that names should be dropped when they could be preserved.
I don't know the proper R way this could be done, but for the purposes of the set operations, could the name be appended to the character vector prior to the set operation and then extracted afterwards?
I discovered that as.character accepts use.names, but this has no effect on the result of the set operation.
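A likely explanation, assuming the old methods hand the character vectors to the base R set functions as described above: those functions drop names themselves, so it makes no difference whether as.character kept them. A small base R sketch with made-up named character vectors (no Biostrings needed):

```r
# Hypothetical named character vectors standing in for the coerced sequences.
x <- c(a = "ACGT", b = "GGCC")
y <- c(c = "GGCC", d = "TTAA")

# The inputs carry names, yet the base set operations return
# unnamed vectors.
u <- union(x, y)
i <- intersect(x, y)
s <- setdiff(x, y)

is.null(names(u))
is.null(names(i))
is.null(names(s))
```

So the names would have to be reattached after the operation, or the operation done without going through character vectors at all.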