Using DNAStrings with Rcpp without converting to Character
James • 0
@1cecd82d
United Kingdom

Hi there!

I'm writing some code to extract the Accumulated Natural Vectors from all the sequences in a DNAStringSet object. To speed things up I've written the code in C++ using Rcpp, and it works as long as I convert each of the DNAString objects to character vectors first.

For larger sequences this conversion is a bottleneck, and I was wondering whether I can avoid it and pass the DNAString object directly. However, I can't find any documentation for passing a DNAString (or more generally, a BString object) to C++ with Rcpp. Is there a recommended way of doing this?

All the best!

Biostrings DNAString Rcpp • 182 views
@herve-pages-1542
Seattle, WA, United States

Hi James,

Sorry for missing this. Do you still need to do this?

The standard way to access the string data of an XStringSet object at the C level is to use the "XStringSet_holder interface". Unfortunately, this is not documented. Note that for DNAStringSet and RNAStringSet objects the string data is encoded, which can make things a little complicated. A few Bioconductor packages have figured out how to do this; see, for example, the XStringSet2ByteStringVec function in the kebabs package.

Other Bioconductor packages using the "XStringSet_holder interface": ShortRead, VariantFiltering, and DECIPHER.
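
To illustrate what "encoded" means in practice, here is a minimal, self-contained C++ sketch that works directly on encoded bytes without converting them to characters. It does not use the real Biostrings headers: `EncodedView` is a hypothetical stand-in for the pointer-plus-length view that the holder interface returns for one sequence, and the byte codes (A=1, C=2, G=4, T=8) are an assumption about the internal DNA encoding. In a real package you would obtain the view through the functions declared in `Biostrings_interface.h` and decode with the helper provided there rather than hard-coding the values. The function computes per-base counts and mean positions, which are only an example statistic and not the full Accumulated Natural Vector.

```cpp
#include <array>

// Hypothetical stand-in for the (pointer, length) view of one sequence;
// NOT the real type from the Biostrings C interface.
struct EncodedView {
    const char* ptr;  // encoded bytes, not ASCII letters
    int length;       // number of bytes in the sequence
};

// Assumed internal byte codes for the four unambiguous DNA bases.
constexpr unsigned char CODE_A = 1, CODE_C = 2, CODE_G = 4, CODE_T = 8;

struct BaseStats {
    std::array<long, 4> count{};       // occurrences of A, C, G, T
    std::array<double, 4> mean_pos{};  // mean 1-based position of each base
};

// Map an encoded byte to 0..3 (A, C, G, T); return -1 for any other code
// (ambiguity codes, gaps, and so on).
inline int base_index(unsigned char code) {
    switch (code) {
        case CODE_A: return 0;
        case CODE_C: return 1;
        case CODE_G: return 2;
        case CODE_T: return 3;
        default:     return -1;
    }
}

// Single pass over the encoded bytes; no character conversion is needed.
BaseStats base_stats(const EncodedView& seq) {
    BaseStats stats;
    std::array<double, 4> pos_sum{};
    for (int i = 0; i < seq.length; ++i) {
        const int b = base_index(static_cast<unsigned char>(seq.ptr[i]));
        if (b < 0) continue;  // skip bytes that are not A, C, G or T
        ++stats.count[b];
        pos_sum[b] += i + 1;
    }
    for (int b = 0; b < 4; ++b) {
        if (stats.count[b] > 0) {
            stats.mean_pos[b] = pos_sum[b] / stats.count[b];
        }
    }
    return stats;
}
```

Switching on the encoded byte keeps the inner loop free of any per-sequence allocation or copy, which is the cost the character conversion introduces. Bytes outside the four assumed codes are skipped here; whether that is the right treatment for ambiguity codes depends on the statistic being computed.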

Let me know if you need further help with this.

Best,

H.