strsplit method for DNAStringSet objects
jma1991 ▴ 70
@jma1991-11856
Cumbernauld

Is there a strsplit method for DNAStringSet objects? I have a DNAStringSet object generated from the readFastq function (ShortRead package). In the middle of my reads is a barcode which I would like to use to split the reads into two reads (effectively treating the split read as paired-end reads).

biostrings shortread • 1.7k views

The sequences in a DNAStringSet object can be converted to a character vector and then passed to strsplit. For instance:

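library(ShortRead)
# Locate the example Solexa run shipped with ShortRead
sp <- SolexaPath(system.file('extdata', package='ShortRead'))
# Read the example FASTQ file into a ShortReadQ object
rfq <- readFastq(analysisPath(sp), pattern="s_1_sequence.txt")
# Split the read identifiers on a substring
strsplit(as.character(id(rfq)), split="_1_1_1_")
# Split the read sequences on a motif
strsplit(as.character(sread(rfq)), split="CGCG")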
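
For the barcode case in the question, here is a minimal base-R sketch of the same idea. The reads and the barcode are made up for illustration, and fixed=TRUE is an added assumption so that the barcode is matched literally rather than as a regular expression. In practice the character vector would come from as.character(sread(rfq)).

```r
# Hypothetical reads with a hypothetical barcode ("GGCCAA") in the middle
reads   <- c("ACGTTTGGCCAAGGTT", "TTTTGGCCAACCCC")
barcode <- "GGCCAA"

# Split each read at the barcode; the barcode itself is dropped
parts <- strsplit(reads, split = barcode, fixed = TRUE)

# Collect the piece before and the piece after the barcode.
# A read with nothing after the barcode gives NA in 'right'.
left  <- vapply(parts, `[`, character(1), 1)
right <- vapply(parts, `[`, character(1), 2)
```

Note that this only splits the sequences. The quality strings in the ShortReadQ object are not split along with them.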
 

Ah so simple, silly me. Thanks for the answer.

@herve-pages-1542
Seattle, WA, United States

Hi,

I just added an strsplit method for XStringSet objects to the Biostrings package, so you no longer need to convert to a character vector before calling strsplit(). That conversion can be expensive for an XStringSet object with tens of millions of sequences in it.

It's in Biostrings 2.43.8 (BioC devel only), which will propagate and become available via biocLite() in the next 24 hours or so.
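
A minimal sketch of what this looks like, assuming Biostrings 2.43.8 or later is installed. The reads and the barcode are made up for illustration, and the result is expected to be a list-like object holding one DNAStringSet per read:

```r
library(Biostrings)

# Hypothetical reads with a hypothetical barcode ("GGCCAA") in the middle
reads   <- DNAStringSet(c("ACGTTTGGCCAAGGTT", "TTTTGGCCAACCCC"))
barcode <- "GGCCAA"

# Split directly on the DNAStringSet, with no conversion to character
pieces <- strsplit(reads, barcode)

# Pieces of the first read, before and after the barcode
pieces[[1]]
```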

Cheers,

H.


That's fantastic, thank you very much. I'll test when it becomes available.

