Question: strsplit method for DNAStringSet objects
2.3 years ago, jma1991 wrote:

Is there a strsplit method for DNAStringSet objects? I have a DNAStringSet object generated from the readFastq function (ShortRead package). In the middle of my reads is a barcode which I would like to use to split the reads into two reads (effectively treating the split read as paired-end reads).

biostrings shortread • 602 views

The sequences in a DNAStringSet object can be converted to a character vector and then passed to strsplit(). For instance:

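library(ShortRead)
# Load the example FASTQ file shipped with ShortRead
sp <- SolexaPath(system.file('extdata', package='ShortRead'))
rfq <- readFastq(analysisPath(sp), pattern="s_1_sequence.txt")
# Split the read ids (a BStringSet) on a literal substring
strsplit(as.character(id(rfq)), split="_1_1_1_")
# Split the read sequences (a DNAStringSet) on the motif "CGCG"
strsplit(as.character(sread(rfq)), split="CGCG")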
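
For the paired-end use case in the question, note that strsplit() splits at every occurrence of the pattern, so a barcode that happens to appear twice in a read yields more than two pieces. A minimal base-R sketch that cuts only at the first occurrence might look like the following. The reads, their names, and the barcode "CGCG" are made up for illustration; with real data the character vector would come from as.character(sread(rfq)).

```r
# Hypothetical reads; "CGCG" stands in for the real barcode
reads <- c(r1 = "AAAACGCGTTTT", r2 = "GGGACGCGCCCA", r3 = "ACGTACGT")
barcode <- "CGCG"

# Keep only the reads that contain the barcode (r3 does not)
hit <- grepl(barcode, reads, fixed = TRUE)
kept <- reads[hit]

# Start position of the first barcode occurrence in each kept read
start <- as.integer(regexpr(barcode, kept, fixed = TRUE))

# Everything before the barcode becomes mate 1, everything after it mate 2
mate1 <- substr(kept, 1, start - 1)
mate2 <- substr(kept, start + nchar(barcode), nchar(kept))
```

Reads without the barcode are dropped here; whether to drop or keep them is a design choice that depends on the downstream analysis.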
 
written 2.3 years ago by shepherl

Ah so simple, silly me. Thanks for the answer.

written 2.3 years ago by jma1991
Answer: strsplit method for DNAStringSet objects
2.3 years ago, Hervé Pagès (United States) wrote:

Hi,

I just added a strsplit method for XStringSet objects to the Biostrings package, so you no longer need to convert to a character vector before calling strsplit(). That conversion can be expensive for an XStringSet object with tens of millions of sequences in it.

It's in Biostrings 2.43.8 (BioC devel only), which will propagate and become available via biocLite() in the next 24 hours or so.
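
As a rough sketch of what calling the method directly might look like, assuming Biostrings 2.43.8 or later is installed (the reads and the "CGCG" barcode below are made up for illustration):

```r
library(Biostrings)  # assumes a version that ships the XStringSet strsplit method

# Hypothetical reads; "CGCG" stands in for the real barcode
x <- DNAStringSet(c(r1 = "AAAACGCGTTTT", r2 = "GGGACGCGCCCA"))

# Split directly on the DNAStringSet, with no as.character() conversion
parts <- strsplit(x, "CGCG")

# The result is expected to be list-like, with one element per input read
length(parts)
parts[[1]]
```

With reads loaded via ShortRead, x would be sread(rfq) rather than a hand-built DNAStringSet.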

Cheers,

H.


That's fantastic, thank you very much. I'll test when it becomes available.

written 2.3 years ago by jma1991