Trim a variable number of bases off a read, depending on read length
knaxerova ▴ 10
@knaxerova-7541
Last seen 3.7 years ago
United States

Dear list,

this seems like a trivial question, but I haven't been able to find a function to serve this particular need:

I want to trim a variable number of bases off the end of a read, in the following manner:

If the length of a read is x, trim y bases.
If the length of a read is x+1, trim y+1 bases.
And so on.

I am just getting to know the Biostrings package, but haven't seen a function that would help here. Any suggestions would be much appreciated. 

Thanks so much.
Kamila

Biostrings • 1.1k views
@martin-morgan-1513
Last seen 4 months ago
United States

Use the narrow() function, specifying end=. Wrap the computed end in pmax() so it never drops below 0 for reads shorter than the number of bases being trimmed.

If you are working with fastq files, then you'll want to use the ShortRead package.

For big files, iterate through in chunks using FastqStreamer().

A complete solution might be

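library(ShortRead)

fin <- "path/to/some.fastq"
fout <- tempfile(fileext = ".fastq")

strm <- FastqStreamer(fin)
repeat {
    fq <- yield(strm)                # read the next chunk of records
    if (length(fq) == 0)
        break
    ## drop the last 5 bases; pmax() keeps 'end' from going below 0
    fq <- narrow(fq, end = pmax(0, width(fq) - 5))
    writeFastq(fq, fout, "a")        # append the trimmed chunk
}
close(strm)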
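
The loop above trims a fixed 5 bases. For the length-dependent rule in the question (length x loses y bases, length x+1 loses y+1, and so on), only the end= vector needs to change. Here is a minimal sketch in base R on plain character strings, so it runs without any packages; the helper name trim_amount and the values of x and y are made up for illustration. The same end vector could presumably be passed to narrow(fq, end=...) inside the loop.

```r
# Hypothetical helper: number of bases to trim from a read of length `len`,
# following the rule "length x -> trim y, length x + 1 -> trim y + 1, ...".
trim_amount <- function(len, x, y) {
    # Clamp to [0, len] so a short read is never trimmed past nothing
    pmin(len, pmax(0, y + (len - x)))
}

# Toy reads standing in for the sequences of a FASTQ chunk
reads <- c("ACGTACGTAC", "ACGTACGTACG", "ACGTACGTACGT", "ACG")
len <- nchar(reads)

# Example values (assumptions): reads of length 10 lose 2 bases
x <- 10
y <- 2

n_trim <- trim_amount(len, x, y)
end <- len - n_trim                 # last base to keep in each read
trimmed <- substr(reads, 1, end)    # substr() is vectorised over 'stop'
```

Note that under this rule every read at least x bases long ends up with exactly x - y bases, so for such reads it is equivalent to truncating at a fixed end position; reads short enough that the rule would give a negative trim are left untouched by the clamp.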
knaxerova ▴ 10
@knaxerova-7541
Last seen 3.7 years ago
United States

Perfect! Thanks so much.
