The ShortRead package does not handle quality score '~' (Q=93) found in PacBio fastq files
@benjaminjcallahan-9771

I am working with PacBio data in fastq format. The PacBio software outputs fastqs that contain quality scores ranging all the way up to Q=93, encoded as the ASCII 33+93 = `~`.

This is causing a problem within the ShortRead package, because the fastq quality score alphabet is hardcoded to extend only to ASCII 33+92='}' (see below for the problematic code). As a result, all ShortRead functions that work with quality scores ignore the Q=93 scores.
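To make the arithmetic concrete, here is a small base-R sketch (no ShortRead calls) of the Phred+33 encoding and of why an alphabet built from `32:125` misses `~`. The variable names are illustrative only.

```r
# Phred+33: a quality score Q is stored as the ASCII character with code Q + 33.
q_max <- 93
top_char <- rawToChar(as.raw(33 + q_max))   # ASCII code 126

# The alphabet as currently hardcoded, and the same range extended by one code.
current_alphabet  <- rawToChar(as.raw(32:125), TRUE)
extended_alphabet <- rawToChar(as.raw(32:126), TRUE)

top_char %in% current_alphabet    # FALSE: Q=93 falls outside the alphabet
top_char %in% extended_alphabet   # TRUE
```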

Is there a workaround for this problem that does not require case-by-case rewriting of ShortRead functions (e.g. somehow overriding the S4 method so that it returns the 32:126 alphabet)? And why are quality scores cut off at 92?
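One stopgap I can think of, at the cost of losing the distinction between Q=92 and Q=93, would be to cap the scores before the file ever reaches ShortRead. A minimal base-R sketch, assuming a strict four-line-per-record FASTQ already read into a character vector; `cap_q93` is a hypothetical helper, not a ShortRead function.

```r
# Hypothetical helper: replace '~' (Q=93) with '}' (Q=92) on quality lines only.
# Assumes every record occupies exactly four lines, so quality is line 4, 8, 12, ...
cap_q93 <- function(fastq_lines) {
  is_qual <- seq_along(fastq_lines) %% 4 == 0
  fastq_lines[is_qual] <- chartr("~", "}", fastq_lines[is_qual])
  fastq_lines
}

record <- c("@read1", "ACGT", "+", "I~~}")
capped <- cap_q93(record)
capped[4]   # "I}}}"
```

Header and sequence lines are left untouched, so a `~` in a read name would survive.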

---

> selectMethod("alphabet", "FastqQuality")
Method Definition:

function (x, ...)
rawToChar(as.raw(32:125), TRUE)
<environment: namespace:ShortRead>

Signatures:
        x           
target  "FastqQuality"
defined "FastqQuality"
@martin-morgan-1513

Thanks, I updated this in ShortRead 1.36.1 (release) and 1.37.2 (devel); these will propagate and be available via biocLite() either tomorrow morning or Wednesday morning, Eastern time.
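A quick way to check whether a given version includes the change, sketched under the assumption that 1.36.x is the release line and 1.37.x the devel line, as stated above. `has_q93_fix` is a hypothetical helper, not part of ShortRead.

```r
# Hypothetical helper: TRUE if the version is at or past the fixed release
# (1.36.1) or, on the devel line and later, at or past 1.37.2.
has_q93_fix <- function(version) {
  v <- package_version(version)
  if (v >= "1.37.0") v >= "1.37.2" else v >= "1.36.1"
}

has_q93_fix("1.36.0")   # FALSE
has_q93_fix("1.36.1")   # TRUE

# With ShortRead installed, the same check on the local copy would be:
# has_q93_fix(packageVersion("ShortRead"))
```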


I just tested this in 1.36.1 and am still getting the same behavior (Q=93 scores are ignored), and the underlying code for alphabet still looks the same (the alphabet is not extended to Q=93).

Edit: Never mind, it looks good now. I just hadn't fully detached the old package version. Thanks again!
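For anyone hitting the same thing: a session that loaded ShortRead before the update keeps running the old code until the namespace is unloaded or R is restarted. A minimal base-R sketch for checking which version is actually loaded in the current session; `loaded_version` is a hypothetical helper.

```r
# Hypothetical helper: version of a package as loaded in THIS session,
# or NA if its namespace is not loaded (the installed copy may be newer).
loaded_version <- function(pkg) {
  if (!(pkg %in% loadedNamespaces())) return(NA_character_)
  unname(getNamespaceVersion(pkg))
}

loaded_version("ShortRead")

# If this still reports the old version, restarting R is the most reliable fix.
# unloadNamespace("ShortRead") followed by library(ShortRead) may also work,
# but can fail when other loaded packages import ShortRead.
```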


Thanks Martin!
