I am working with PacBio data in fastq format. The PacBio software outputs fastqs that contain quality scores ranging all the way up to Q=93, encoded as the ASCII 33+93 = `~`.
This is causing a problem within the ShortRead package, because the fastq quality score alphabet is hardcoded to extend only to ASCII 33+92 = `}` (see below for the problematic code). As a result, all ShortRead functions that work with quality scores ignore the Q=93 scores.
Is there a workaround for this problem that does not require case-by-case rewriting of ShortRead functions (e.g. somehow substituting the S4 method to return the 32:126 alphabet)? And why are quality scores cut off at 92?
---
    > selectMethod("alphabet", "FastqQuality")
    Method Definition:

    function (x, ...)
    rawToChar(as.raw(32:125), TRUE)
    <environment: namespace:ShortRead>

    Signatures:
            x
    target  "FastqQuality"
    defined "FastqQuality"
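For anyone wanting to check the off-by-one independently of ShortRead, here is a minimal base R sketch. It rebuilds the alphabet from the method definition above, compares it with the same range extended to ASCII 126, and decodes a quality string by hand. `decode_phred33` is a hypothetical helper written for illustration, not a ShortRead function, and the sketch assumes the offset of 33 described in the question.

```r
# Offset of 33 as described in the question (Q is encoded as ASCII 33 + Q)
offset <- 33L

# The alphabet hardcoded in the method definition above: ASCII 32..125
old_alphabet <- rawToChar(as.raw(32:125), TRUE)
# The same alphabet extended by one code point, to ASCII 126 ('~')
new_alphabet <- rawToChar(as.raw(32:126), TRUE)

# The character that encodes Q=93
q93 <- rawToChar(as.raw(offset + 93L))

q93 %in% old_alphabet   # FALSE: Q=93 is missing from the hardcoded alphabet
q93 %in% new_alphabet   # TRUE

# Hypothetical helper (not part of ShortRead): decode one quality string
# into integer scores without going through alphabet()
decode_phred33 <- function(qual) utf8ToInt(qual) - offset

decode_phred33("!I~")   # 0 40 93
```

This only sidesteps the alphabet for ad hoc checks; it does not change how ShortRead functions themselves treat Q=93.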
I just tested this in 1.36.1 and am still getting the same behavior (Q=93 is ignored), and the underlying code for alphabet still looks the same (alphabet not extended to Q=93).
Edit: Nvm, looks good now, just hadn't fully detached the old package version. Thanks again!
Thanks Martin!