Question

The ShortRead package does not handle quality score '~' (Q=93) found in PacBio fastq files

benjamin.j.callahan ▴ 50

@benjaminjcallahan-9771

I am working with PacBio data in fastq format. The PacBio software outputs fastq files whose quality scores range all the way up to Q=93, which is encoded as ASCII character 33+93 = 126, i.e. `~`.

This causes a problem in the ShortRead package, because the fastq quality score alphabet is hardcoded to extend only to ASCII 33+92 = 125, i.e. `}` (see below for the problematic code). As a result, all ShortRead functions that work with quality scores ignore the Q=93 scores.
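As a quick check of the arithmetic above, here is a minimal base R sketch of the Phred+33 encoding. The helper names `phred_to_char` and `char_to_phred` are made up for this illustration and are not ShortRead functions.

```r
# Phred+33 encoding: a quality score Q is stored as the ASCII character
# with code 33 + Q. These helpers are hypothetical, for illustration only.
phred_to_char <- function(q) rawToChar(as.raw(33 + q))
char_to_phred <- function(ch) utf8ToInt(ch) - 33

phred_to_char(93)   # "~"  (ASCII 126)
phred_to_char(92)   # "}"  (ASCII 125)
char_to_phred("~")  # 93
```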

Is there a workaround for this problem that does not require case-by-case rewriting of ShortRead functions (e.g. somehow substituting the S4 method so that it returns the 32:126 alphabet)? And why are quality scores cut off at 92?

---

```r
> selectMethod("alphabet", "FastqQuality")
Method Definition:

function (x, ...)
rawToChar(as.raw(32:125), TRUE)
<environment: namespace:ShortRead>

Signatures:
        x
target  "FastqQuality"
defined "FastqQuality"
```
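To make the effect of that hardcoded range concrete, the following base R sketch rebuilds the same character vector and checks whether `~` is in it. The `32:126` range is only the extension proposed in the question, not necessarily what any ShortRead release uses, and the variable names are chosen for this example.

```r
# Same expression as in the method body shown above: one string per ASCII code.
old_alphabet <- rawToChar(as.raw(32:125), TRUE)

# Assumed extension by one character, as suggested in the question.
new_alphabet <- rawToChar(as.raw(32:126), TRUE)

"~" %in% old_alphabet   # FALSE: Q=93 is outside the hardcoded alphabet
"~" %in% new_alphabet   # TRUE
tail(old_alphabet, 1)   # "}" is the last character, i.e. Q=92
```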

shortread pacbio fastq • 1.5k views

updated 6.2 years ago by Martin Morgan 25k • written 6.2 years ago by benjamin.j.callahan ▴ 50

score 2 · Accepted Answer · 2018-02-20

Martin Morgan 25k

@martin-morgan-1513

Thanks, I updated this in ShortRead 1.36.1 (release) and 1.37.2 (devel); these will propagate and be available via biocLite() either tomorrow morning or Wednesday morning, Eastern time.

I just tested this in 1.36.1 and am still getting the same behavior (Q=93 is ignored), and the underlying code for alphabet still looks the same (the alphabet is not extended to Q=93).

Edit: Never mind, it looks good now. I just hadn't fully detached the old package version. Thanks again!
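For anyone hitting the same stale-version confusion, a rough way to spot it is to compare the version of the namespace loaded in the current session with the version found in the library paths. This is only a sketch: `loaded_vs_installed` is a hypothetical helper written for this thread, and it is demonstrated on the base package `stats` so that it runs without ShortRead installed.

```r
# Hypothetical helper (not part of ShortRead): report the version of the
# namespace loaded in this session next to the version found on disk.
loaded_vs_installed <- function(pkg) {
  loaded <- if (isNamespaceLoaded(pkg)) {
    unname(as.character(getNamespaceVersion(pkg)))
  } else {
    NA_character_  # namespace is not loaded in this session
  }
  installed <- as.character(packageVersion(pkg, lib.loc = .libPaths()))
  c(loaded = loaded, installed = installed)
}

# Demonstrated on a base package so the example runs anywhere.
loaded_vs_installed("stats")
```

In the situation described here one would call `loaded_vs_installed("ShortRead")`; if the two values differ, restarting the R session is usually the simplest way to pick up the newly installed version.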