Question

how to catch an error when using subseq() in vectorized way

1

Entering edit mode

tobias.kockmann ▴ 20

@tobiaskockmann-11966

Last seen 7.1 years ago

Hi bioC developers,

I am trying to extract the downstream sequences of peptides that have been matched to proteins. The function to do this is the subseq() function. My call of this function looks like this:

> down.aa <- subseq(ref.proteome[df$acc], start = df$start+nchar(as.character(df$pep)), width = 1)
Error in .Call2("solve_user_SEW", refwidths, start, end, width, translate.negative.coord,  :
  solving row 2: 'allow.nonnarrowing' is FALSE and the solved end (106) is > refwidth

and creates an error because the sec. peptide in df is exactly terminal with respect to its matching protein, so subseq() tries to extract a subsequence "out-of-bounds" and throws an error. So I thaught well, no problem, just put subseq() into a tryCatch() statement so in case of an error it returns NA. Unfortunately, the vectorisation in combination with tryCatch does not really do what I want:

down.aa <- tryCatch(subseq(ref.proteome[df$acc], start = df$start+nchar(as.character(df$pep)), width = 1, ), error = function(e){return(NA)})

The call works, but now it get NA as soon as one of the subseq calls throws an error:

> down.aa
[1] NA

biostrings subseq • 1.7k views

ADD COMMENT • link updated 7.5 years ago by Martin Morgan 25k • written 7.5 years ago by tobias.kockmann ▴ 20

score 2 · Accepted Answer · 2017-02-06

2

Entering edit mode

Martin Morgan 25k

@martin-morgan-1513

Last seen 1 day ago

United States

Rather than catch an error you could try to avoid it, e.g.,

aa = AAStringSet(c("ABCD", "DCBA"))
start = c(3, 4)
width = ifelse(start >= width(aa), 0, 1)
subseq(aa, start, width = width)

ADD COMMENT • link 7.5 years ago Martin Morgan 25k

0

Entering edit mode

Hi Martin,

yes that is of course an option, although I feel that solveUserSEW() does this job already quite exhaustively when subseq() is called. What I do not really understand is why subseq() stops when it is called on AAStringSet and only a single SEW-triplet creates an error. Wouldn't it be more convenient if return NA for that particular case and a warning, but process the rest of the AAStringSet?

> aa = AAStringSet(c("ABCD", "DCBA"))
> start = c(3, 4)
> subseq(aa, start, width=1)
  A AAStringSet instance of length 2
    width seq
[1]     1 C
[2]     1 A
> subseq(aa, start, width=2)
Error in .Call2("solve_user_SEW", refwidths, start, end, width, translate.negative.coord,  :
  solving row 2: 'allow.nonnarrowing' is FALSE and the solved end (5) is > refwidth

Why this all or nothing behavior?

ADD REPLY • link 7.5 years ago tobias.kockmann ▴ 20

0

Entering edit mode

A practical answer is that StringSet objects don't have the concept of an 'NA'.

ADD REPLY • link 7.5 years ago Martin Morgan 25k

0

Entering edit mode

Hi Martin,

ok let's compare how subseq() and substr() behave:

> aa <- AAStringSet(x = c("ABCD", "ABC"))

> aa
  A AAStringSet instance of length 2
    width seq
[1]     4 ABCD
[2]     3 ABC
> bb <- c("ABCD", "ABC")

> bb
[1] "ABCD" "ABC"

> subseq(aa, start = 3, width = 2)
Error in .Call2("solve_user_SEW", refwidths, start, end, width, translate.negative.coord,  :
  solving row 2: 'allow.nonnarrowing' is FALSE and the solved end (4) is > refwidth

> substr(bb, start = 3, stop = 5)
[1] "CD" "C"

Don't you think a comparable behavior for subseq() would make it more userfriendly (incl. a warning)?

ADD REPLY • link 7.5 years ago tobias.kockmann ▴ 20

1

Entering edit mode

But substr on StringSets does behave as substr on a character vector?

> substr(aa, start=3, stop=5)
[1] "CD" "C"

If you're asking about my opinion, then I'd rather be alerted to logical errors in my code. It forces me to deal with the issue rather than propagating misunderstanding to later parts of the analysis. Also, I wouldn't want subseq to replicate behavior already available via substr.

ADD REPLY • link 7.5 years ago Martin Morgan 25k

0

Entering edit mode

Ohhh...I did not know that substr() can process instances of the AAStringSet class, but that is of very useful information. THX!

ADD REPLY • link 7.5 years ago tobias.kockmann ▴ 20

0

Entering edit mode

In retrospect it seems like the algorithm is to 'capture downstream sequences'. So a precondition is that there are actually downstream sequences. Rather than using the solution in my answer to introduce complexity (e.g., checking for zero-width subsequences), it seems like one should just ensure that the precondition is met, e.g.,

fail <- start >= width(aa)
if (any(fail)) {
    warning("removing ", sum(fail),
            " sequences without downstream amino acids")
    aa <- aa[!fail]
}

ADD REPLY • link 7.5 years ago Martin Morgan 25k