Search
Question: how to catch an error when using subseq() in vectorized way
1
gravatar for tobias.kockmann
9 months ago by
tobias.kockmann20 wrote:

Hi bioC developers,

I am trying to extract the downstream sequences of peptides that have been matched to proteins. The function to do this is the subseq() function. My call of this function looks like this:

> down.aa <- subseq(ref.proteome[df$acc], start = df$start+nchar(as.character(df$pep)), width = 1)
Error in .Call2("solve_user_SEW", refwidths, start, end, width, translate.negative.coord,  :
  solving row 2: 'allow.nonnarrowing' is FALSE and the solved end (106) is > refwidth

and creates an error because the sec. peptide in df is exactly terminal with respect to its matching protein, so subseq() tries to extract a subsequence "out-of-bounds" and throws an error. So I thaught well, no problem, just put subseq() into a tryCatch() statement so in case of an error it returns NA. Unfortunately, the vectorisation in combination with tryCatch does not really do what I want:

down.aa <- tryCatch(subseq(ref.proteome[df$acc], start = df$start+nchar(as.character(df$pep)), width = 1, ), error = function(e){return(NA)})

The call works, but now it get NA as soon as one of the subseq calls throws an error:

> down.aa
[1] NA

 

 

ADD COMMENTlink modified 9 months ago by Martin Morgan ♦♦ 20k • written 9 months ago by tobias.kockmann20
1
gravatar for Martin Morgan
9 months ago by
Martin Morgan ♦♦ 20k
United States
Martin Morgan ♦♦ 20k wrote:

Rather than catch an error you could try to avoid it, e.g., 

aa = AAStringSet(c("ABCD", "DCBA"))
start = c(3, 4)
width = ifelse(start >= width(aa), 0, 1)
subseq(aa, start, width = width)
ADD COMMENTlink written 9 months ago by Martin Morgan ♦♦ 20k

Hi Martin,

yes that is of course an option, although I feel that solveUserSEW() does this job already quite exhaustively when subseq() is called. What I do not really understand is why subseq() stops when it is called on AAStringSet and only a single SEW-triplet creates an error. Wouldn't it be more convenient if return NA for that particular case and a warning, but process the rest of the AAStringSet?

> aa = AAStringSet(c("ABCD", "DCBA"))
> start = c(3, 4)
> subseq(aa, start, width=1)
  A AAStringSet instance of length 2
    width seq
[1]     1 C
[2]     1 A
> subseq(aa, start, width=2)
Error in .Call2("solve_user_SEW", refwidths, start, end, width, translate.negative.coord,  :
  solving row 2: 'allow.nonnarrowing' is FALSE and the solved end (5) is > refwidth

Why this all or nothing behavior?

ADD REPLYlink modified 9 months ago • written 9 months ago by tobias.kockmann20

A practical answer is that StringSet objects don't have the concept of an 'NA'.

ADD REPLYlink written 9 months ago by Martin Morgan ♦♦ 20k

Hi Martin,

ok let's compare how subseq() and substr() behave:

> aa <- AAStringSet(x = c("ABCD", "ABC"))

> aa
  A AAStringSet instance of length 2
    width seq
[1]     4 ABCD
[2]     3 ABC
> bb <- c("ABCD", "ABC")

> bb
[1] "ABCD" "ABC"

> subseq(aa, start = 3, width = 2)
Error in .Call2("solve_user_SEW", refwidths, start, end, width, translate.negative.coord,  :
  solving row 2: 'allow.nonnarrowing' is FALSE and the solved end (4) is > refwidth

> substr(bb, start = 3, stop = 5)
[1] "CD" "C"

Don't you think a comparable behavior for subseq() would make it more userfriendly (incl. a warning)?

ADD REPLYlink written 9 months ago by tobias.kockmann20
1

But substr on StringSets does behave as substr on a character vector?

> substr(aa, start=3, stop=5)
[1] "CD" "C" 

If you're asking about my opinion, then I'd rather be alerted to logical errors in my code. It forces me to deal with the issue rather than propagating misunderstanding to later parts of the analysis. Also, I wouldn't want subseq to replicate behavior already available via substr.

ADD REPLYlink written 9 months ago by Martin Morgan ♦♦ 20k

Ohhh...I did not know that substr() can process instances of the AAStringSet class, but that is of very useful information. THX!

ADD REPLYlink written 9 months ago by tobias.kockmann20

In retrospect it seems like the algorithm is to 'capture downstream sequences'. So a precondition is that there are actually downstream sequences. Rather than using the solution in my answer to introduce complexity (e.g., checking for zero-width subsequences), it seems like one should just ensure that the precondition is met, e.g.,

fail <- start >= width(aa)
if (any(fail)) {
    warning("removing ", sum(fail),
            " sequences without downstream amino acids")
    aa <- aa[!fail]
}
ADD REPLYlink modified 9 months ago • written 9 months ago by Martin Morgan ♦♦ 20k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 173 users visited in the last hour