how to catch an error when using subseq() in vectorized way
1
1
Entering edit mode
@tobiaskockmann-11966
Last seen 4.5 years ago

Hi bioC developers,

I am trying to extract the downstream sequences of peptides that have been matched to proteins. The function to do this is the subseq() function. My call of this function looks like this:

> down.aa <- subseq(ref.proteome[df$acc], start = df$start+nchar(as.character(df$pep)), width = 1)
Error in .Call2("solve_user_SEW", refwidths, start, end, width, translate.negative.coord,  :
  solving row 2: 'allow.nonnarrowing' is FALSE and the solved end (106) is > refwidth

and creates an error because the sec. peptide in df is exactly terminal with respect to its matching protein, so subseq() tries to extract a subsequence "out-of-bounds" and throws an error. So I thaught well, no problem, just put subseq() into a tryCatch() statement so in case of an error it returns NA. Unfortunately, the vectorisation in combination with tryCatch does not really do what I want:

down.aa <- tryCatch(subseq(ref.proteome[df$acc], start = df$start+nchar(as.character(df$pep)), width = 1, ), error = function(e){return(NA)})

The call works, but now it get NA as soon as one of the subseq calls throws an error:

> down.aa
[1] NA

 

 

biostrings subseq • 878 views
ADD COMMENT
2
Entering edit mode
@martin-morgan-1513
Last seen 1 day ago
United States

Rather than catch an error you could try to avoid it, e.g., 

aa = AAStringSet(c("ABCD", "DCBA"))
start = c(3, 4)
width = ifelse(start >= width(aa), 0, 1)
subseq(aa, start, width = width)
ADD COMMENT
0
Entering edit mode

Hi Martin,

yes that is of course an option, although I feel that solveUserSEW() does this job already quite exhaustively when subseq() is called. What I do not really understand is why subseq() stops when it is called on AAStringSet and only a single SEW-triplet creates an error. Wouldn't it be more convenient if return NA for that particular case and a warning, but process the rest of the AAStringSet?

> aa = AAStringSet(c("ABCD", "DCBA"))
> start = c(3, 4)
> subseq(aa, start, width=1)
  A AAStringSet instance of length 2
    width seq
[1]     1 C
[2]     1 A
> subseq(aa, start, width=2)
Error in .Call2("solve_user_SEW", refwidths, start, end, width, translate.negative.coord,  :
  solving row 2: 'allow.nonnarrowing' is FALSE and the solved end (5) is > refwidth

Why this all or nothing behavior?

ADD REPLY
0
Entering edit mode

A practical answer is that StringSet objects don't have the concept of an 'NA'.

ADD REPLY
0
Entering edit mode

Hi Martin,

ok let's compare how subseq() and substr() behave:

> aa <- AAStringSet(x = c("ABCD", "ABC"))

> aa
  A AAStringSet instance of length 2
    width seq
[1]     4 ABCD
[2]     3 ABC
> bb <- c("ABCD", "ABC")

> bb
[1] "ABCD" "ABC"

> subseq(aa, start = 3, width = 2)
Error in .Call2("solve_user_SEW", refwidths, start, end, width, translate.negative.coord,  :
  solving row 2: 'allow.nonnarrowing' is FALSE and the solved end (4) is > refwidth

> substr(bb, start = 3, stop = 5)
[1] "CD" "C"

Don't you think a comparable behavior for subseq() would make it more userfriendly (incl. a warning)?

ADD REPLY
1
Entering edit mode

But substr on StringSets does behave as substr on a character vector?

> substr(aa, start=3, stop=5)
[1] "CD" "C" 

If you're asking about my opinion, then I'd rather be alerted to logical errors in my code. It forces me to deal with the issue rather than propagating misunderstanding to later parts of the analysis. Also, I wouldn't want subseq to replicate behavior already available via substr.

ADD REPLY
0
Entering edit mode

Ohhh...I did not know that substr() can process instances of the AAStringSet class, but that is of very useful information. THX!

ADD REPLY
0
Entering edit mode

In retrospect it seems like the algorithm is to 'capture downstream sequences'. So a precondition is that there are actually downstream sequences. Rather than using the solution in my answer to introduce complexity (e.g., checking for zero-width subsequences), it seems like one should just ensure that the precondition is met, e.g.,

fail <- start >= width(aa)
if (any(fail)) {
    warning("removing ", sum(fail),
            " sequences without downstream amino acids")
    aa <- aa[!fail]
}
ADD REPLY

Login before adding your answer.

Traffic: 263 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6