Question

Subsetting/subsequencing AAStringSet containing sequences of unequal length



Asked by bri.isabella (@briisabella-12913), Phoenix, AZ, United States

Hi all!

I have a question about subsetting with Biostrings that I hope I can get some insight on. I've read a FASTA file with multiple sequences into a DNAStringSet and translated it to an AAStringSet. From there, I would like to extract a subsequence from each AA sequence by position, for example residues 65-75 or 14-38. When I try to use the subseq() function, I get the following error:

> subseq(myAAStringSet, start=14, end=38)
Error in .Call2("solve_user_SEW", refwidths, start, end, width, translate.negative.coord,  : 
  solving row 685: 'allow.nonnarrowing' is FALSE and the supplied end (38) is > refwidth

This only occurs with certain combinations of start and end points: with end=37 the call runs without error. I know that some of the sequences in my set are shorter than others and will not contain the subsequence I'm looking to extract. Is there a way to remove these shorter sequences from the StringSet object so that subseq() runs without error? Or am I misinterpreting this error? I've read through the Biostrings reference documentation and can't find the answer; hopefully I'm not overlooking anything. Any help is deeply appreciated!
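A minimal reproducible sketch of the situation above, assuming a few hypothetical in-memory sequences in place of the FASTA file (with a real file, readDNAStringSet() would be used instead). It translates the DNA to an AAStringSet, lists which sequences end before position 38, and reproduces the subseq() failure by capturing the error message:

```r
library(Biostrings)

# Hypothetical toy sequences standing in for the FASTA file; with a real
# (hypothetical) file this would be: dna <- readDNAStringSet("sequences.fasta")
dna <- DNAStringSet(c(
  seq1 = strrep("ATGGCT", 20),  # 120 nt -> 40 aa
  seq2 = strrep("ATGGCT", 10),  #  60 nt -> 20 aa
  seq3 = strrep("ATGGCT", 15)   #  90 nt -> 30 aa
))

# Translate to protein; all lengths are multiples of 3, so no bases are dropped
aa <- translate(dna)
width(aa)  # widths 40, 20, 30

# Sequences that end before position 38 cannot supply residues 14-38
too_short <- which(width(aa) < 38)
too_short  # elements 2 and 3

# Reproduce the failure, returning the error message instead of stopping
res <- tryCatch(
  subseq(aa, start = 14, end = 38),
  error = function(e) conditionMessage(e)
)
res  # the message names an element that is too short
```

Checking which(width(x) < end) before calling subseq() is a quick way to see how many sequences are affected.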

Tags: biostrings, aastringset, subsetting


Answer by Hervé Pagès · 2017-04-26

Hi,

Yes, subseq() will fail if the subsequence specified by start and end goes beyond the end of any of the sequences in myAAStringSet; "row 685" in the error message refers to the 685th sequence, which is shorter than 38. One way to remedy this is to first drop the sequences that are too short, e.g. with myAAStringSet[width(myAAStringSet) >= 38]. Another approach is to stop the extraction when reaching the end of the sequence from which to extract. This can be done with:

subseq(myAAStringSet, start=14, end=pmin(width(myAAStringSet), 38))
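Both remedies can be sketched on a hypothetical toy AAStringSet with widths 40, 20 and 10. One caveat about the pmin() approach: because start stays fixed at 14, a sequence shorter than 13 residues would give an end more than one position before the start (a negative width), and subseq() would still fail, so the sketch drops those sequences first. Kept sequences that are too short come back truncated, e.g. a 20-residue sequence yields only positions 14-20.

```r
library(Biostrings)

# Hypothetical toy set: widths 40, 20 and 10
aa <- AAStringSet(c(strrep("M", 40), strrep("A", 20), strrep("W", 10)))
from <- 14
to   <- 38

# Approach 1: keep only sequences that contain positions from..to in full
long_enough <- aa[width(aa) >= to]
sub1 <- subseq(long_enough, start = from, end = to)
width(sub1)  # 25

# Approach 2: cap 'end' at each sequence's own width with pmin().
# A sequence shorter than from - 1 would give end < start - 1 (a negative
# width), which subseq() rejects, so drop those sequences first.
keep <- aa[width(aa) >= from - 1]
sub2 <- subseq(keep, start = from, end = pmin(width(keep), to))
width(sub2)  # 25 7
```

The first approach returns fewer subsequences, all of the same length; the second keeps every sequence of at least 13 residues but returns subsequences of varying length, so the better fit depends on the downstream analysis.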

Hope this helps,

H.