Question: Subsetting/subsequencing AAStringSet containing sequences of unequal length
2.7 years ago by
United States/Phoenix, AZ
bri.isabella0 wrote:

Hi all!

I have a question about subsetting with Biostrings that I hope I can get some insight on. I've read a FASTA file containing multiple sequences into a DNAStringSet and translated it to an AAStringSet. From here, I would like to extract a subsequence from those AA sequences by position, for example positions 65-75 or 14-38. When I try to use the subseq() function, I get the following error:

> subseq(myAAStringSet, start=14, end=38)
Error in .Call2("solve_user_SEW", refwidths, start, end, width, translate.negative.coord,  :
solving row 685: 'allow.nonnarrowing' is FALSE and the supplied end (38) is > refwidth

This only occurs with certain combinations of start and end points. For example, end=37 returned output without error. I know that some of the sequences in my set are shorter than others and will not include the subsequence I'm looking to extract. Is there a way to remove these shorter sequences from a StringSet object so that subseq() runs without error? Or am I misinterpreting this error? I've read through the Biostrings reference documentation and can't seem to find the answer I'm looking for; hopefully I'm not overlooking anything. Any help is deeply appreciated!

Answer: Subsetting/subsequencing AAStringSet containing sequences of unequal length
2.7 years ago by
United States
Hervé Pagès wrote:

Hi,

Yes, subseq() will fail if the subsequence specified by start and end goes beyond the end of any of the sequences in myAAStringSet. One way to remedy this is to first subset myAAStringSet so that only sequences long enough to contain the whole window are kept, e.g. with myAAStringSet[width(myAAStringSet) >= 38]. Another approach is to stop the extraction when reaching the end of the sequence from which to extract. This can be done with:
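A minimal sketch of the filtering approach, using a small made-up AAStringSet in place of the poster's real data (the sequence names and contents are hypothetical). It also lists the elements that would trigger the error. The "solving row N" part of the error message should correspond to the index of such an element. Since end=37 worked and end=38 did not, the shortest sequence in the original set is presumably exactly 37 residues long.

```r
library(Biostrings)

# Hypothetical toy data standing in for myAAStringSet:
# seq2 is too short to contain positions 14-38.
myAAStringSet <- AAStringSet(c(
  seq1 = strrep("A", 40),  # 40 residues: long enough
  seq2 = strrep("C", 20),  # 20 residues: too short for end = 38
  seq3 = strrep("G", 38)   # exactly 38 residues: still long enough
))

# Indices of the elements that would make subseq(..., end = 38) fail
too_short <- which(width(myAAStringSet) < 38)

# Keep only sequences with at least 38 residues, then extract 14-38
long_enough <- myAAStringSet[width(myAAStringSet) >= 38]
sub <- subseq(long_enough, start = 14, end = 38)
```

Every remaining subsequence has the same width (38 - 14 + 1 = 25), at the cost of dropping the short sequences entirely.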

subseq(myAAStringSet, start=14, end=pmin(width(myAAStringSet), 38))
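A sketch of the pmin() approach on hypothetical toy data. pmin() only caps the end, so a sequence shorter than start - 1 residues (here, fewer than 13) would still produce a negative width and make subseq() fail. The sketch therefore drops those sequences first. The variable names `from` and `keep` are illustrative only.

```r
library(Biostrings)

# Hypothetical toy data standing in for myAAStringSet
myAAStringSet <- AAStringSet(c(
  seq1 = strrep("A", 40),  # full 14-38 window: 25 residues
  seq2 = strrep("C", 20),  # truncated window 14-20: 7 residues
  seq3 = strrep("G", 10)   # end (10) would fall before start - 1 (13)
))

from <- 14
# A width of exactly from - 1 gives an empty subsequence; anything
# shorter would give a negative width, so remove those sequences.
keep <- width(myAAStringSet) >= from - 1
x <- myAAStringSet[keep]

# Cap each end at the width of its own sequence
sub <- subseq(x, start = from, end = pmin(width(x), 38))
```

Unlike the filtering approach, this keeps partially covered sequences, so the extracted subsequences can have different widths.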

Hope this helps,

H.