Question

Convert a sequence to an IRanges object


Assa Yeroslaviz ★ 1.5k

@assa-yeroslaviz-1597

Germany

Hi,

Is there a way to convert a sequence (in my case a FASTA sequence stored as a character vector) into an IRanges object based on a numeric vector? The vector contains the positions of a specific pattern in the FASTA sequence.

> myseq
[1] "MKLSVNEAQLGFPESLKTGQMMDESDEDFKELCASFFQRVKKHGIKEVSGE"
> Positions <- words.pos("K", myseq)
> Positions
[1]  2 17 30 41 42 46

I would like to convert the sequence into an IRanges object where the positions of the pattern give the end position of each range. Each start position should be one greater than the previous end position.

It should look something like this:

IRanges object with 90 ranges and 0 metadata columns:
           start       end     width
       <integer> <integer> <integer>
   [1]         1         2         2
   [2]         3        17        15
   [3]        18        30        13 ...

What I have so far is this:

> Start <- c(1, Positions+1)
> End <- c(Positions, nchar(myseq))
> myRanges <- IRanges(start = Start, end = End)

Is there a more efficient method to do it?

I also have the constraint here that I take the positions as the end positions. But what if I want the pattern at the beginning of each range rather than at the end?

Thanks for any advice.

Assa

iranges fasta split • 1.7k views

ADD COMMENT • link updated 8.4 years ago by Michael Lawrence ★ 11k • written 8.4 years ago by Assa Yeroslaviz ★ 1.5k

score 1 · Accepted Answer · 2017-02-06


Michael Lawrence ★ 11k

@michael-lawrence-3846

United States

PartitioningByEnd(c(Positions, nchar(myseq)))
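For readers who want to try this, here is a minimal sketch of how the one-liner relates to the manual start/end computation from the question. It assumes the pattern is a single residue, so base R `which()` stands in for `words.pos()` (a substitution made only to keep the example self-contained), and it only calls IRanges if the package is installed.

```r
# Example sequence from the question.
myseq <- "MKLSVNEAQLGFPESLKTGQMMDESDEDFKELCASFFQRVKKHGIKEVSGE"

# Assumption: base-R stand-in for words.pos("K", myseq), valid for a
# single-character pattern.
Positions <- which(strsplit(myseq, "")[[1]] == "K")

# Each range ends at a pattern position; the last range ends at the
# end of the sequence.
ends <- c(Positions, nchar(myseq))

# Each range starts one past the previous end; the first starts at 1.
starts <- c(1L, head(ends, -1L) + 1L)

# Optional: build the partitioning with IRanges if it is available.
if (requireNamespace("IRanges", quietly = TRUE)) {
  suppressPackageStartupMessages(library(IRanges))
  parts <- PartitioningByEnd(ends)
  # The starts derived by PartitioningByEnd should match the manual ones.
  stopifnot(all(start(parts) == starts))
  print(parts)
}
```

The partitioning derives the start positions from the ends, so only one vector has to be supplied.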

ADD COMMENT • link 8.4 years ago Michael Lawrence ★ 11k


This covers my problem if the pattern I am looking for is at the end of the sub-sequences, as in the case above. But what if I would like the pattern at the beginning of my sub-sequences (here I can probably use Positions - 1)? Or what if I am looking for two different amino acids (like "K" and "R") and would like to cut the sequence before "K" but after "R", etc.?

I know it sounds very complicated, but is there a more flexible way of looking for specific patterns and deciding how to handle each one based on the pattern(s) I am looking for?

ADD REPLY • link 8.4 years ago Assa Yeroslaviz ★ 1.5k


It depends on the specific case. In special cases, just do the math directly and pass the endpoints to the IRanges constructor.
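As one possible illustration of "doing the math directly", the sketch below cuts before every "K" and after every "R", as in the example from the reply above. The variable names (`k_pos`, `r_pos`, `ends`, `starts`) are hypothetical, the positions are found with base R rather than `words.pos()`, and the IRanges call is guarded in case the package is not installed.

```r
# Example sequence from the question.
myseq <- "MKLSVNEAQLGFPESLKTGQMMDESDEDFKELCASFFQRVKKHGIKEVSGE"
residues <- strsplit(myseq, "")[[1]]

k_pos <- which(residues == "K")  # cut BEFORE each K
r_pos <- which(residues == "R")  # cut AFTER each R

# Cutting before position p means a range ends at p - 1;
# cutting after position p means a range ends at p.
# The sequence end always closes the last range.
ends <- sort(unique(c(k_pos - 1L, r_pos, length(residues))))

# Drop an end of 0, which would arise if the sequence started with K.
ends <- ends[ends >= 1L]

# Each range starts one past the previous end.
starts <- c(1L, head(ends, -1L) + 1L)

# Pass the endpoints to the IRanges constructor if the package is available.
if (requireNamespace("IRanges", quietly = TRUE)) {
  cuts <- IRanges::IRanges(start = starts, end = ends)
  print(cuts)
}

# Base-R view of the resulting sub-sequences.
pieces <- substring(myseq, starts, ends)
print(pieces)
```

Because each rule only decides whether a range ends at the matched position or just before it, further residues or rules can be added by appending to the vector of ends before sorting.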