Question: convert a sequence to a Ranges object
2.2 years ago, Assa Yeroslaviz (Munich, Germany) wrote:

Hi,

Is there a way to convert a sequence (in my case a FASTA character vector) into an IRanges object based on a numeric vector? The vector contains the positions of a specific pattern in the FASTA sequence.

> library(seqinr)   # words.pos() comes from seqinr
> myseq
[1] "MKLSVNEAQLGFPESLKTGQMMDESDEDFKELCASFFQRVKKHGIKEVSGE"
> Positions <- words.pos("K", myseq)
> Positions
[1]  2 17 30 41 42 46

I would like to convert the sequence into an IRanges object where the positions of the pattern give the end positions of the ranges. Each range should start one position after the end of the previous range (the first range starts at 1).

It should look something like this:

IRanges object with 90 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1         2         2
  [2]         3        17        15
  [3]        18        30        13
  ...

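The mapping described above (each match is the end of a range, each range starts one past the previous end, and the first starts at 1) can be sketched in base R. This is a minimal sketch assuming a single-character pattern: it uses base `gregexpr()` with `fixed = TRUE` in place of seqinr's `words.pos()`, and the helper name `split_after()` is hypothetical. The IRanges object is built only if the Bioconductor IRanges package is installed.

```r
# Hypothetical helper: compute ranges that END at each match of `pattern`.
split_after <- function(seq, pattern) {
  hits <- gregexpr(pattern, seq, fixed = TRUE)[[1]]
  # gregexpr() returns -1 when there is no match
  pos <- if (hits[1] == -1) integer(0) else as.integer(hits)
  len <- nchar(seq)
  # Ends: every match position plus the sequence end
  # (unique() avoids a repeated end if the last residue is a match)
  end <- unique(c(pos, len))
  # Each range starts one past the previous end; the first starts at 1
  start <- c(1L, head(end, -1) + 1L)
  data.frame(start = start, end = end, width = end - start + 1L)
}

myseq <- "MKLSVNEAQLGFPESLKTGQMMDESDEDFKELCASFFQRVKKHGIKEVSGE"
r <- split_after(myseq, "K")
print(r)

# Only build the IRanges object if the Bioconductor package is available
if (requireNamespace("IRanges", quietly = TRUE)) {
  print(IRanges::IRanges(start = r$start, end = r$end))
}
```

The widths always add up to `nchar(myseq)`, which is a quick sanity check that no residue is lost or counted twice.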

What I have so far is this:

> library(IRanges)
> Start <- c(1, Positions + 1)
> End <- c(Positions, nchar(myseq))
> myRanges <- IRanges(start = Start, end = End)

Is there a more efficient way to do this?

I am also constrained here to using the positions as end positions. What if I want each range to begin at the pattern instead of ending at it?

Assa
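For the reverse case, where each range should begin at the pattern, the ends shift left by one: a range ends just before each match. A minimal sketch under the same assumptions (single-character pattern, base `gregexpr()` instead of `words.pos()`, hypothetical helper name `split_before()`):

```r
# Hypothetical helper: ranges that START at each match of `pattern`,
# i.e. each cut falls just before a matched residue.
split_before <- function(seq, pattern) {
  hits <- gregexpr(pattern, seq, fixed = TRUE)[[1]]
  # gregexpr() returns -1 when there is no match
  pos <- if (hits[1] == -1) integer(0) else as.integer(hits)
  len <- nchar(seq)
  # A range ends one residue before each match; a match at position 1
  # would give an end of 0, so it is dropped
  end <- c(pos[pos > 1L] - 1L, len)
  start <- c(1L, head(end, -1) + 1L)
  data.frame(start = start, end = end, width = end - start + 1L)
}

myseq <- "MKLSVNEAQLGFPESLKTGQMMDESDEDFKELCASFFQRVKKHGIKEVSGE"
r2 <- split_before(myseq, "K")
print(r2)
```

Every range after the first now starts on a "K", and the last range always runs to the end of the sequence.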

Tags: iranges, fasta, split
Modified 2.2 years ago by Michael Lawrence • written 2.2 years ago by Assa Yeroslaviz
Answer: convert a sequence to a Ranges object
2.2 years ago, Michael Lawrence (United States) wrote:
PartitioningByEnd(c(Positions, nchar(myseq)))
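A sketch of how the answer above could be used and inspected, assuming the Bioconductor IRanges package is installed (the code checks for it and otherwise only computes the positions). `PartitioningByEnd()` stores only the end points; each range implicitly starts one past the previous end, which is the layout asked for. Positions are computed here with base `gregexpr()` instead of seqinr's `words.pos()`. If the pattern happens to be the last residue, the final end is repeated and the last range has width 0.

```r
myseq <- "MKLSVNEAQLGFPESLKTGQMMDESDEDFKELCASFFQRVKKHGIKEVSGE"
# Positions of "K" (attributes stripped by as.integer())
Positions <- as.integer(gregexpr("K", myseq, fixed = TRUE)[[1]])

if (requireNamespace("IRanges", quietly = TRUE)) {
  suppressPackageStartupMessages(library(IRanges))
  # Ends are the pattern positions plus the end of the sequence;
  # starts are implied (1, then previous end + 1)
  parts <- PartitioningByEnd(c(Positions, nchar(myseq)))
  print(cbind(start = start(parts), end = end(parts), width = width(parts)))
  # Extract the corresponding sub-sequences as a character vector
  print(substring(myseq, start(parts), end(parts)))
}
```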

Reply from Assa Yeroslaviz: This covers my problem when the pattern I am looking for is at the end of the sub-sequences, as in the case above. But what if I want the pattern at the beginning of my sub-sequences? (Here I can probably use Positions - 1.) Or what if I am looking for two different amino acids (like "K" and "R") and want to cut the sequence before "K" but after "R", etc.?

I know it sounds complicated, but is there a more flexible way to search for specific patterns and decide how to handle each one depending on the pattern?


It depends on the specific case. In special cases, just do the math directly and pass the endpoints to the IRanges constructor.
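Following the advice to do the math directly, here is a sketch for mixed rules such as "cut before K, cut after R". It assumes single-residue patterns; the helper `cut_sites()` and its `before`/`after` arguments are hypothetical, not part of IRanges. A cut before position p becomes an end at p - 1, a cut after position p becomes an end at p, and the combined ends are sorted and deduplicated before being passed to the `IRanges()` constructor.

```r
# Hypothetical helper: cut BEFORE every residue in `before` and AFTER every
# residue in `after`, then return the ranges between the cuts.
cut_sites <- function(seq, before = character(0), after = character(0)) {
  chars <- strsplit(seq, "")[[1]]
  len <- length(chars)
  # Cut before position p -> a range ends at p - 1;
  # cut after position p  -> a range ends at p.
  ends <- c(which(chars %in% before) - 1L, which(chars %in% after))
  # Drop cuts at the very edges, remove duplicates, and sort
  ends <- sort(unique(ends[ends >= 1L & ends < len]))
  end <- c(ends, len)
  start <- c(1L, head(end, -1) + 1L)
  data.frame(start = start, end = end, width = end - start + 1L)
}

myseq <- "MKLSVNEAQLGFPESLKTGQMMDESDEDFKELCASFFQRVKKHGIKEVSGE"
r3 <- cut_sites(myseq, before = "K", after = "R")
print(r3)

# Only build the IRanges object if the Bioconductor package is available
if (requireNamespace("IRanges", quietly = TRUE)) {
  print(IRanges::IRanges(start = r3$start, end = r3$end))
}
```

Because every rule is reduced to a set of end positions, more residues or rules can be added by extending the `before`/`after` vectors; multi-character motifs would need a different position finder.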