Hi,
Is there a way to convert a sequence (in my case a FASTA character vector) into an IRanges object based on a numeric vector? The vector contains the positions of a specific pattern in the FASTA sequence.
> myseq
"MKLSVNEAQLGFPESLKTGQMMDESDEDFKELCASFFQRVKKHGIKEVSGE"
> Positions <- words.pos("K", myseq)
[1] 2 17 30 41 42 46
I would like to convert the sequence into an IRanges object where the positions of the pattern give the end position of each range. Each start position should be one greater than the previous end position.
It should look something like this:
IRanges object with 90 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1         2         2
  [2]         3        17        15
  [3]        18        30        13
  ...
What I have so far is this:
> Start <- c(1, Positions + 1)
> End <- c(Positions, nchar(myseq))
> myRanges <- IRanges(start = Start, end = End)
Is there a more efficient method to do it?
I also have the constraint here that I take the positions as the end positions. But what if I want the pattern at the beginning of each range and not at the end?
Thanks for any advice,
Assa
This case covers my problem if the pattern I am looking for is at the end of the sub-sequences, as in the case above. But what if I would like to have the pattern at the beginning of my sub-sequences (here I can probably use Positions - 1), or if I am looking for two different amino acids (like "K" and "R") and would like to cut the sequence before "K" but after "R", etc.? I know it sounds very complicated, but is there a more flexible way of looking for specific patterns and deciding how to handle each one based on the pattern(s) I am looking for?
It depends on the specific case. In special cases, just do the math directly and pass the endpoints to the IRanges constructor.
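As a rough sketch of "doing the math directly" for the mixed case (cut before "K" but after "R"): collect every position after which a cut should fall, then build the start and end vectors from that single list. The helper name ranges_from_cuts is hypothetical, and the positions are found here with base R's strsplit() and which() rather than words.pos(), purely to keep the example self-contained. The IRanges() call at the end is the same constructor used in the question and only runs if the package is installed.

```r
# Hypothetical helper: `cut_after` holds the positions after which the
# sequence is cut, i.e. each one is the last residue of a piece.
ranges_from_cuts <- function(cut_after, seq_len) {
  cut_after <- sort(unique(cut_after))
  # Drop cuts that would create an empty first or last piece
  cut_after <- cut_after[cut_after >= 1 & cut_after < seq_len]
  list(start = c(1, cut_after + 1), end = c(cut_after, seq_len))
}

myseq <- "MKLSVNEAQLGFPESLKTGQMMDESDEDFKELCASFFQRVKKHGIKEVSGE"
residues <- strsplit(myseq, "")[[1]]

pos_K <- which(residues == "K")
pos_R <- which(residues == "R")

# Pattern at the end of each piece (the case from the question)
after_K <- ranges_from_cuts(pos_K, nchar(myseq))

# Cut before each K (piece ends at K - 1) and after each R (piece ends at R)
mixed <- ranges_from_cuts(c(pos_K - 1, pos_R), nchar(myseq))

# Assumes the IRanges package is available; skipped otherwise
if (requireNamespace("IRanges", quietly = TRUE)) {
  myRanges <- IRanges::IRanges(start = mixed$start, end = mixed$end)
  print(myRanges)
}
```

Each rule ("before K", "after R") only changes how a match position is turned into a cut position, so adding another residue or rule means adding one more term to the vector passed to the helper.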
Thanks, that's what I was doing, but it is sometimes not so straightforward.