Marcin Imielinski
Hi -
I'm confused about the output of the deletion(pa) and insertion(pa) functions for pa = pairwiseAlignment(). My understanding is that they should return IRanges corresponding to gaps in the pattern (deletion()) and gaps in the subject (insertion()), expressed in alignment coordinates.
However, the returned ranges can overlap. For example, in the alignment below of a 101-letter pattern and a 404-letter subject, the deletion ranges overlap. Which sequence positions do these ranges refer to (pattern, subject, or alignment)?
Is this a bug, or am I misinterpreting this output?
Thanks
Marcin
##############################################################
> str[1]
[1]
"TCCTTGCACATTGATAAGTTCACATCTGAAATTTGCATGACATAAACATACAGTTGAGAAGGAGAGAAC
GTATGCCCTATGGTAAATATTGACATTTTAAA"
> str[2]
[1]
"CTGGGCTTTCGATGAAATAGTTCATTTATCTGTGGGTAGATATTACTTACTGGTTGAGTTAAACTGGGT
TAAACATCAATTCTATTTCCATTTTTCATTTTTATAAATAGGTACTGAGAATCTTTGTTCATATAAATAG
ATGGATAGGATTAGCCACTTCTTTGAATTTCTTTTTCAAGTTTCATGCCAAGATTCACATCATAACACAT
GTAACTGCATGTCTGGATGGAGAACAGATGTACCTATGCAGCGGCAGGGACATCAACACTCTCACTGATG
AATTGGCCGAGGAATGAGGAATAGCACAAATCAGCTACGGAACATTGACAAACTGGGAGCTAAACTTTGC
TTCATGCCTGTGAGGCAGTATTTTGATGAGCGGTGGATGCCCAGTGCTTCCTTGT"
>
> pa = pairwiseAlignment(str[1], str[2])
> deletion(pa)
IRangesList of length 1
[[1]]
IRanges of length 9
start end width
[1] 4 6 3
[2] 18 36 19
[3] 27 54 28
[4] 34 78 45
[5] 40 77 38
[6] 50 53 4
[7] 58 67 10
[8] 65 68 4
[9] 94 133 40
> insertion(pa)
IRangesList of length 1
[[1]]
IRanges of length 1
start end width
[1] 234 235 2
> nchar(pa)
[1] 292
> as.character(aligned(pattern(pa)))
[1]
"TCC---TTGCACATTGATAA-------------------GTTCACATC
----------------------------TGAAATT
---------------------------------------------TGCATG
--------------------------------------ACATAAACAT----ACAGTTGA----------
GAAGGAG----AGAACGTATGCCCTATGGTAAATATTGAC
----------------------------------------ATTTTAAA"
> as.character(aligned(subject(pa)))
[1]
"TCCATTTTTCATTTTTATAAATAGGTACTGAGAATCTTTGTTCATATAAATAGATGGATAGGATTAGCC
ACTTCTTTGAATTTCTTTTTCAAGTTTCATGCCAAGATTCACATCATAACACATGTAACTGCATGTCTGG
ATGGAGAACAGATGTACCTATGCAGCGGCAGGGACATCAACACTCTCACTGATGAATTGGCCGAGGAATG
AGGAATAGCACAAATCAGCTACGG--
AACATTGACAAACTGGGAGCTAAACTTTGCTTCATGCCTGTGAGGCAGTATTTTGAT"
### here is what I would expect deletion(pa) to output ... notice that it resembles
### the above deletion(pa) output with a shift corresponding to cumsum(width)
### is this a bug?
> as(which(strsplit(as.character(aligned(pattern(pa))), '')[[1]] == "-"), 'IRanges')
IRanges of length 9
start end width
[1] 4 6 3
[2] 21 39 19
[3] 49 76 28
[4] 84 128 45
[5] 135 172 38
[6] 183 186 4
[7] 195 204 10
[8] 212 215 4
[9] 245 284 40
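The cumsum(width) relationship between the two tables can be checked with a minimal base R sketch. It uses only the numbers printed in this session, copied into plain vectors. The variable names (del_start, del_width, shift) are made up for illustration and are not part of Biostrings or IRanges. It shows that the shift holds for this one alignment; it does not say how deletion() defines its coordinates.

```r
# Ranges as printed by deletion(pa) in the session above,
# copied by hand into plain numeric vectors.
del_start <- c(4, 18, 27, 34, 40, 50, 58, 65, 94)
del_width <- c(3, 19, 28, 45, 38, 4, 10, 4, 40)

# Total width of all gaps that precede each range (0 for the first one).
shift <- c(0, head(cumsum(del_width), -1))

# Shift each reported range to the right by the gaps before it.
aln_start <- del_start + shift
aln_end <- aln_start + del_width - 1

shifted <- data.frame(start = aln_start, end = aln_end, width = del_width)
print(shifted)
```

For these values the shifted starts and ends match the ranges computed from the "-" positions of the aligned string. That is consistent with the deletion(pa) starts not yet including the gaps that precede them.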
> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] C
attached base packages:
[1] parallel stats graphics grDevices utils datasets
methods
[8] base
other attached packages:
[1] Biostrings_2.28.0 IRanges_1.18.4 BiocGenerics_0.6.0
[4] BiocInstaller_1.10.3 multicore_0.1-7
loaded via a namespace (and not attached):
[1] stats4_3.0.0 tools_3.0.0