Biostrings PairwiseAlignment deletion() and insertion() functions
@marcin-imielinski-5383
Hi,

I'm confused about the output of the deletion(pa) and insertion(pa) functions for pa = pairwiseAlignment(). My understanding is that they should return IRanges corresponding to gaps in the pattern (deletion()) and gaps in the subject (insertion()), expressed in alignment coordinates. However, the returned ranges can overlap. For example, in the alignment below of a 101-letter pattern and a 404-letter subject, the deletion ranges overlap.

What sequence positions do these ranges refer to (pattern, subject, or alignment)? Is this a bug, or am I misinterpreting the output?

Thanks,
Marcin

##############################################################

> str[1]
[1] "TCCTTGCACATTGATAAGTTCACATCTGAAATTTGCATGACATAAACATACAGTTGAGAAGGAGAGAAC
GTATGCCCTATGGTAAATATTGACATTTTAAA"
> str[2]
[1] "CTGGGCTTTCGATGAAATAGTTCATTTATCTGTGGGTAGATATTACTTACTGGTTGAGTTAAACTGGGT
TAAACATCAATTCTATTTCCATTTTTCATTTTTATAAATAGGTACTGAGAATCTTTGTTCATATAAATAG
ATGGATAGGATTAGCCACTTCTTTGAATTTCTTTTTCAAGTTTCATGCCAAGATTCACATCATAACACAT
GTAACTGCATGTCTGGATGGAGAACAGATGTACCTATGCAGCGGCAGGGACATCAACACTCTCACTGATG
AATTGGCCGAGGAATGAGGAATAGCACAAATCAGCTACGGAACATTGACAAACTGGGAGCTAAACTTTGC
TTCATGCCTGTGAGGCAGTATTTTGATGAGCGGTGGATGCCCAGTGCTTCCTTGT"
>
> pa = pairwiseAlignment(str[1], str[2])
> deletion(pa)
IRangesList of length 1
[[1]]
IRanges of length 9
    start end width
[1]     4   6     3
[2]    18  36    19
[3]    27  54    28
[4]    34  78    45
[5]    40  77    38
[6]    50  53     4
[7]    58  67    10
[8]    65  68     4
[9]    94 133    40

> insertion(pa)
IRangesList of length 1
[[1]]
IRanges of length 1
    start end width
[1]   234 235     2

> nchar(pa)
[1] 292
> as.character(aligned(pattern(pa)))
[1] "TCC---TTGCACATTGATAA-------------------GTTCACATC
----------------------------TGAAATT
---------------------------------------------TGCATG
--------------------------------------ACATAAACAT----ACAGTTGA----------
GAAGGAG----AGAACGTATGCCCTATGGTAAATATTGAC
----------------------------------------ATTTTAAA"
> as.character(aligned(subject(pa)))
[1] "TCCATTTTTCATTTTTATAAATAGGTACTGAGAATCTTTGTTCATATAAATAGATGGATAGGATTAGCC
ACTTCTTTGAATTTCTTTTTCAAGTTTCATGCCAAGATTCACATCATAACACATGTAACTGCATGTCTGG
ATGGAGAACAGATGTACCTATGCAGCGGCAGGGACATCAACACTCTCACTGATGAATTGGCCGAGGAATG
AGGAATAGCACAAATCAGCTACGG--
AACATTGACAAACTGGGAGCTAAACTTTGCTTCATGCCTGTGAGGCAGTATTTTGAT"

### here is what I would expect deletion(pa) to output ... notice that it resembles
### the above deletion(pa) output with a shift corresponding to cumsum(width)
### is this a bug?
> as(which(strsplit(as.character(aligned(subject(pa))), '')[[1]] == "-"), 'IRanges')
IRanges of length 9
    start end width
[1]     4   6     3
[2]    21  39    19
[3]    49  76    28
[4]    84 128    45
[5]   135 172    38
[6]   183 186     4
[7]   195 204    10
[8]   212 215     4
[9]   245 284    40

> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] Biostrings_2.28.0    IRanges_1.18.4       BiocGenerics_0.6.0
[4] BiocInstaller_1.10.3 multicore_0.1-7

loaded via a namespace (and not attached):
[1] stats4_3.0.0 tools_3.0.0
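The cumsum(width) relationship noted in the comments can be checked directly from the numbers in the post. The sketch below uses base R only and hard-codes the start and width columns printed by deletion(pa). It assumes (this is an inference from the printed values, not something confirmed from the Biostrings documentation) that each reported start is offset by the total width of the gaps that precede it. The variable names are illustrative.

```r
# Start and width columns copied from the deletion(pa) output in the post.
reported_start <- c(4, 18, 27, 34, 40, 50, 58, 65, 94)
gap_width      <- c(3, 19, 28, 45, 38, 4, 10, 4, 40)

# Assumption: each reported start excludes the gaps that come before it,
# so adding the cumulative width of all earlier gaps should give the
# position in the gapped (alignment) string.
preceding_gaps  <- cumsum(c(0, head(gap_width, -1)))
alignment_start <- reported_start + preceding_gaps
alignment_end   <- alignment_start + gap_width - 1

# Gap ranges in alignment coordinates, for comparison with the
# "expected" IRanges shown in the post.
data.frame(start = alignment_start, end = alignment_end, width = gap_width)
```

For this example the shifted starts and ends reproduce the ranges obtained by locating "-" characters in the aligned string, which suggests the reported starts are positions with earlier gaps removed rather than alignment positions. Under that reading the printed end column (start + width - 1) does not mark a position in either coordinate system, which would explain the apparent overlaps.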