Longest continuous sequence from multiple alignment
0
0
Entering edit mode
Guest User ★ 13k
@guest-user-4897
Last seen 10.3 years ago
I have a number of ShortRead sequences aligned and stored in a DNAMultipleAlignment object. I???m trying to find a way to extract the common longest sequence between the reads. i.e., it is a combination of consensus sequence and longest read here is an example of 6 reads that I have aligned: [1] "CTATGGGAGGGAAAAAGGATCCAGTATTAGGGGGAGGCAGGAAGTGCATGTGAGGCCACAGTGTC AACA-------------------------------------------------------------" [2] "CTATGGGAGGG-AAAAGGATCCAGTATTA- GGGGAGGCAGGAAGTGCATGTGAGGCCACAGTGTCAACAACATACTTTCACTTGGC- AAAAAAAACATGGTGACTGCAAAGAG-----------------" [3] "CTATGGGAGGG-AAAAGGATCCAGTATTA-GGGGAGGCAGGAAGTGCATGTGAGGCCACAGTGTC AACAACATACTTTCACTTGGC--------------------------------------------" [4] "CTATGGGAGGG-AAAAGGATCCAGTATTA-GGGGAGGCAGGAAGTGCATGTGAGGCCACAGTGTC AAC--------------------------------------------------------------" [5] "CTATGGGAGGG-AAAAGGATCCAGTATTA-GGGGAGGCAGGAAGTGCATGTGAGGCCACAGTGTC AACAACATACTTTCACTTGGCAAAAAAAAACATGGTGACTGCAAAGAGAGCTCTGTCTGGCTTCT??? [6] "CTATGGGAGGG-AAAAGGATCCAGTATTA-GGGGAGGCAGGAAGTGCATGTGAGGCCACAGTGTC AAC--------------------------------------------------------------" >From these, I would like to go extract the longest common sequence, but retain ambiguity information where available i.e producing the following sequence: "CTATGGGAGGG-AAAAGGATCCAGTATTA- GGGGAGGCAGGAAGTGCATGTGAGGCCACAGTGTCAACAACATACTTTCACTTGGC- AAAAAAAACATGGTGACTGCAAAGAGAGCTCTGTCTGGCTTCT??? The consensusString function does not work well for me as it truncates the sequence in the 3??? end in this example. Thank you for your help All the best Tomas -- output of sessionInfo(): R version 3.1.0 (2014-04-10) Platform: x86_64-apple-darwin10.8.0 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] parallel stats graphics grDevices utils datasets methods base other attached packages: [1] Rsubread_1.14.0 Rlibstree_0.3-2 BiocInstaller_1.14.2 muscle_3.8.31-1 ShortRead_1.22.0 GenomicAlignments_1.0.1 [7] BSgenome_1.32.0 Rsamtools_1.16.0 GenomicRanges_1.16.3 GenomeInfoDb_1.0.2 Biostrings_2.32.0 XVector_0.4.0 [13] IRanges_1.22.6 BiocParallel_0.6.0 BiocGenerics_0.10.0 loaded via a namespace (and not attached): [1] BatchJobs_1.2 BBmisc_1.6 Biobase_2.24.0 bitops_1.0-6 brew_1.0-6 codetools_0.2-8 DBI_0.2-7 [8] digest_0.6.4 fail_1.2 foreach_1.4.2 grid_3.1.0 hwriter_1.3 iterators_1.0.7 lattice_0.20-29 [15] latticeExtra_0.6-26 plyr_1.8.1 RColorBrewer_1.0-5 Rcpp_0.11.1 RSQLite_0.11.4 sendmailR_1.1-2 stats4_3.1.0 [22] stringr_0.6.2 tools_3.1.0 zlibbioc_1.10.0 -- Sent via the guest posting facility at bioconductor.org.
GO ShortRead GO ShortRead • 1.1k views
ADD COMMENT

Login before adding your answer.

Traffic: 480 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6