Entering edit mode
                    Guest User
        
    
        ★
    
    13k
        @guest-user-4897
        Last seen 11.2 years ago
        
    
I have a number of ShortRead sequences aligned and stored in a
DNAMultipleAlignment object. I???m trying to find a way to extract the
common longest sequence between the reads. i.e., it is a combination
of consensus sequence and longest read
here is an example of 6 reads that I have aligned:
[1] "CTATGGGAGGGAAAAAGGATCCAGTATTAGGGGGAGGCAGGAAGTGCATGTGAGGCCACAGTGTC
AACA-------------------------------------------------------------"
[2] "CTATGGGAGGG-AAAAGGATCCAGTATTA-
GGGGAGGCAGGAAGTGCATGTGAGGCCACAGTGTCAACAACATACTTTCACTTGGC-
AAAAAAAACATGGTGACTGCAAAGAG-----------------"
[3] "CTATGGGAGGG-AAAAGGATCCAGTATTA-GGGGAGGCAGGAAGTGCATGTGAGGCCACAGTGTC
AACAACATACTTTCACTTGGC--------------------------------------------"
[4] "CTATGGGAGGG-AAAAGGATCCAGTATTA-GGGGAGGCAGGAAGTGCATGTGAGGCCACAGTGTC
AAC--------------------------------------------------------------"
[5] "CTATGGGAGGG-AAAAGGATCCAGTATTA-GGGGAGGCAGGAAGTGCATGTGAGGCCACAGTGTC
AACAACATACTTTCACTTGGCAAAAAAAAACATGGTGACTGCAAAGAGAGCTCTGTCTGGCTTCT???
[6] "CTATGGGAGGG-AAAAGGATCCAGTATTA-GGGGAGGCAGGAAGTGCATGTGAGGCCACAGTGTC
AAC--------------------------------------------------------------"
>From these, I would like to go extract the longest common sequence,
but retain ambiguity information where available i.e producing the
following sequence:
 "CTATGGGAGGG-AAAAGGATCCAGTATTA-
GGGGAGGCAGGAAGTGCATGTGAGGCCACAGTGTCAACAACATACTTTCACTTGGC-
AAAAAAAACATGGTGACTGCAAAGAGAGCTCTGTCTGGCTTCT???
The consensusString function does not work well for me as it truncates
the sequence in the 3??? end in this example.
Thank you for your help
All the best
Tomas
 -- output of sessionInfo():
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets
methods   base
other attached packages:
 [1] Rsubread_1.14.0         Rlibstree_0.3-2
BiocInstaller_1.14.2    muscle_3.8.31-1         ShortRead_1.22.0
GenomicAlignments_1.0.1
 [7] BSgenome_1.32.0         Rsamtools_1.16.0
GenomicRanges_1.16.3    GenomeInfoDb_1.0.2      Biostrings_2.32.0
XVector_0.4.0
[13] IRanges_1.22.6          BiocParallel_0.6.0
BiocGenerics_0.10.0
loaded via a namespace (and not attached):
 [1] BatchJobs_1.2       BBmisc_1.6          Biobase_2.24.0
bitops_1.0-6        brew_1.0-6          codetools_0.2-8     DBI_0.2-7
 [8] digest_0.6.4        fail_1.2            foreach_1.4.2
grid_3.1.0          hwriter_1.3         iterators_1.0.7
lattice_0.20-29
[15] latticeExtra_0.6-26 plyr_1.8.1          RColorBrewer_1.0-5
Rcpp_0.11.1         RSQLite_0.11.4      sendmailR_1.1-2
stats4_3.1.0
[22] stringr_0.6.2       tools_3.1.0         zlibbioc_1.10.0
--
Sent via the guest posting facility at bioconductor.org.
                    
                
                