Entering edit mode
Guest User
★
13k
@guest-user-4897
Last seen 10.3 years ago
Hi, I have what I thought should be simple procedure...
I have a multiple alignment. I want to find the locations (and
sequences) of all the differences.
I can read the file into a DNAMultipleAlignment class:
aln = read.DNAMultipleAlignment(filepath=alnFiles[i], format="fasta")
This is a 4-species alignment, 300bp long. I finally figured out a way
find all the "differences" (or, bases where the 4 species are not
identical), like so:
which(apply(consensusMatrix(aln), 2, max) !=4)
It seems like there should be a more straightforward way, but this
just looks at the count matrix and finds all the places that don't
have all the species in the same nucleotide...
Now, I found no easy way to subset the aln object to pull out the
sequences at those positions. Is there one?
I think maybe I want something like "mismatchTable", but according to
the documentation, this won't work on a DNAMultipleAlignment...
mismatchTable(aln)
Error in function (classes, fdef, mtable) :
unable to find an inherited method for function "mismatchTable", for
signature "DNAMultipleAlignment"
Instead, it works on a AlignedXStringSet. Well, I can't figure out the
difference between an AlignedXStringSet and a DNAMultipleAlignment,
which both appear to hold aligned strings, even XStringSets...and I
cannot convert one to another.
Can anyone tell me where I should start to figure out the differences
between these objects? And, how to solve my original problem of
getting the the sequences at the locations that are different in an
alignment?
Thanks for your help.
-- output of sessionInfo():
R version 2.13.0 (2011-04-13)
Platform: x86_64-redhat-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] grid stats graphics grDevices utils datasets
methods
[8] base
other attached packages:
[1] seqLogo_1.18.0 MotIV_1.6.0 rtracklayer_1.12.2
[4] RCurl_1.6-4 bitops_1.0-4.1 GenomicRanges_1.4.3
[7] Biostrings_2.20.1 IRanges_1.10.3
loaded via a namespace (and not attached):
[1] BSgenome_1.20.0 lattice_0.19-23 rGADEM_1.4.0 tools_2.13.0
[5] XML_3.4-0
--
Sent via the guest posting facility at bioconductor.org.