What does the '?' mean in the output of the msa() function from the R package 'msa'?
@yuchen-chang-20718
Melbourne

Hi everyone,

I have recently been using the R package 'msa', and a few things in its output confuse me.

1. As I understand it, the last line of the alignment output (labelled 'con') is the consensus sequence. According to the msa manual, a residue is shown in the consensus only if its frequency is over 80% (by default). So what does the question mark in this line mean? Does it indicate that several candidate residues at this position all have a high frequency?
2. Since the conservation score (computed by the function msaConservationScore()) differs somewhat from the consensus sequence, how can I tell which parts of the alignment actually match? My guess is the regions where the consensus sequence shows a residue symbol rather than a hyphen (-).

Thanks, everyone!

@ulrichbodenhofer-8624
Austria

By default, the print() function implemented in the 'msa' package uses the consensusString() function from the 'Biostrings' package, so please refer to the 'Biostrings' documentation for details on how consensus strings are computed. In any case, I guess '?' is an ambiguity character, i.e. it marks a position where no single residue is frequent enough to be reported. You can change how the consensus sequence is computed by specifying type="upperlower" (see the documentation of the msaConsensusSequence() method). This variant is a fairly standard approach in which upper-case or lower-case characters are used depending on the relative frequency of the majority character (the thresholds can be specified). These options can also be passed to the print() function, since its additional arguments are forwarded to the msaConsensusSequence() method.
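To make the difference between the two consensus rules concrete, here is a small base-R sketch on a toy alignment. The helpers `to_matrix()`, `consensus_ambig()` and `consensus_upperlower()` are hypothetical and are not part of 'msa' or 'Biostrings'. They only approximate the general idea under stated assumptions. The first helper reports a residue when exactly one residue reaches a frequency threshold (0.5 here, an assumed value) and '?' otherwise. The second helper follows an upper/lower-case scheme with percentage thresholds c(80, 20) and falls back to '.' (the fallback character is also an assumption). The packages' exact tie-breaking and gap handling may differ.

```r
# Hypothetical helpers for illustration only (NOT part of 'msa' or 'Biostrings').
# They mimic the general idea of two consensus rules using base R.

# Split equal-length aligned sequences into a character matrix
# (rows = sequences, columns = alignment positions).
to_matrix <- function(aln) {
  stopifnot(length(unique(nchar(aln))) == 1)  # all rows must be aligned
  do.call(rbind, strsplit(aln, ""))
}

# Rule 1 (assumed, Biostrings-like): report a residue only if exactly one
# residue reaches `threshold` (as a fraction); otherwise report `ambiguity`.
consensus_ambig <- function(aln, threshold = 0.5, ambiguity = "?") {
  res <- apply(to_matrix(aln), 2, function(column) {
    counts <- table(column)
    freqs <- counts / sum(counts)
    hits <- names(freqs)[freqs >= threshold]
    if (length(hits) == 1) hits else ambiguity
  })
  paste(res, collapse = "")
}

# Rule 2 (assumed, upper/lower-case scheme): upper case if the most frequent
# residue reaches thresh[1] percent, lower case if it reaches thresh[2],
# and "." otherwise. Ties go to the first residue in table() order.
consensus_upperlower <- function(aln, thresh = c(80, 20)) {
  res <- apply(to_matrix(aln), 2, function(column) {
    counts <- table(column)
    pct <- 100 * counts / sum(counts)
    top <- names(pct)[which.max(pct)]
    if (max(pct) >= thresh[1]) toupper(top)
    else if (max(pct) >= thresh[2]) tolower(top)
    else "."
  })
  paste(res, collapse = "")
}

# Toy protein alignment: 5 sequences, 5 positions
aln <- c("MKVQL", "MKVEL", "MRVES", "MKIDL", "MKVNA")
consensus_ambig(aln)       # "MKV?L"
consensus_upperlower(aln)  # "MKVel"
```

In this example, the fourth column (Q/E/E/D/N) has a most frequent residue at only 40%. It therefore becomes '?' under the first rule but a lower-case 'e' under the second. This is why the two outputs can look quite different for the same alignment. With the real package, the corresponding call would presumably be `msaConsensusSequence(myAlignment, type = "upperlower", thresh = c(80, 20))`, where `myAlignment` stands for a hypothetical result of `msa()`. Please check the argument names against the msa reference manual.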


Thanks, Ulrich! I will try using msaConsensusSequence(..., type = "upperlower").