Hi,
Does anyone know how shearwater determines the reference sequence?
Thank you.
The consensus sequence is inferred from the matrix of nucleotide counts. For each position (i.e. each row in the matrix), the base with the most counts is taken as the consensus. In principle, it boils down to a 'max.col' function call in R.
An interesting case arises if two or more bases have the same number of counts. This is generally caused by either a) regions with coverage close to 0 or b) a heterozygous SNP. In this case, deepSNV will prefer the first column in the nucleotide count matrix.
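To make the tie rule concrete, here is a minimal sketch in plain Python. It is not deepSNV's code, only an illustration of "take the column with the most counts, and on a tie take the first one". The column order in `BASES` and the toy counts are assumptions made up for the example. (In R, `max.col` has a `ties.method` argument, and `"first"` corresponds to the behaviour sketched here.)

```python
# Hypothetical column order, chosen for illustration only; the real order
# depends on how the nucleotide count matrix was built.
BASES = ["A", "T", "C", "G", "-"]


def consensus(counts, bases=BASES):
    """Return one consensus base per row of a positions-by-bases count matrix."""
    calls = []
    for row in counts:
        # list.index returns the first position of the maximum, so ties
        # resolve to the leftmost column.
        calls.append(bases[row.index(max(row))])
    return "".join(calls)


# Toy counts: one row per position, one column per entry in BASES.
counts = [
    [30, 0, 1, 0, 0],   # clear majority: A
    [0, 0, 2, 41, 0],   # clear majority: G
    [12, 0, 12, 0, 0],  # tie between A and C, e.g. a heterozygous SNP
    [0, 0, 0, 0, 0],    # no coverage: every column ties at 0
]

print(consensus(counts))  # AGAA
```

Note that the last two rows both come out as the first column. A position with no coverage is therefore indistinguishable from a real call of that base in the consensus string alone, so it is worth checking coverage alongside it.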
In a comparative variant calling setting, using the consensus sequence instead of the reference sequence is often what you want. One is less interested in differences from the reference genome than in newly occurring variants between two samples.
Thank you very much, Julian. So the consensus sequence can differ from the reference sequence at some positions.
What happens if two or more nucleotides have the same count at a given position?
Is there a way to run shearwater so that the reference sequence is used instead of an inferred consensus?