How does shearwater determine the reference sequence?
Asma rabe ▴ 290
@asma-rabe-4697
Last seen 6.9 years ago
Japan

Hi,

Does anyone know how shearwater determines the reference sequence?

Thank you.

Julian Gehring ★ 1.3k
@julian-gehring-5818
Last seen 5.6 years ago

The consensus sequence is inferred from the matrix of nucleotide counts.  For each position (i.e. row in the matrix), the base with the most counts is identified as the consensus.  In principle, it boils down to a 'max.col' function call in R.
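To make the idea concrete, here is a minimal sketch in plain Python of a per-row "which column is largest" call. It is not deepSNV's actual code: the function name `consensus`, the toy counts, and the column order `A, T, C, G, -` are assumptions made for the illustration.

```python
# Illustrative sketch only, not deepSNV's implementation.
# Assumed column order of the count matrix (hypothetical for this example).
NUCLEOTIDES = ["A", "T", "C", "G", "-"]

def consensus(counts, columns=NUCLEOTIDES):
    """Return the base with the highest count at each position (row)."""
    calls = []
    for row in counts:
        # Index of the largest count; max() keeps the first index on ties.
        best = max(range(len(row)), key=lambda j: row[j])
        calls.append(columns[best])
    return "".join(calls)

# Toy matrix: one row per position, one column per nucleotide.
counts = [
    [30, 0, 1, 0, 0],   # mostly A
    [0, 2, 0, 41, 0],   # mostly G
    [1, 25, 0, 0, 0],   # mostly T
]
print(consensus(counts))  # AGT
```

The result depends only on the observed counts, so no reference genome enters the calculation at this step.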

An interesting case arises if two or more bases have the same number of counts. This generally happens a) in regions with coverage close to 0 or b) at a heterozygous SNP. In this case, deepSNV will prefer the first column in the nucleotide count matrix.
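The tie case can be sketched the same way. The snippet below is a hedged illustration in plain Python, not deepSNV code: `call_with_ties` and the column order `A, T, C, G, -` are hypothetical. It returns the first-column call together with every base that shares the top count, so tied positions can be spotted. As a side note, R's own `max.col` breaks ties at random by default, and `ties.method = "first"` gives the first-column behaviour.

```python
# Illustrative sketch only; column order is an assumption for this example.
COLUMNS = ("A", "T", "C", "G", "-")

def call_with_ties(row, columns=COLUMNS):
    """Return (first-column call, all bases tied for the top count)."""
    top = max(row)
    tied = [columns[j] for j, n in enumerate(row) if n == top]
    # Preferring the first column means taking the first tied base.
    return tied[0], tied

# a) No coverage: every column is tied at 0, so the first column wins.
print(call_with_ties([0, 0, 0, 0, 0]))    # ('A', ['A', 'T', 'C', 'G', '-'])

# b) Heterozygous-looking site: T and G are tied, T comes first.
print(call_with_ties([0, 12, 0, 12, 0]))  # ('T', ['T', 'G'])
```

A tied list longer than one is a hint that the consensus call at that position is arbitrary and worth checking against the coverage.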

In a comparative variant calling setting, using the consensus sequence instead of the reference sequence is often what you want. One is usually less interested in differences from the reference genome than in variants that newly arise between two samples.


Thank you very much, Julian. So the consensus sequence can differ from the reference sequence at some positions.

What if two or more nucleotides have the same count at a certain position?

Is there a way to run shearwater so that the reference sequence is used instead of an inferred consensus?

Asma rabe ▴ 290
@asma-rabe-4697
Last seen 6.9 years ago
Japan

Does the order of the test and control BAM files matter when running shearwater?

Thank you
