Question

How does shearwater determine the reference sequence?

Asma rabe ▴ 290

Japan

Hi,

Does anyone know how shearwater determines the reference sequence?

Thank you.

score 0 · Answer 1 · 2014-10-07

The consensus sequence is inferred from the matrix of nucleotide counts. For each position (i.e. each row of the matrix), the base with the most counts is taken as the consensus. In principle, this amounts to a call to R's 'max.col' function.
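A minimal Python sketch of the per-row maximum idea, assuming a simplified count matrix with one row per position and one column per base in the order A, C, G, T (this order, and the name `consensus`, are hypothetical; this is not deepSNV's actual code):

```python
# Assumed column order for this sketch only; a real deepSNV count matrix may differ.
BASES = ["A", "C", "G", "T"]

def consensus(counts):
    """Return the consensus string: the most frequent base in each row."""
    seq = []
    for row in counts:
        # Index of the column with the highest count (like a row-wise argmax).
        best = max(range(len(row)), key=lambda j: row[j])
        seq.append(BASES[best])
    return "".join(seq)

counts = [
    [10, 0, 0, 0],  # position 1: mostly A
    [0, 0, 7, 1],   # position 2: mostly G
    [1, 0, 0, 9],   # position 3: mostly T
]
print(consensus(counts))  # AGT
```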

An interesting case arises when two or more bases have the same number of counts. This typically happens a) in regions with coverage close to 0 or b) at a heterozygous SNP. In this case, deepSNV prefers the first column of the nucleotide count matrix.
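The tie rule can be sketched the same way, again assuming a simplified A, C, G, T column order. The hypothetical helper `consensus_with_ties` picks the first tied column, as described above, and also reports which positions were ties, so that low-coverage or heterozygous sites can be inspected:

```python
# Assumed column order for this sketch only.
BASES = ["A", "C", "G", "T"]

def consensus_with_ties(counts):
    """Return (consensus string, list of 0-based positions where the maximum was tied)."""
    calls, tied = [], []
    for pos, row in enumerate(counts):
        top = max(row)
        winners = [j for j, c in enumerate(row) if c == top]
        calls.append(BASES[winners[0]])  # prefer the first column on a tie
        if len(winners) > 1:
            tied.append(pos)
    return "".join(calls), tied

counts = [
    [0, 0, 0, 0],   # zero coverage: all four columns tie, first column (A) wins
    [0, 12, 11, 0], # clear C
    [0, 6, 0, 6],   # heterozygous C/T: tie, first column (C) wins
    [9, 0, 0, 0],   # clear A
]
print(consensus_with_ties(counts))  # ('ACCA', [0, 2])
```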

In a comparative variant-calling setting, using the consensus sequence rather than the reference sequence is often desirable: one is less interested in differences from the reference genome than in newly occurring variants between two samples.

score 0 · Answer 2 · 2014-10-13

Asma rabe ▴ 290

Japan

Does the order of the test and control BAM files matter when running shearwater?

Thank you.

ADD COMMENT • link 10.5 years ago Asma rabe ▴ 290