Gviz - display/indicate soft-clipped bases - indicate supplementary alignments
thackl • 0
@thackl-10704

I am using Gviz to visualize mappings of PacBio data (bam files) onto contigs. I was wondering if there is any way to display soft-clipped bases at read ends, or at least indicate reads with soft-/hard-clipped ends. That would help a lot in spotting structural variations and potential misassemblies.

Similarly, being able to indicate supplementary alignments of the same query read - e.g. display them in the same way as gapped mappings - would help a lot.

It would be great if anyone had an idea on how this can be done with Gviz. (I am also open to alternative approaches.)

gviz bam pacbio • 1.0k views
@herve-pages-1542
Seattle, WA, United States

Hi,

A blunt approach would be to segregate the alignments first (i.e. before using Gviz) and then display each group separately by creating one AlignmentsTrack object per group. For example, separating the soft-clipped alignments from the non-soft-clipped ones could be done by splitting the GAlignments object containing the reads with split(gal, grepl("S", cigar(gal))). The result is a GAlignmentsList object of length 2 with one list element per group. To separate the secondary alignments from the primary ones, you can either load the two groups from the BAM file separately (by calling readGAlignments() twice and passing a ScanBamParam object with the appropriate scanBamFlag() each time), or load all the alignments with the flag field as a metadata column and split the GAlignments object with split(gal, bamFlagAsBitMatrix(mcols(gal)$flag, "isSecondaryAlignment")).

The resulting plot might not be very appealing though so hopefully there is a better way to go...
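
To make the grouping logic above concrete, here is a minimal sketch in base R on toy data. The vectors cigars and flags are invented stand-ins for cigar(gal) and mcols(gal)$flag, and the four group labels are arbitrary. The bit values 0x100 (secondary) and 0x800 (supplementary) are taken from the SAM flag definition. With a real GAlignments object, the same grouping vector could presumably be passed to split(gal, group).

```r
# Toy stand-ins for cigar(gal) and mcols(gal)$flag; all values are invented.
cigars <- c("50M", "12S38M", "40M10S", "5H45M", "25M2D25M")
flags  <- c(0L, 16L, 256L, 2048L, 2064L)

# Reads whose CIGAR contains a soft clip (same test as in the answer above).
soft_clipped <- grepl("S", cigars, fixed = TRUE)

# SAM flag bits: 0x100 (256) = secondary, 0x800 (2048) = supplementary.
is_secondary     <- bitwAnd(flags, 256L) != 0L
is_supplementary <- bitwAnd(flags, 2048L) != 0L

# One label per read; supplementary takes precedence, then secondary,
# then soft-clipped. The precedence order is an arbitrary choice.
group <- ifelse(is_supplementary, "supplementary",
         ifelse(is_secondary,     "secondary",
         ifelse(soft_clipped,     "soft_clipped", "plain")))

# Read indices per group; split(gal, group) would be the GAlignments analogue.
groups <- split(seq_along(cigars), group)
print(groups)
```

Each element of groups would then feed its own AlignmentsTrack, as suggested above.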

H.


I like the idea of splitting into clipped/non-clipped reads. But as you said, it is a practical, but not really satisfying solution :)

@florianhahnenovartiscom-3784
Switzerland

I guess we could indicate the soft-clipping information for the individual reads in a similar fashion as it is done in IGV. This involves parsing the CIGAR string, and I am not sure how complex that is going to be. Will take a look at this if I find the time, but this is certainly not going to be a quick fix.

Florian


Yes, that's what I was afraid of. But rather than including the entire stretch of unmapped bases (as IGV does, and which then needs to be accounted for in stacking etc.), would it be possible to just color or mark the start/end of a read if the CIGAR starts/ends with an S/H?
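
As a rough illustration of the "just look at the heads and tails" idea, the sketch below flags clipped read ends with two regular expressions in base R. The helper name clip_ends and the example CIGAR strings are hypothetical; this is not how Gviz handles reads internally. Since CIGAR strings are written in reference orientation, "left" and "right" here refer to the reference start and end of the alignment, regardless of strand.

```r
# Invented example CIGAR strings, e.g. as returned by cigar(gal).
cigars <- c("50M", "12S38M", "40M10S", "5H45M", "8S30M2I10M4H")

# Hypothetical helper: does the CIGAR start or end with an S or H operation?
clip_ends <- function(cigar) {
  data.frame(
    cigar         = cigar,
    clipped_left  = grepl("^[0-9]+[SH]", cigar),  # first operation is S or H
    clipped_right = grepl("[0-9]+[SH]$", cigar),  # last operation is S or H
    stringsAsFactors = FALSE
  )
}

ends <- clip_ends(cigars)
print(ends)
```

The two logical columns could then drive a per-read color or end marker, without having to lay out the clipped bases themselves.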


That still means we have to touch the CIGAR string for each read. Whether we parse the whole thing or just look at the heads and tails doesn't make much of a difference. The drawing later on is already vectorized, so breaking up a read stretch into sub-sections shouldn't add much cost either, as far as I can tell.

If you feel compelled, please go ahead and take a look at the Gviz source code. Maybe you can come up with a quick solution before I find the time to look at this.