How does csaw handle multiply aligned reads?
endrebak85 ▴ 30

I guess csaw would not want to use such reads, but I could not find anything in the user's guide or manual about what csaw does with multimapped reads.

Does it read a flag in the BAM file telling it that the alignment is multimapped and then discard it? If the onus is on the user to remove such reads, it would be nice if the manual could be updated with a heads-up.

Thanks for csaw, btw.

csaw chip-seq
Aaron Lun ★ 27k

It depends on what your aligner is spitting out:

• If there are several alignments corresponding to a single read, csaw will not distinguish between secondary alignments and the primary alignment, and will treat them (incorrectly) as separate reads. This becomes a headache if you're trying to process paired-end data, because the read name alone is no longer sufficient to match up the alignments for the two reads in a pair. I guess this will probably break things in pe="both" mode, especially some of the optimizations. In this case, the onus is on you to remove secondary alignments; I've added a sentence to the user's guide warning about this.
• If you're getting one alignment per read, then the mapping quality is often an indication of whether it's uniquely mapped or not (e.g., see the Bowtie2 documentation). In this case, setting a threshold on the MAPQ score with minq will be sufficient to remove non-uniquely mapped reads. This is described in Section 2.2.2 of the csaw user's guide. In other programs, reads with non-unique alignments will be reported as being unmapped (e.g., Rsubread with unique=TRUE) so this isn't a problem; those reads are just ignored like any other unmapped read. If you want to use your own definition of uniqueness, then that's up to you, so you'll have to modify the BAM file before feeding it into csaw.
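To make the two cases above concrete, here is a minimal sketch in plain Python of what the filtering amounts to at the level of individual alignment records. This is not csaw code and not how csaw is implemented; it only illustrates the idea on SAM-formatted text lines. The function names (`keep_record`, `filter_sam`), the example records and the MAPQ threshold of 10 are all assumptions made up for illustration. The only facts relied on are from the SAM format itself: the FLAG is in column 2, the MAPQ is in column 5, and bit 0x100 of the FLAG marks a secondary alignment.

```python
# Bit 0x100 of the SAM FLAG field marks a secondary alignment.
FLAG_SECONDARY = 0x100


def keep_record(sam_line, min_mapq=10):
    """Return True if an alignment line is not secondary and has MAPQ >= min_mapq.

    min_mapq=10 is an arbitrary illustrative threshold, not a recommendation.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # column 2: bitwise FLAG
    mapq = int(fields[4])  # column 5: mapping quality
    if flag & FLAG_SECONDARY:
        return False  # case 1: drop secondary alignments
    return mapq >= min_mapq  # case 2: drop low-MAPQ alignments


def filter_sam(lines, min_mapq=10):
    """Yield header lines unchanged, plus alignment lines that pass keep_record."""
    for line in lines:
        if line.startswith("@") or keep_record(line, min_mapq):
            yield line


# Hypothetical records: name, FLAG, chromosome, position, MAPQ, CIGAR, ...
records = [
    "read1\t0\tchr1\t100\t42\t36M\t*\t0\t0\t*\t*",   # primary, high MAPQ
    "read2\t256\tchr2\t500\t1\t36M\t*\t0\t0\t*\t*",  # secondary alignment
    "read2\t0\tchr1\t300\t1\t36M\t*\t0\t0\t*\t*",    # primary, low MAPQ
    "read3\t16\tchr3\t700\t30\t36M\t*\t0\t0\t*\t*",  # primary, reverse strand
]

kept_names = [line.split("\t")[0] for line in filter_sam(records)]
print(kept_names)  # ['read1', 'read3']
```

In a real analysis you would apply the equivalent filter to the BAM file itself with a dedicated tool before running csaw, and leave the MAPQ thresholding to minq; the sketch is only meant to show which records each of the two cases discards.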

The second case is probably what you're referring to. The first case is stranger because the default for most aligners is just to report a single alignment per read. In fact, I've never done any mapping step where I requested the mapping software to give me multiple mapping locations for a read. I'd end up removing all but the best alignment prior to any downstream analysis anyway, so what's the point?