How does csaw handle multiply aligned reads?

endrebak85 ▴ 40

@endrebak85-10660

Last seen 4.7 years ago

github.com/endrebak/

I guess csaw would not want to use such reads, but I could not find anything in the user's guide or manual about what csaw does with multimapped reads.

Does it read a flag in the BAM file telling it that the alignment is multimapped and then discard it? If the onus is on the user to remove such reads, it would be nice if the manual could be updated with a heads-up.

Thanks for csaw, btw.

csaw chip-seq • 996 views

updated 8.0 years ago by Aaron Lun ★ 28k • written 8.0 years ago by endrebak85 ▴ 40

Aaron Lun ★ 28k

@alun

Last seen 8 hours ago

The city by the bay

It depends on what your aligner is spitting out:

If there are several alignments corresponding to a single read, csaw will not distinguish between secondary alignments and the primary alignment, and will (incorrectly) treat them as separate reads. This becomes a headache with paired-end data, because the read name alone is no longer sufficient to identify the alignments belonging to distinct reads in the same pair. I guess this will probably break things in pe="both" mode, especially some of the optimizations. In this case, the onus is on you to remove secondary alignments; I've added a sentence to the user's guide warning about this.
If you're getting one alignment per read, then the mapping quality is often an indication of whether it's uniquely mapped or not (e.g., see the Bowtie2 documentation). In this case, setting a threshold on the MAPQ score with minq will be sufficient to remove non-uniquely mapped reads. This is described in Section 2.2.2 of the csaw user's guide. Some other aligners report reads with non-unique alignments as unmapped (e.g., Rsubread with unique=TRUE), so this isn't a problem; those reads are just ignored like any other unmapped read. If you want to use your own definition of uniqueness, then that's up to you, and you'll have to modify the BAM file before feeding it into csaw.
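To make the two cases concrete, here is a rough Python sketch of the filtering involved. It is not csaw's code, and the function names (keep_alignment, filter_sam) are made up for illustration. It assumes plain-text SAM records, where column 2 is the bitwise FLAG (0x100 marks a secondary alignment, 0x4 an unmapped read) and column 5 is the MAPQ, as laid out in the SAM specification. In practice you would probably do the same thing with something like samtools view -F 0x100 before running csaw, and use minq for the MAPQ threshold.

```python
# SAM FLAG bits, as defined in the SAM specification.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100


def keep_alignment(sam_line, min_mapq=0):
    """Return True for a mapped, non-secondary record with MAPQ >= min_mapq."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # column 2: bitwise FLAG
    mapq = int(fields[4])  # column 5: mapping quality
    if flag & (FLAG_UNMAPPED | FLAG_SECONDARY):
        return False
    return mapq >= min_mapq


def filter_sam(lines, min_mapq=0):
    """Yield header lines unchanged, plus the alignment records that pass."""
    for line in lines:
        if line.startswith("@") or keep_alignment(line, min_mapq):
            yield line


# Toy records (hypothetical data, not from a real experiment).
toy = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t0\tchr1\t100\t42\t4M\t*\t0\t0\tACGT\tIIII",   # primary, high MAPQ
    "read1\t256\tchr2\t500\t0\t4M\t*\t0\t0\tACGT\tIIII",  # secondary copy of read1
    "read2\t0\tchr1\t200\t1\t4M\t*\t0\t0\tACGT\tIIII",    # primary, low MAPQ
]
kept = list(filter_sam(toy, min_mapq=10))
```

With min_mapq=10, only the header and the primary alignment of read1 survive: the secondary record is dropped by its flag (first case) and read2 is dropped by its low MAPQ (second case). The threshold of 10 is arbitrary here; what a given MAPQ means depends on the aligner.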

The second case is probably what you're referring to. The first case is stranger because the default for most aligners is just to report a single alignment per read. In fact, I've never done any mapping step where I requested the mapping software to give me multiple mapping locations for a read. I'd end up removing all but the best alignment prior to any downstream analysis anyway, so what's the point?

8.0 years ago • Aaron Lun ★ 28k
