Mimic Counting of HTSeq-count Using countOverlaps
1
1
Entering edit mode
Dario Strbenac ★ 1.5k
@dario-strbenac-5916
Last seen 1 hour ago
Australia

If alignment is done allowing a read to map to multiple locations, is there an easy way to count only the uniquely mapped reads? HTSeq-count "... does not count reads that map to multiple genes.". However, for GAlignments, "... multi-reads (i.e. reads with multiple hits in the reference) won't receive any special treatment ..." meaning that the reads mapping to multiple genomic locations will be counted many times. The reason for allowing a read to map to multiple locations is to make the one of the two resulting BAM files for each sample from STAR useful for RSEM FPKM calculation but also being able to get ordinary counts of uniquely mapped reads for modelling using GLMs.

GenomicAlignments countOverlaps • 1.2k views
ADD COMMENT
2
Entering edit mode
@martin-morgan-1513
Last seen 5 days ago
United States

The STAR manual says that alignments are unique when the NH tag is 1, or when the map quality is 255. One could use these in ScanBamParam(), e.g., ScanBamParam(mapqFilter = 255) or ScanBamParam(tagFilter=list(NH = 1)) or scanBamParam(mapqFilter = 255, tagFilter = list(NH = 1)).

The HTseq documentation seem to suggest that a slightly different heuristic is used, since there is no knowledge of aligner assumed. The heuristic uses a map quality of 10 as a threshold to filter multiple aligners. Also, the documentation implies an 'or' condition on the NH tag, which may or may not be present; ScanBamParam() does not allow this flexibility, but for STAR alignments it seems HTSeq would use the equivalent of ScanBamParam(mapqFilter = 10, tagFilter = list(NH =1)).

ADD COMMENT
0
Entering edit mode

The code example should use NH but you have written NM. Is there a coercion function that can convert scaBam's list-of-lists data structure into a GAlignments or GAlignmentsPairs object quickly and concisely?

ADD REPLY
0
Entering edit mode

Probably the easiest is to use the ScanBamParam() as the param= argument to readGAlignments() / readGAlignmentPairs(). Sorry about the NM / NH typo, corrected now.

ADD REPLY
1
Entering edit mode

I thought the easy/standard way for counting uniquely mapped reads was to filter out secondary alignments. Any reason this cannot be used here?

FWIW with Rsamtools/GenomicAlignments this can be done by setting the isSecondaryAlignment flag to FALSE when loading the reads from the BAM file e.g. readGAlignments(..., param=ScanBamParam(flag=scanBamFlag(isSecondaryAlignment=FALSE))).

ADD REPLY

Login before adding your answer.

Traffic: 703 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6