Search
1
9 months ago by
C Lin60
United States
C Lin60 wrote:

Hello...

Is there an R package that can calculate alignment statistics from bam/sam files? By alignment statistics I mean number of reads, number of reads uniquely mapped, number of reads mapped to multiple locations.

I found rsubread package can count number of  reads that can be mapped. However, it doesn't differentiate between unique alignment and multiple alignment.

modified 8 months ago by Hervé Pagès ♦♦ 13k • written 9 months ago by C Lin60
2
9 months ago by
United States
James W. MacDonald46k wrote:

Yes, countBam in Rsamtools will do that sort of thing. See in particular ?ScanBamParam for filtering what you do and do not count.

Thanks for the hints on countBam. However, I still cannot get the number of uniquely mapped reads.

I thought the following should return number of multiple mapped reads. However, it returns 0 when I know there are multiple mapped reads. bowtie2 shows ~14 millions reads

file='input.bam'

countBam(file,index=file,param=ScanBamParam(flag=scanBamFlag(isSecondaryAlignment=TRUE))

I was able to get the number of unmapped correctly using isUnmappedQuery=TRUE

1

Works for me:

samtools flagstat 173342.accepted_hits.merged.markeddups.recal.bam
24242590 + 0 secondary

> z <- countBam("173342.accepted_hits.merged.markeddups.recal.bam", param = ScanBamParam(flag = scanBamFlag(isSecondaryAlignment = TRUE)))

> z\$records
[1] 24242590
1

Thanks for your help. I figured it out.

I ran bowtie retaining only unique mapped reads. Therefore, my bam file does not contain multiple mapped reads. Duh!

James, thanks again for your help.

1
8 months ago by
Hervé Pagès ♦♦ 13k
United States
Hervé Pagès ♦♦ 13k wrote:

Hi,

The quickBamFlagSummary() function in Rsamtools will display various stats about the FLAG field. For example:

> quickBamFlagSummary(bamfile)
|          |         | mean / max
group |    nb of |   nb of | records
of |  records |  unique | per unique
records | in group |  QNAMEs | QNAME
All records................... A |     3307 |    1699 | 1.95 / 2
o template has single segment S |        0 |       0 |   NA / NA
o template has multple sgmnts M |     3307 |    1699 | 1.95 / 2
- first segment.......... F |     1654 |    1654 |    1 / 1
- last segment........... L |     1653 |    1653 |    1 / 1
- other segment.......... O |        0 |       0 |   NA / NA

Note that (S, M) is a partitioning of A, and (F, L, O) is a
partitioning of M. Indentation reflects this.

Details for group M:
o record is mapped......... M1 |     3271 |    1699 | 1.93 / 2
- primary alignment.... M2 |     3271 |    1699 | 1.93 / 2
- secondary alignment.. M3 |        0 |       0 |   NA / NA
o record is unmapped....... M4 |       36 |      36 |    1 / 1

Details for group F:
o record is mapped......... F1 |     1641 |    1641 |    1 / 1
- primary alignment.... F2 |     1641 |    1641 |    1 / 1
- secondary alignment.. F3 |        0 |       0 |   NA / NA
o record is unmapped....... F4 |       13 |      13 |    1 / 1

Details for group L:
o record is mapped......... L1 |     1630 |    1630 |    1 / 1
- primary alignment.... L2 |     1630 |    1630 |    1 / 1
- secondary alignment.. L3 |        0 |       0 |   NA / NA
o record is unmapped....... L4 |       23 |      23 |    1 / 1

This output shows that the reads are paired-end. This file does not contain multiple mapped reads because the number of secondary alignments is 0. This is confirmed by the max records per unique QNAME equal to 1 for the F and L groups.

H.