Confused about the output and statistics of a BAM file after STAR alignment
Mohamed ▴ 30
Last seen 8 months ago
United Kingdom

Dear experts, I would like to better understand the STAR output alongside the samtools output. I ran STAR to align paired-end reads without setting --outSAMunmapped Within, --outFilterMultimapNmax, or --outFilterMismatchNmax. My command was:

STAR --runMode alignReads --genomeDir IndexRef/GRCg6a/ --outSAMtype BAM SortedByCoordinate --readFilesIn R0629-S0002_L10AU2_A56593_1_HGFCJDSX2_TCGTCTGA-TCAAGGAC_L003_R1_trimmed.fastq R0629-S0002_L10AU2_A56593_1_HGFCJDSX2_TCGTCTGA-TCAAGGAC_L003_R2_trimmed.fastq --outFileNamePrefix mapped/L10/BAM_L10_GRC6a/L10A2 --runThreadN 16

So this is STAR with default settings. I then got the following summary statistics from the run:

                             Started job on |       Nov 03 23:03:37
                         Started mapping on |       Nov 03 23:03:46
                                Finished on |       Nov 03 23:14:49
   Mapping speed, Million of reads per hour |       201.48

                      Number of input reads |       37106706
                  Average input read length |       298
                                UNIQUE READS:
               Uniquely mapped reads number |       31347653
                    Uniquely mapped reads % |       84.48%
                      Average mapped length |       290.62
                   Number of splices: Total |       28565716
        Number of splices: Annotated (sjdb) |       27854207
                   Number of splices: GT/AG |       28127279
                   Number of splices: GC/AG |       310611
                   Number of splices: AT/AC |       25399
           Number of splices: Non-canonical |       102427
                  Mismatch rate per base, % |       0.38%
                     Deletion rate per base |       0.04%
                    Deletion average length |       2.30
                    Insertion rate per base |       0.04%
                   Insertion average length |       2.00
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |       607582
         % of reads mapped to multiple loci |       1.64%
    Number of reads mapped to too many loci |       18388
         % of reads mapped to too many loci |       0.05%
                              UNMAPPED READS:

Number of reads unmapped: too many mismatches |       0
     % of reads unmapped: too many mismatches |       0.00%
          Number of reads unmapped: too short |       5095820
               % of reads unmapped: too short |       13.73%
              Number of reads unmapped: other |       37263
                   % of reads unmapped: other |       0.10%
                              CHIMERIC READS:
                     Number of chimeric reads |       0
                          % of chimeric reads |       0.00%
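
One way to read the summary above is to check whether the reported categories partition the input reads. The sketch below simply re-adds the counts printed in the log and recomputes the uniquely mapped percentage; the variable names are illustrative and are not STAR's own.

```shell
#!/bin/sh
# Counts copied from the summary above; variable names are illustrative only.
input=37106706
unique=31347653
multi=607582
too_many_loci=18388
unmapped_mismatch=0
unmapped_short=5095820
unmapped_other=37263
chimeric=0

# Sum every mapped and unmapped category reported in the log.
total=$((unique + multi + too_many_loci + unmapped_mismatch + unmapped_short + unmapped_other + chimeric))
echo "sum of categories: $total (input reads: $input)"

# Recompute the uniquely mapped percentage from the raw counts.
unique_pct=$(awk -v u="$unique" -v n="$input" 'BEGIN { printf "%.2f", 100 * u / n }')
echo "uniquely mapped: ${unique_pct}%"
```

For this run the categories add up to exactly 37106706, which suggests that each input read is counted in exactly one category and that the uniquely mapped reads are reported separately from the multi-mapped ones.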

my questions are:

  1. Is the number of input reads exactly the number I get by summing the reads in the trimmed R1 and R2 files, or something else?
  2. What exactly does "Uniquely mapped reads number" mean? Is it the number of reads that map to only one position in the genome, or reads that map to fewer than 10 positions (so between 1 and 10 positions)?
  3. The output also lists multi-mapped reads. Does "Uniquely mapped reads number" include the multi-mapped reads, which I assume are those mapped to fewer than 10 positions (the STAR default)?
  4. What do reads with mapping quality 255 mean? I counted them using

    samtools view -c -q 255 file.bam

but this gave an even higher number than the uniquely mapped reads. I thought reads with MAPQ 255 were uniquely mapped, i.e. mapped to only one position, but I am not sure. I also realized that the count contains singletons (could someone explain what singletons are?).
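
A possible reason for the higher count, stated as an assumption rather than something confirmed in this thread: for paired-end data the STAR summary counts each read pair as one read, while `samtools view -c` counts alignment records, so a fully mapped pair contributes two records. Note also that `-q 255` keeps records with MAPQ of at least 255. The sketch below computes how many records would result if both mates of every uniquely mapped pair were aligned. Only if samtools and the placeholder `file.bam` are present does it also count first-in-pair primary records for comparison.

```shell
#!/bin/sh
# Uniquely mapped read (pair) count copied from the STAR summary above.
unique_pairs=31347653

# If both mates of every uniquely mapped pair are aligned, each pair yields two BAM records.
expected_records=$((unique_pairs * 2))
echo "at most $expected_records primary records from uniquely mapped pairs"

# file.bam is a placeholder name; the counts only run when samtools and the file exist.
bam=file.bam
if command -v samtools >/dev/null 2>&1 && [ -f "$bam" ]; then
    # -f 0x40 keeps first-in-pair records; -F 0x900 drops secondary and supplementary records.
    samtools view -c -q 255 -f 0x40 -F 0x900 "$bam"
    # -f 0x8 keeps records whose mate is unmapped (one common meaning of "singleton").
    samtools view -c -q 255 -f 0x8 "$bam"
fi
```

If the first count is close to the uniquely mapped number in the summary, the discrepancy is most likely a matter of counting records versus pairs.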

Finally, and most importantly: the BAM file I obtained from this command does not appear to contain any duplicate reads. I checked this by counting the PCR duplicates in the file with the following command, which returned 0:

samtools view -c -f 1024 file.bam

Am I right that there are no duplicates in this file? I tried rerunning the alignment with the option

--outSAMunmapped Within

so that unmapped and mapped reads are in the same BAM file, and the duplicate count was again 0. So where have the duplicates gone? Are they removed from the BAM file by default in STAR? I could not find this information in the STAR manual.
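
For context on the `-f 1024` filter: 1024 (0x400) is the SAM FLAG bit meaning "PCR or optical duplicate". As far as I know, that bit is only set by a separate duplicate-marking step, so a count of 0 would mean that no record has been marked, not necessarily that no duplicates exist. Treat that as an assumption to verify against the STAR manual. The helper below (`is_marked_duplicate` is a hypothetical name) shows the bit test the filter applies to each record's FLAG value.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the duplicate bit (0x400 = 1024) is set in a SAM FLAG.
is_marked_duplicate() {
    [ $(( $1 & 1024 )) -ne 0 ]
}

# 163 = paired, proper pair, mate on reverse strand, second in pair: no duplicate bit.
# 1187 = 163 + 1024: the same record after a duplicate-marking tool has flagged it.
for flag in 163 1187; do
    if is_marked_duplicate "$flag"; then
        echo "FLAG $flag: marked as duplicate"
    else
        echo "FLAG $flag: not marked as duplicate"
    fi
done
```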

Any advice is highly appreciated


RNAseq123 VariantFiltering Alignment VariantAnnotation • 1.5k views
Last seen 1 day ago
United States

This support site is intended to help people with questions about Bioconductor packages. And STAR isn't a Bioconductor package. You might try on instead, or perhaps there is a STAR-specific help forum.

