Understanding the output and statistics of a BAM file after STAR alignment
Mohamed ▴ 30
@aa1ae679

Dear experts, I would like to better understand the output of STAR's Log.final.out file alongside the samtools output. I ran STAR to align paired-end reads without --outSAMunmapped Within and without setting --outFilterMultimapNmax or --outFilterMismatchNmax. My command was:

    STAR --runMode alignReads \
         --genomeDir IndexRef/GRCg6a/ \
         --outSAMtype BAM SortedByCoordinate \
         --readFilesIn R0629-S0002_L10AU2_A56593_1_HGFCJDSX2_TCGTCTGA-TCAAGGAC_L003_R1_trimmed.fastq R0629-S0002_L10AU2_A56593_1_HGFCJDSX2_TCGTCTGA-TCAAGGAC_L003_R2_trimmed.fastq \
         --outFileNamePrefix mapped/L10/BAM_L10_GRC6a/L10A2 \
         --runThreadN 16

So this is STAR with default settings. I then took the summary statistics from the Log.final.out file, shown below:

                             Started job on |       Nov 03 23:03:37
                         Started mapping on |       Nov 03 23:03:46
                                Finished on |       Nov 03 23:14:49
   Mapping speed, Million of reads per hour |       201.48

                      Number of input reads |       37106706
                  Average input read length |       298
                                UNIQUE READS:
               Uniquely mapped reads number |       31347653
                    Uniquely mapped reads % |       84.48%
                      Average mapped length |       290.62
                   Number of splices: Total |       28565716
        Number of splices: Annotated (sjdb) |       27854207
                   Number of splices: GT/AG |       28127279
                   Number of splices: GC/AG |       310611
                   Number of splices: AT/AC |       25399
           Number of splices: Non-canonical |       102427
                  Mismatch rate per base, % |       0.38%
                     Deletion rate per base |       0.04%
                    Deletion average length |       2.30
                    Insertion rate per base |       0.04%
                   Insertion average length |       2.00
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |       607582
         % of reads mapped to multiple loci |       1.64%
    Number of reads mapped to too many loci |       18388
         % of reads mapped to too many loci |       0.05%
                              UNMAPPED READS:

Number of reads unmapped: too many mismatches |       0
     % of reads unmapped: too many mismatches |       0.00%
          Number of reads unmapped: too short |       5095820
               % of reads unmapped: too short |       13.73%
              Number of reads unmapped: other |       37263
                   % of reads unmapped: other |       0.10%
                                CHIMERIC READS:
                     Number of chimeric reads |       0
                          % of chimeric reads |       0.00%
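As a quick sanity check on the figures above, the per-category counts can be added up and compared with the number of input reads. The sketch below uses only the numbers from this Log.final.out; the variable names are made up for illustration. If the categories are mutually exclusive (uniquely mapped, multi-mapped, too many loci, unmapped, chimeric), the sum should match the input count, which would suggest that the uniquely mapped number does not overlap with the multi-mapping one.

```shell
# Counts copied from the Log.final.out summary above.
input_reads=37106706
unique=31347653
multi=607582
too_many_loci=18388
unmapped_mismatches=0
unmapped_too_short=5095820
unmapped_other=37263
chimeric=0

# Sum every category; if they are mutually exclusive, the total
# should equal the number of input reads.
total=$((unique + multi + too_many_loci + unmapped_mismatches + unmapped_too_short + unmapped_other + chimeric))
echo "sum of categories: $total (input reads: $input_reads)"

# Recompute the uniquely mapped percentage with awk (floating point).
unique_pct=$(awk -v u="$unique" -v n="$input_reads" 'BEGIN { printf "%.2f", 100 * u / n }')
echo "uniquely mapped: ${unique_pct}%"
```

For this run the categories add up exactly to the number of input reads, and the recomputed percentage agrees with the reported one.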

My questions are:

  1. Is the number of input reads exactly the sum of the reads in my trimmed R1 and R2 files, or is it something else?
  2. What exactly does "Uniquely mapped reads number" mean? Is it the number of reads that map to only one position in the genome, or does it cover reads that map to up to 10 positions (so anywhere between 1 and 10)?
  3. The output also lists multi-mapping reads. Does "Uniquely mapped reads number" include the multi-mapped reads, which I assume are those mapped to up to 10 positions (the STAR default)?
  4. What does a mapping quality (MAPQ) of 255 mean? I counted the reads with this value using

    samtools view -c -q 255 file.bam

but this gave an even higher number than the uniquely mapped reads. I thought that reads with MAPQ 255 were the uniquely mapped ones, i.e. those mapped to only one position, but I am not sure. I then realized that this count also contains singletons (could someone explain what singletons are?).
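One likely source of the discrepancy is the unit being counted. For paired-end input, STAR reports one "read" per pair in Log.final.out, while `samtools view -c` counts alignment records, and each mate is a separate record. STAR's default MAPQ for uniquely mapping reads is 255 (the `--outSAMmapqUnique` setting). A minimal sketch of the expected relationship, assuming both mates of every uniquely mapped pair are written to the BAM file; `file.bam` is the placeholder name from the command above, and the samtools call is skipped when samtools or the file is not available:

```shell
# Uniquely mapped read pairs reported in Log.final.out above.
unique_pairs=31347653

# If both mates of every uniquely mapped pair are written to the BAM file,
# each pair contributes two alignment records with MAPQ 255.
expected_records=$((2 * unique_pairs))
echo "expected MAPQ 255 records under that assumption: $expected_records"

# Counting only first mates (flag 64) should then come close to the
# per-pair number from Log.final.out.
if command -v samtools >/dev/null 2>&1 && [ -f file.bam ]; then
    samtools view -c -q 255 -f 64 file.bam
fi
```

In this context a singleton usually means a read that was aligned while its mate was not, so it appears in the file as a single record rather than as part of a pair.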

Finally, and most importantly: the BAM file I obtained from this command does not seem to contain any duplicate reads. I checked this by counting the reads flagged as PCR duplicates, and the count was 0, using this command:

samtools view -c -f 1024 file.bam

Am I right that there are no duplicates in this file? When I tried to rerun the alignment with the additional option

--outSAMunmapped Within

so that unmapped and mapped reads are in the same BAM file, the duplicate count was again 0. So where have the duplicates gone? Are they removed from the BAM file by default in STAR? I could not find this information anywhere in the STAR manual.
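For context on what the `-f 1024` filter tests: it selects records whose SAM flag has the duplicate bit (0x400) set. That bit is normally written by a separate duplicate-marking step (for example `samtools markdup` or Picard MarkDuplicates), so a count of 0 on a BAM file that has not been through such a step most likely means that nothing has been marked, not that duplicates were removed. The sketch below only illustrates the bit test itself; the helper name `is_duplicate` and the example flag values are chosen for illustration.

```shell
# Return success (0) when the duplicate bit (0x400 = 1024) is set in a SAM flag.
is_duplicate() {
    [ $(( $1 & 1024 )) -ne 0 ]
}

# 163  = paired (1) + proper pair (2) + mate reverse (32) + second in pair (128).
# 1187 = 163 + 1024, i.e. the same flag with the duplicate bit added.
for flag in 163 1187; do
    if is_duplicate "$flag"; then
        echo "$flag: duplicate bit set"
    else
        echo "$flag: duplicate bit not set"
    fi
done
```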

Any advice is highly appreciated.

Thanks

RNAseq123 VariantFiltering Alignment VariantAnnotation • 1.5k views
@james-w-macdonald-5106

This support site is intended to help people with questions about Bioconductor packages. And STAR isn't a Bioconductor package. You might try on biostars.org instead, or perhaps there is a STAR-specific help forum.
