Question

Confused about the output and statistics of a BAM file after STAR alignment

Mohamed ▴ 30

@aa1ae679

Last seen 10 months ago

United Kingdom

Dear experts, I would like to better understand the STAR Log.final.out file alongside the samtools output. I ran STAR to align paired-end reads without --outSAMunmapped Within and without setting --outFilterMultimapNmax or --outFilterMismatchNmax. My command was:

STAR --runMode alignReads --genomeDir IndexRef/GRCg6a/ --outSAMtype BAM SortedByCoordinate --readFilesIn R0629-S0002_L10AU2_A56593_1_HGFCJDSX2_TCGTCTGA-TCAAGGAC_L003_R1_trimmed.fastq R0629-S0002_L10AU2_A56593_1_HGFCJDSX2_TCGTCTGA-TCAAGGAC_L003_R2_trimmed.fastq --outFileNamePrefix mapped/L10/BAM_L10_GRC6a/L10A2 --runThreadN 16

So this is STAR with default settings. I then took the summary statistics from the Log.final.out file, shown below:

                             Started job on |       Nov 03 23:03:37
                         Started mapping on |       Nov 03 23:03:46
                                Finished on |       Nov 03 23:14:49
   Mapping speed, Million of reads per hour |       201.48

                      Number of input reads |       37106706
                  Average input read length |       298
                                UNIQUE READS:
               Uniquely mapped reads number |       31347653
                    Uniquely mapped reads % |       84.48%
                      Average mapped length |       290.62
                   Number of splices: Total |       28565716
        Number of splices: Annotated (sjdb) |       27854207
                   Number of splices: GT/AG |       28127279
                   Number of splices: GC/AG |       310611
                   Number of splices: AT/AC |       25399
           Number of splices: Non-canonical |       102427
                  Mismatch rate per base, % |       0.38%
                     Deletion rate per base |       0.04%
                    Deletion average length |       2.30
                    Insertion rate per base |       0.04%
                   Insertion average length |       2.00
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |       607582
         % of reads mapped to multiple loci |       1.64%
    Number of reads mapped to too many loci |       18388
         % of reads mapped to too many loci |       0.05%
                              UNMAPPED READS:
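
The percentages in this summary can be re-derived from the raw counts as a sanity check. The sketch below assumes each percentage is taken relative to "Number of input reads"; `pct` is a helper name made up for this illustration and is not part of STAR or samtools.

```shell
# pct PART TOTAL: print PART as a percentage of TOTAL with two decimals.
# (Hypothetical helper, for illustration only.)
pct() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.2f\n", 100 * part / total }'
}

input=37106706            # Number of input reads
pct 31347653 "$input"     # uniquely mapped      -> 84.48
pct 607582 "$input"       # multiple loci        -> 1.64
pct 18388 "$input"        # too many loci        -> 0.05
```

As far as I know, for paired-end input STAR counts one read pair as one read, so "Number of input reads" would be expected to match the number of records in the R1 file (equivalently R2), not the two files summed.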

My questions are:

Is the number of input reads exactly the sum of the reads in my trimmed R1 and R2 files, or something else?
What exactly does "Uniquely mapped reads number" mean? Is it the number of reads that map to only 1 position in the genome, or reads that map to < 10 positions (so basically between 1 and 10 positions)?
The output also lists multi-mapping reads. Does "Uniquely mapped reads number" include the multi-mapped reads, which I assume are those mapped to < 10 positions (the STAR default)?
What does a mapping quality (MAPQ) of 255 mean? I counted such reads using

samtools view -c -q 255 file.bam

but this gave an even higher number than the uniquely mapped reads. I thought reads with MAPQ 255 were the uniquely mapped ones, i.e. those mapped to only one position, but I am not sure. I then realized that the count also contains singletons (could someone explain what singletons are?).
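
One likely reason the `samtools view -c -q 255` count exceeds "Uniquely mapped reads number" is that samtools counts alignment records, so each mate of a pair is counted separately, whereas the STAR summary counts a pair once. A minimal sketch of counting read names instead of records, assuming SAM text on standard input; `count_unique_names` and `demo_sam` are hypothetical helper names, and the four records are made-up toy data:

```shell
# count_unique_names: read SAM records on stdin and print the number of
# distinct read names (column 1) that have a record with MAPQ (column 5) of 255.
# Header lines (starting with "@") are skipped.
count_unique_names() {
    awk -F '\t' '$1 !~ /^@/ && $5 == 255 { seen[$1] = 1 }
                 END { n = 0; for (k in seen) n++; print n }'
}

# Toy data: pairA has both mates at MAPQ 255, pairB has both mates at MAPQ 3.
demo_sam() {
    printf 'pairA\t99\tchr1\t100\t255\t50M\t=\t300\t250\t*\t*\n'
    printf 'pairA\t147\tchr1\t300\t255\t50M\t=\t100\t-250\t*\t*\n'
    printf 'pairB\t99\tchr2\t500\t3\t50M\t=\t700\t250\t*\t*\n'
    printf 'pairB\t147\tchr2\t700\t3\t50M\t=\t500\t-250\t*\t*\n'
}

demo_sam | count_unique_names   # prints 1: two MAPQ-255 records, one read pair
```

On a real file this could be fed with `samtools view -q 255 file.bam | count_unique_names`. With STAR's default settings, MAPQ 255 marks uniquely mapped reads, while 3, 1 and 0 mark reads mapped to 2, 3 to 4, and 5 or more loci respectively.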

Finally, and most importantly: the BAM file I obtained from this command does not appear to contain any duplicate reads by default. I found this by counting the PCR duplicates in the file, which gave 0 with this command:

samtools view -c -f 1024 file.bam

Am I right that this file contains no duplicates? When I tried rerunning the alignment with the option

--outSAMunmapped Within

so that unmapped and mapped reads are in the same BAM file, the duplicate count was also 0. So where have these duplicates gone? Are they removed from the BAM file by STAR by default? I could not find this information in the STAR manual.
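
For context, `-f 1024` only selects records whose SAM FLAG has the duplicate bit (0x400) set. As far as I know, a default STAR alignment run does not set that bit; it is written by a separate duplicate-marking step. A count of 0 therefore most likely means duplicates have not been marked, not that they have been removed. The sketch below only illustrates how the bit is tested; `has_dup_flag` is a hypothetical helper name and the three FLAG values are illustrative:

```shell
# has_dup_flag FLAG: succeed if the duplicate bit (0x400 = 1024) is set
# in the given SAM FLAG value. (Hypothetical helper, for illustration only.)
has_dup_flag() {
    [ $(( $1 & 1024 )) -ne 0 ]
}

# 99 and 147 are typical FLAGs for a properly paired read; 1123 = 99 + 1024.
for flag in 99 147 1123; do
    if has_dup_flag "$flag"; then
        echo "$flag: duplicate bit set"
    else
        echo "$flag: duplicate bit not set"
    fi
done
```

If duplicates need to be flagged, a separate tool such as `samtools markdup` or Picard MarkDuplicates is typically run on the aligner output, after which `samtools view -c -f 1024` would report them.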

Any advice is highly appreciated

Thanks

RNAseq123 VariantFiltering Alignment VariantAnnotation • 1.6k views

updated 20 months ago by James W. MacDonald 66k • written 21 months ago by Mohamed ▴ 30

score 0 · Answer 1 · 2022-11-07

James W. MacDonald 66k

@james-w-macdonald-5106

Last seen 2 hours ago

United States

This support site is intended to help people with questions about Bioconductor packages. And STAR isn't a Bioconductor package. You might try on biostars.org instead, or perhaps there is a STAR-specific help forum.
