Dear experts, I would like to better understand the output in STAR's Log.final.out file alongside the samtools output. I ran STAR to align paired-end reads without setting --outSAMunmapped Within, --outFilterMultimapNmax, or --outFilterMismatchNmax.
My command was:
STAR --runMode alignReads --genomeDir IndexRef/GRCg6a/ --outSAMtype BAM SortedByCoordinate --readFilesIn R0629-S0002_L10AU2_A56593_1_HGFCJDSX2_TCGTCTGA-TCAAGGAC_L003_R1_trimmed.fastq R0629-S0002_L10AU2_A56593_1_HGFCJDSX2_TCGTCTGA-TCAAGGAC_L003_R2_trimmed.fastq --outFileNamePrefix mapped/L10/BAM_L10_GRC6a/L10A2 --runThreadN 16
So this is essentially default STAR. I then got the following summary statistics from the Log.final.out file:
Started job on | Nov 03 23:03:37
Started mapping on | Nov 03 23:03:46
Finished on | Nov 03 23:14:49
Mapping speed, Million of reads per hour | 201.48
Number of input reads | 37106706
Average input read length | 298
UNIQUE READS:
Uniquely mapped reads number | 31347653
Uniquely mapped reads % | 84.48%
Average mapped length | 290.62
Number of splices: Total | 28565716
Number of splices: Annotated (sjdb) | 27854207
Number of splices: GT/AG | 28127279
Number of splices: GC/AG | 310611
Number of splices: AT/AC | 25399
Number of splices: Non-canonical | 102427
Mismatch rate per base, % | 0.38%
Deletion rate per base | 0.04%
Deletion average length | 2.30
Insertion rate per base | 0.04%
Insertion average length | 2.00
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 607582
% of reads mapped to multiple loci | 1.64%
Number of reads mapped to too many loci | 18388
% of reads mapped to too many loci | 0.05%
UNMAPPED READS:
Number of reads unmapped: too many mismatches | 0
% of reads unmapped: too many mismatches | 0.00%
Number of reads unmapped: too short | 5095820
% of reads unmapped: too short | 13.73%
Number of reads unmapped: other | 37263
% of reads unmapped: other | 0.10%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
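As a sanity check on the summary above, the read categories can be added up and compared with the number of input reads. This is only a minimal sketch using the numbers copied from the log; the variable names are illustrative.

```shell
# Counts copied from the Log.final.out summary (variable names are illustrative).
input_reads=37106706
unique=31347653
multi=607582
too_many=18388
unmapped_mismatch=0
unmapped_short=5095820
unmapped_other=37263

# Sum every mapped and unmapped category.
total=$((unique + multi + too_many + unmapped_mismatch + unmapped_short + unmapped_other))

echo "sum of categories: $total"
echo "input reads:       $input_reads"
```

Both values come out as 37106706, which suggests that the categories do not overlap, i.e. the uniquely mapped count would not also contain the multi-mapped reads.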
My questions are:
- Is the number of input reads exactly the sum of the reads in my trimmed R1 and R2 files, or is it something else?
- What exactly does "Uniquely mapped reads number" mean? Is it the number of reads that map to only one position in the genome, or reads that map to up to 10 positions (so basically between 1 and 10)?
- The output also lists multi-mapped reads. Does the uniquely mapped reads number include the multi-mapped reads, which I assume are those mapped to up to 10 positions (the STAR default)?
What does a mapping quality (MAPQ) of 255 mean? I counted the reads with this value using
samtools view -c -q 255 file.bam
but this gave an even higher number than the uniquely mapped reads. I thought that reads with MAPQ 255 were the uniquely mapped ones, i.e. those mapped to only one position, but I am not sure. I then realized that the count also contains singletons (could someone explain what singletons are?).
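One thing that may explain part of the difference: `samtools view -c` counts alignment records, so with paired-end data each mate is counted separately, whereas my understanding is that the STAR log counts a pair as one read. Below is a rough sketch for counting distinct read names among MAPQ 255 records instead. The helper name `count_mapq255_names` is hypothetical, and it expects SAM text on standard input.

```shell
# Hypothetical helper: counts distinct read names (column 1) among SAM
# records whose MAPQ (column 5) equals 255. Header lines start with "@"
# and are skipped.
count_mapq255_names() {
    awk -F '\t' '!/^@/ && $5 == 255 { print $1 }' | sort -u | wc -l | tr -d ' '
}
```

It could be used as `samtools view -q 255 file.bam | count_mapq255_names`. Both mates of a pair share one read name, so they are counted once.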
Finally, and most importantly: the BAM file I obtained from this command does not seem to contain any duplicate reads. I checked this by counting the PCR duplicates in the file, and the count was 0 with this command:
samtools view -c -f 1024 file.bam
Am I right that there are no duplicates in this file? I also tried rerunning the alignment with the option
--outSAMunmapped Within
which should put the unmapped and mapped reads in the same BAM file, but the duplicate count was again 0. So where have the duplicates gone? Does STAR remove them from the BAM file by default? I could not find this information anywhere in the STAR manual.
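For reference, `-f 1024` keeps only records whose FLAG field has the 0x400 (PCR or optical duplicate) bit set, so a count of 0 means that no record carries that mark, not necessarily that no duplicate reads exist. A small sketch of the bit test, with a hypothetical helper name `is_marked_duplicate`:

```shell
# Hypothetical helper: succeeds if the SAM FLAG value has bit 0x400
# (decimal 1024, "PCR or optical duplicate") set.
is_marked_duplicate() {
    [ $(( $1 & 1024 )) -ne 0 ]
}

# 99 is a properly paired first mate; 1123 = 1024 + 99 is the same
# record with the duplicate bit added.
for flag in 99 1123; do
    if is_marked_duplicate "$flag"; then
        echo "$flag: duplicate bit set"
    else
        echo "$flag: duplicate bit not set"
    fi
done
```

As far as I understand, that bit is normally set by a separate duplicate-marking step (for example `samtools markdup` or Picard MarkDuplicates), which might be why the count is 0 here, but I would appreciate confirmation.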
Any advice is highly appreciated.
Thanks