I have recently been studying the ATAC-seq data analysis pipeline. For pre-alignment QC, I used FastQC and Trimmomatic. More than half of the reads were retained after trimming and mapped to the reference genome with Bowtie2. After alignment, duplicate reads, mitochondrial reads, non-unique alignments, and improperly mapped reads were filtered out. To evaluate the quality of the ATAC-seq data, I generated fragment size distribution plots with ATACseqQC.
Typically, a large proportion of fragments should be shorter than 100 bp, representing nucleosome-free regions. However, in my plots the highest peak is not at 50-100 bp but at 200 bp, which corresponds to Tn5 insertions on either side of a single nucleosome. I observe this in both biological replicates, with or without filtering. Why does this happen? Is there something wrong with my data processing?
Thanks in advance
Figure 1. Fragment size distribution before filtering (There are many mitochondrial reads)
Figure 2. Fragment size distribution after filtering
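To put numbers on what the plots show, here is a minimal Python sketch (not the ATACseqQC code) that tallies the fraction of fragments in each size class and reports the most common fragment length. The function names and the size cutoffs (under 100 bp for nucleosome-free, 180-247 bp for mononucleosome) are assumptions chosen for illustration. The input is assumed to be one template length per read pair, for example the TLEN column of the BAM file taken from the first mate only.

```python
from collections import Counter


def classify_fragment(length, nfr_max=100, mono_min=180, mono_max=247):
    """Assign a fragment length (bp) to a size class.

    The cutoffs are illustrative assumptions, not values used by ATACseqQC.
    """
    if length < nfr_max:
        return "nucleosome_free"
    if mono_min <= length <= mono_max:
        return "mononucleosome"
    return "other"


def summarize_fragments(template_lengths):
    """Return (fraction per size class, most common fragment length).

    SAM TLEN is negative for the downstream mate and 0 when unavailable,
    so absolute values are taken and zeros are dropped.
    """
    lengths = [abs(t) for t in template_lengths if t != 0]
    if not lengths:
        raise ValueError("no usable fragment lengths")

    counts = Counter(classify_fragment(n) for n in lengths)
    total = len(lengths)
    classes = ("nucleosome_free", "mononucleosome", "other")
    fractions = {name: counts.get(name, 0) / total for name in classes}

    # Most frequent exact length; ties resolve to the first one seen.
    modal_length = Counter(lengths).most_common(1)[0][0]
    return fractions, modal_length


if __name__ == "__main__":
    # Toy values only, not real data from the experiment.
    toy_lengths = [60, 80, 200, 200, 200, 210, 400, -200, 0]
    fractions, modal_length = summarize_fragments(toy_lengths)
    print(fractions)
    print("modal fragment length:", modal_length)
```

A mononucleosome fraction well above the nucleosome-free fraction, together with a modal length near 200 bp, would match what I see in both figures.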