Dear all,
I have a broad, open question about how to analyse genome editing experiments by sequencing. I am using the CRISPR system to create indels at precise locations in the genome, and I am interested in measuring the efficiency of gene disruption as the percentage of reads containing indels. I plan to amplify the target region by PCR and then sequence it paired-end on the MiSeq system. Before doing the experiment, I looked at the published methods for counting reads with indels and found many different ways to obtain this result. I now find it hard to judge which method is better than the others ("the paradox of choice"...). So, does anyone know of a validated method for indel quantification?
In particular, I found papers that use Bowtie or BWA for alignment, while others use the Smith-Waterman algorithm instead. Is there a preferred alignment method for indel quantification? I read that BWA-MEM works well for variant analysis and is compatible with longer reads (300-600 bp for MiSeq); is that really the case?
Then, there are various methods for counting the reads that contain indels. The main ones I found are: methods that compute the percentage of NHEJ from the number of reads that align to the reference amplicon sequence in 2 blocks; others that use GATK on whole-genome sequencing data from such experiments; and one that estimates the probability that a read carries a real indel from the mean quality score of each read (I think this is the quality score of the alignment produced by the Smith-Waterman algorithm, but I am not sure). Is there an efficient and correct method already established for this quantification? Is it possible to use GATK for such an analysis on PCR products?
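To make the counting step concrete, here is roughly what I have in mind. This is only a minimal sketch in Python, assuming the reads have already been aligned to the amplicon and that I have the CIGAR string of each read from the SAM output. A read is counted as edited if its CIGAR contains an insertion (I) or a deletion (D). The function names, the min_len parameter and the example CIGAR strings are my own, made up for illustration; they do not come from any of the published methods.

```python
import re

# One CIGAR operation: a length followed by an operation letter (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def has_indel(cigar, min_len=1):
    """Return True if the CIGAR has an insertion or deletion of at least min_len bases."""
    if cigar == "*":  # "*" means no alignment is available for this read
        return False
    return any(op in "ID" and int(length) >= min_len
               for length, op in CIGAR_OP.findall(cigar))

def indel_fraction(cigars, min_len=1):
    """Fraction of aligned reads whose CIGAR contains an indel."""
    aligned = [c for c in cigars if c != "*"]
    if not aligned:
        return 0.0
    edited = sum(has_indel(c, min_len) for c in aligned)
    return edited / len(aligned)

# Hypothetical CIGAR strings: two unedited reads, one deletion,
# one insertion, and one unaligned read.
example = ["150M", "70M5D75M", "60M2I88M", "*", "150M"]
print(indel_fraction(example))  # 2 of the 4 aligned reads carry an indel
```

This ignores where the indel lies relative to the cut site and does not separate sequencing or PCR errors from real edits, which is exactly the part I am unsure about.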
I hope my explanations are sufficiently clear. Thank you very much for your help.
Best,
Nicolas