I'm trying to understand how the preprocessing of the mappability is implemented.
In the CopyhelpeR package the mappability is defined as follows:
"The mappability data were obtained by aligning all possible 51 base pair
genomic fragments using BWA (http://bio-bwa.sourceforge.net/). The
mappability of every fragment was binarized, and the mappability of a specific region
is taken as the average mappability of all fragments that fall into this region."
Now I'm wondering what exactly the mappability of a fragment is, since
no such value is defined in the SAM format, and why you chose a fragment length of 51 bp.
Thank you very much for your interest in CopywriteR. As a measure of mappability at position x, we tested whether the 51 bp fragment surrounding position x mapped uniquely (mappability = 1) or not (mappability = 0). Since we use a binned approach, we calculate the mappability of a specific region by averaging the individual mappabilities of all positions contained within that bin. The approach was initially designed for single-end reads, but works well for paired-end reads too (to check this, you can open the .png files in the CNAprofiles/qc folder).
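The averaging step can be illustrated with a minimal Python sketch. This is not the actual CopywriteR/CopyhelpeR code: the function name `binned_mappability`, the toy input values, and the bin size are all hypothetical, and the sketch assumes the binary per-position values (1 = the 51 bp fragment mapped uniquely, 0 = it did not) have already been derived from the alignment.

```python
def binned_mappability(per_position, bin_size):
    """Average binary per-position mappability over consecutive bins.

    per_position: list of 0/1 values, one per genomic position
                  (assumed to be precomputed from the alignment).
    bin_size:     number of positions per bin (hypothetical parameter).
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    bins = []
    for start in range(0, len(per_position), bin_size):
        chunk = per_position[start:start + bin_size]
        # Fraction of positions in this bin whose fragment mapped uniquely.
        bins.append(sum(chunk) / len(chunk))
    return bins


# Toy example: 10 positions, bins of 5 positions each.
per_position = [1, 1, 1, 0, 0, 1, 1, 1, 1, 1]
print(binned_mappability(per_position, 5))  # [0.6, 1.0]
```

In this toy example the first bin gets a mappability of 0.6 because three of its five positions are uniquely mappable, while the second bin is fully mappable.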
With regard to your question about why we chose a length of 51 bp: there is no particular reason, and we could have chosen a longer fragment as well. As far as I am aware, all mappability data depend (and should depend) on read length / k-mer size, though. Unfortunately, we cannot provide helper files for each commonly used read length / k-mer size due to space restrictions, so we have settled on 51 bp. I know this is an imperfect solution, so if you have a better alternative I would be happy to hear it.