CopyhelpeR: Preprocessing of mappability
Felix • 0
Last seen 5.7 years ago


I'm trying to understand how the preprocessing of the mappability is implemented.

In the CopyhelpeR package, the mappability is defined as follows:

"The mappability data were obtained by aligning all possible 51 base pair
genomic fragments using BWA ( The
mappability of every fragment was binarized, and the mappability of a specific region
is taken as the average mappability of all fragments that fall into this region."

Now I'm wondering what exactly the mappability of a fragment is, since
no such value is defined in the SAM format, and why you chose a fragment length of 51 bp.


CopywriteR • 766 views
Last seen 18 months ago

Hi Felix,

Thank you very much for your interest in CopywriteR. As a measure of mappability at position x, we tested whether the 51 base pairs surrounding position x mapped uniquely (mappability = 1) or not (mappability = 0). Since we use a binned approach, we calculate the mappability of a specific region by averaging the individual mappabilities of all positions contained within that bin. The approach was initially designed for single-end reads, but works well for paired-end reads too (to check this, you can open the .png files in the CNAprofiles/qc folder).
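To illustrate the averaging step, here is a minimal Python sketch. It is not the CopywriteR code; the function name `bin_mappability` and the toy input are hypothetical. It assumes the per-position values have already been binarized (1 = the surrounding fragment mapped uniquely, 0 = it did not) and only shows how they would be averaged per bin.

```python
from typing import List


def bin_mappability(unique_flags: List[int], bin_size: int) -> List[float]:
    """Average binary per-position mappability over consecutive bins.

    unique_flags: one 0/1 value per genomic position (hypothetical input).
    bin_size: number of positions per bin; the last bin may be shorter.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    means = []
    for start in range(0, len(unique_flags), bin_size):
        chunk = unique_flags[start:start + bin_size]
        # The mean of 0/1 values is the fraction of uniquely mappable positions.
        means.append(sum(chunk) / len(chunk))
    return means


# Toy example: 10 positions, bins of 5 positions each.
flags = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
print(bin_mappability(flags, 5))
```

A bin value near 1 means that almost every position in the bin is covered by a uniquely mapping fragment; a value near 0 marks a repetitive region.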

With regard to your question about why we chose a length of 51 bp: there is no particular reason, and we could have chosen a longer length as well. As far as I am aware, all mappability data depend (and should depend) on read length / k-mer size, though. Unfortunately, we cannot provide helper files for every commonly used read length / k-mer size due to space restrictions, so we have settled for 51 bp. I know this is an imperfect solution, so if you have a better alternative I would be happy to hear it.
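The dependence on k-mer size can be seen in a toy Python sketch. Note that this uses exact k-mer counting within a single short string as a stand-in for uniqueness; it is not a BWA alignment and not how the helper files were generated. The function name `unique_fraction` and the sequence are hypothetical.

```python
from collections import Counter


def unique_fraction(seq: str, k: int) -> float:
    """Fraction of k-mers in seq that occur exactly once (exact matching only)."""
    if k <= 0 or k > len(seq):
        raise ValueError("k must be between 1 and len(seq)")
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    # A k-mer counts as "unique" here if no other position has the same k-mer.
    unique = sum(1 for kmer in kmers if counts[kmer] == 1)
    return unique / len(kmers)


# Toy sequence containing a short repeat ("ACGT" occurs twice).
toy = "ACGTACGTTT"
for k in (2, 5):
    print(k, unique_fraction(toy, k))
```

In this toy case the longer k-mers span the repeat and become unique, so the unique fraction rises with k. That is why a mappability track computed for one length does not transfer exactly to another read length.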

I hope this answers your question.



