Question

CopyhelpeR: Preprocessing of mappability

Felix • 0

@felix-9458

Hello!

I'm trying to understand how the preprocessing of the mappability is implemented.

In the CopyhelpeR package, mappability is described as follows:

"The mappability data were obtained by aligning all possible 51 base pair
genomic fragments using BWA (http://bio-bwa.sourceforge.net/). The
mappability of every fragment was binarized, and the mappability of a specific region
is taken as the average mappability of all fragments that fall into this region."

Now I'm wondering what exactly the mappability of a fragment is, since
no such value is defined in the SAM format, and why you chose a fragment length of 51 bp.

Thanks,
Felix

CopywriteR • 1.4k views

Updated 9.0 years ago by t.kuilman ▴ 170 • written 9.0 years ago by Felix • 0

score 2 · Accepted Answer · 2016-01-05

Hi Felix,

Thank you very much for your interest in CopywriteR. As a measure of mappability at position x, we tested whether the 51 base pairs surrounding position x mapped uniquely (mappability = 1) or not (mappability = 0). Because we use a binned approach, we calculate the mappability of a specific region by averaging the individual mappabilities of all positions contained within that bin. The approach was initially designed for single-end reads but works well for paired-end reads too (to check this, you can open the .png files in the CNAprofiles/qc folder).
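To illustrate the two steps described above (a binary score per position, then an average per bin), here is a minimal Python sketch. It is not the CopywriteR implementation: exact k-mer counting on a single strand stands in for the BWA alignment, the window is assumed to be centred on position x, positions too close to the sequence ends are assumed to score 0, and the function names `positional_mappability` and `bin_mappability` are hypothetical.

```python
from collections import Counter


def positional_mappability(seq, k=51):
    """Binary mappability per position (toy stand-in for a BWA alignment).

    A position scores 1 if the k-mer centred on it occurs exactly once in
    seq (forward strand only, exact matches only), otherwise 0.
    Assumption: positions without a complete window score 0.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so the window can be centred")
    half = k // 2
    # Count every k-mer in the sequence once up front.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    scores = [0] * len(seq)
    for x in range(half, len(seq) - half):
        if counts[seq[x - half:x + half + 1]] == 1:
            scores[x] = 1
    return scores


def bin_mappability(scores, bin_size):
    """Average the per-position scores within consecutive bins."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    bins = []
    for start in range(0, len(scores), bin_size):
        chunk = scores[start:start + bin_size]
        bins.append(sum(chunk) / len(chunk))
    return bins


# Toy example with k=3 instead of 51 so it fits on one line.
scores = positional_mappability("ACGTT", k=3)   # [0, 1, 1, 1, 0]
per_bin = bin_mappability(scores, bin_size=2)   # [0.5, 1.0, 0.0]
```

A bin value of 1.0 then means every position in the bin is covered by a uniquely placed fragment, and 0.0 means none is.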

Regarding your question about why we chose a length of 51 bp: there is no particular reason, and we could have chosen a longer length as well. As far as I am aware, all mappability data depend (and should depend) on read length / k-mer size, though. Unfortunately, we cannot provide helper files for each commonly used read length / k-mer size due to space restrictions, so we have settled on 51 bp. I know this is an imperfect solution, so if you have a better alternative I would be happy to hear about it.
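The dependence on read length / k-mer size can be seen in a toy calculation. The sketch below is only an illustration under simplifying assumptions (exact matches, forward strand only, a made-up sequence); `unique_fraction` is a hypothetical helper, not part of CopywriteR or CopyhelpeR.

```python
from collections import Counter


def unique_fraction(seq, k):
    """Fraction of k-mers in seq that occur exactly once (exact matches only)."""
    if k <= 0 or k > len(seq):
        raise ValueError("k must be between 1 and len(seq)")
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return sum(1 for kmer in kmers if counts[kmer] == 1) / len(kmers)


# Made-up sequence containing the repeat "ACG".
toy = "ACGACGTT"
fractions = {k: unique_fraction(toy, k) for k in (2, 3, 4)}
# k=2: 3 of 7 k-mers are unique; k=3: 4 of 6; k=4: all 5 are unique.
```

In this toy case longer fragments are unique more often, which is why a mappability track computed for one fragment length is only an approximation for reads of a different length.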

I hope this answers your question.

Best,

Thomas