Question: diffHic --sig option for HiC libraries prepared with restriction enzyme cocktail from Arima genomics
3 months ago, shopnil99 wrote:


I prepared some of my Hi-C libraries with the Arima Genomics Hi-C prep kit, which uses a restriction enzyme cocktail. If anyone works with similar libraries, do you know what option I should use for --sig when running the diffHic mapping script to produce BAM files?

thanks, - Iftekhar

Modified 3 months ago by Aaron Lun • written 3 months ago by shopnil99
Answer: diffHic --sig option for HiC libraries prepared with restriction
3 months ago, Aaron Lun (Cambridge, United Kingdom) wrote:

The mapping scripts inside diffHic don't support multiple restriction enzymes. This is mostly because I haven't added that support, but also because the split-and-map strategy does not work well when there are multiple short ligation signatures. The mapping script will identify the ligation signature (i.e., the sequence formed by ligating two filled-in sticky ends) and split the read at that point to create fragments that are separately mapped. If you have several short signatures, reads will get split indiscriminately due to random matches, which unnecessarily reduces alignment accuracy.
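To make the point about short signatures concrete, here is a minimal Python sketch (not diffHic code) that builds ligation signatures from filled-in sticky ends and estimates how often each would match a random sequence by chance. It assumes palindromic sites with a 5' overhang and uniform base frequencies; the function names are made up for illustration, and the two-enzyme cocktail in the comments is a hypothetical example, since the actual Arima enzymes are not stated in this thread.

```python
from itertools import product


def filled_ends(site, cut):
    """Return the (left, right) filled-in ends for one restriction site.

    Assumes a palindromic site with a 5' overhang. `cut` is the 0-based
    offset of the top-strand cut, e.g. 1 for A^AGCTT.
    """
    n = len(site)
    return site[: n - cut], site[cut:]


def ligation_signatures(enzymes):
    """All signatures formed by ligating any left end to any right end.

    `enzymes` is a list of (site, cut) tuples. With a cocktail, ends made
    by different enzymes can ligate to each other, so every left/right
    combination is considered.
    """
    ends = [filled_ends(site, cut) for site, cut in enzymes]
    return sorted({left + right for (left, _), (_, right) in product(ends, repeat=2)})


def chance_match_rate(signature):
    """Probability that a random position matches the signature.

    Assumes independent, uniformly distributed bases; 'N' matches anything.
    """
    rate = 1.0
    for base in signature:
        rate *= 1.0 if base == "N" else 0.25
    return rate


# Single 6-cutter (HindIII, A^AGCTT): one 10 bp signature.
print(ligation_signatures([("AAGCTT", 1)]))  # ['AAGCTAGCTT']

# Hypothetical cocktail of two short cutters: four shorter signatures.
for sig in ligation_signatures([("GATC", 0), ("GANTC", 1)]):
    print(sig, chance_match_rate(sig))
```

Under these assumptions, the single 10 bp signature matches by chance at a rate of 4^-10 per position, whereas a signature with only six specified bases matches at 4^-6, which is 256 times more often, and there are several such signatures to match.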

Several people I know have reported success using Nicolas' HiC-Pro pipeline to get from FASTQs to BAM files for Arima data. From a brief inspection of the code, this uses a map-and-split approach where it first aligns the reads to the genome, keeps everything that mapped, and splits the unmapped reads at their ligation signatures for a second round of alignment. In theory, this should be more robust to the presence of multiple short signatures as reads are only split if there was a problem with their initial alignment - in which case, you don't have anything to lose by splitting them and trying again.
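The control flow of a map-and-split approach, as described above, can be sketched roughly as follows. This is a toy illustration in Python and not HiC-Pro's actual implementation: the `align` callable, the exact-match "aligner" and the choice to split at the midpoint of the signature are all assumptions made for the example.

```python
def exact_match_aligner(genome):
    """Build a toy 'aligner' that reports the position of an exact match.

    Stands in for a real aligner; returns None when the read does not map.
    """
    def align(seq):
        pos = genome.find(seq)
        return pos if pos >= 0 else None
    return align


def map_and_split(reads, align, signature):
    """Map whole reads first, then split only the reads that failed to map.

    `reads` maps read names to sequences. Returns two dicts: reads that
    mapped on the first pass, and reads rescued by aligning the 5' segment
    after splitting at the ligation signature.
    """
    mapped, rescued = {}, {}
    half = len(signature) // 2  # assumed split point: middle of the signature
    for name, seq in reads.items():
        hit = align(seq)
        if hit is not None:
            mapped[name] = hit  # first pass succeeded, so never split
            continue
        pos = seq.find(signature)
        if pos == -1:
            continue  # unmapped and no signature: nothing to retry
        five_prime = seq[: pos + half]  # keep the 5' segment up to the junction
        hit = align(five_prime)
        if hit is not None:
            rescued[name] = hit
    return mapped, rescued


genome = "AACCCTTTGATCAGGTACCAGGATCGGGTTTAA"
reads = {
    "r1": "AGGTACCAGG",            # maps as-is
    "r2": "CCCTTTGATCGATCGGGTTT",  # chimeric: only maps after splitting
    "r3": "TTTTTTTTTT",            # unmappable, no signature
}
mapped, rescued = map_and_split(reads, exact_match_aligner(genome), "GATCGATC")
print(mapped, rescued)
```

The point of the design is visible in the first branch: a read that already aligns is never searched for signatures, so a chance match to a short signature cannot damage it.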

(Now, the obvious question is "why didn't you do a map-and-split approach in the first place?" This was to avoid alignments being dominated by the longer 3' end of chimeric reads. The 5' fragment of the chimeric read is the informative part about interactions, but if the 3' fragment is long enough, the read gets mapped according to the 3' fragment - even in end-to-end mode, if the 5' fragment is relatively short. This results in loss of information as you now have a dangling end rather than a valid read pair. The "split-and-map" approach avoided this problem by splitting first so that the 5' and 3' ends were never in competition with each other. However, it assumed that the signature rarely occurred in the genome, which is no longer the case if you are cutting at multiple short restriction sites.)


Thank you for the explanation, Aaron. I went looking for the HiC-Pro pipeline you mentioned, and I found a mapping pipeline from Arima. This creates a combined BAM file from paired-end reads and marks duplicates using Picard tools. Do you think this output BAM file is good to feed into diffHic?

Reply written 3 months ago by shopnil99 (modified 3 months ago)

It should be fine as long as the paired reads have the same name. Multiple alignments for segments of chimeric reads are also supported, as long as they have the same name, are hard-clipped, and the 3' segments are marked as secondary alignments. But that's only necessary for calculating some diagnostics; if you don't care about that, then you only need to guarantee that the alignments of the 5' segments are present in the BAM file somewhere.
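As a rough sanity check of these requirements, something like the sketch below could be run on the text form of the file (for example, the output of `samtools view -h`). It is not part of diffHic, and it encodes one reading of the conditions above: each read name needs a primary alignment for both mates, and any record flagged as secondary (0x100 in the SAM FLAG field) should carry a hard clip in its CIGAR string. The function name is made up for illustration.

```python
FIRST, SECOND, SECONDARY = 0x40, 0x80, 0x100  # SAM FLAG bits


def check_sam_pairs(sam_lines):
    """Return a list of problems found in an iterable of SAM text lines."""
    primaries = {}  # read name -> set of mates that have a primary alignment
    problems = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        name, flag, cigar = fields[0], int(fields[1]), fields[5]
        mates = primaries.setdefault(name, set())
        if flag & FIRST:
            mate = "first"
        elif flag & SECOND:
            mate = "second"
        else:
            problems.append(f"{name}: record is not flagged as either mate")
            continue
        if flag & SECONDARY:
            # Extra segment of a chimeric read: expected to be hard-clipped.
            if "H" not in cigar:
                problems.append(f"{name}: secondary segment is not hard-clipped")
        else:
            mates.add(mate)
    for name, mates in primaries.items():
        for mate in ("first", "second"):
            if mate not in mates:
                problems.append(f"{name}: no primary alignment for {mate} mate")
    return problems
```

An empty list means nothing in this particular check failed; it does not guarantee that diffHic will accept the file.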

Reply written 12 weeks ago by Aaron Lun (modified 12 weeks ago)