Question

From de novo contig assembly to chromosome bins

mictadlo ▴ 10

@mictadlo-10885

Last seen 4.5 years ago

Hi, does anyone have recommendations for software that could give me chromosome bins from a de novo contig assembly using Hi-C data, without any annotation?

Thank you in advance.

diffhic gothic hicup hic-c hic • 1.6k views

updated 5.6 years ago by Aaron Lun ★ 28k • written 5.6 years ago by mictadlo ▴ 10

score 0 · Answer 1 · 2018-12-14

Aaron Lun ★ 28k

@alun

Last seen 8 hours ago

The city by the bay

If you want a contact matrix, many software packages can do this without any particular need for annotation. I can't speak for the others, but with diffHic, the workflow would be something like (i) align reads to the reference genome, (ii) call `squareCounts` to count the number of reads in each pair of chromosomal bins, and (iii) use `InteractionSet::inflate` to convert to a `ContactMatrix`.
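To make step (ii) concrete, here is a minimal sketch in plain Python of what "counting reads in each pair of bins" means. It is not diffHic code and does not reproduce `squareCounts`; the function name `bin_pair_counts`, the tuple layout of the read pairs, and the example coordinates are all made up for illustration.

```python
from collections import Counter


def bin_pair_counts(read_pairs, width):
    """Count read pairs falling into each unordered pair of fixed-width bins."""
    counts = Counter()
    for seq1, pos1, seq2, pos2 in read_pairs:
        # A bin is identified by (sequence name, 0-based bin index).
        a = (seq1, (pos1 - 1) // width)
        b = (seq2, (pos2 - 1) // width)
        # Sort the two bins so that (a, b) and (b, a) share one entry.
        counts[tuple(sorted((a, b)))] += 1
    return counts


# Hypothetical aligned read pairs: (seq1, pos1, seq2, pos2), 1-based positions.
pairs = [
    ("contig1", 150, "contig1", 1800),
    ("contig1", 1900, "contig1", 120),
    ("contig1", 500, "contig2", 300),
]
counts = bin_pair_counts(pairs, width=1000)
```

Each key of `counts` is one bin pair, i.e. one cell of the eventual contact matrix. No annotation is involved, only sequence names and positions.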

written 5.6 years ago by Aaron Lun ★ 28k

How is it possible to extract chromosome bins out of `ContactMatrix`?

written 5.6 years ago by mictadlo ▴ 10

Each row and column is a bin. You can call `anchors` to get the genomic coordinates.
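As a toy illustration of how a row or column index maps back to a genomic interval (plain Python, not the InteractionSet API; `make_bins` and the sequence lengths are hypothetical):

```python
def make_bins(seq_lengths, width):
    """Return (name, start, end) bins, 1-based inclusive, in matrix order."""
    bins = []
    for name, length in seq_lengths.items():
        for start in range(1, length + 1, width):
            # The last bin on a sequence is truncated at the sequence end.
            bins.append((name, start, min(start + width - 1, length)))
    return bins


# Hypothetical sequence lengths in bp.
seq_lengths = {"contig1": 2500, "contig2": 1000}
bins = make_bins(seq_lengths, width=1000)

# Row (or column) i of the contact matrix corresponds to bins[i].
row_2 = bins[2]
```

The point is only that a bin here is a coordinate range on whatever sequences the reads were aligned to, so the bins follow the contigs of the assembly rather than chromosomes.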

written 5.6 years ago by Aaron Lun ★ 28k

Would those anchors allow me to extract a FASTA file for each chromosome? For example, we expect 19 chromosomes, so we would like to get 19 FASTA files. Do you have any code examples?

written 5.6 years ago by mictadlo ▴ 10

Where did the FASTA files come from? Once you've done the alignment, the sequence of the assembly has nothing to do with Hi-C data analysis. If you want to get subsequences, use the relevant functions from Biostrings.
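Biostrings is the R route for this. Purely to illustrate the bookkeeping, here is a small Python sketch that splits an assembly into one FASTA text per group. It assumes you already have a contig-to-chromosome assignment from some scaffolding tool; the `groups` mapping, the function names, and the toy sequences below are hypothetical.

```python
import io


def read_fasta(handle):
    """Parse FASTA records from a text handle into {name: sequence}."""
    records, name = {}, None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def fasta_per_group(records, groups):
    """Return {group: FASTA text} holding the contigs assigned to each group."""
    out = {}
    for contig, group in groups.items():
        out.setdefault(group, []).append(f">{contig}\n{records[contig]}\n")
    return {g: "".join(chunks) for g, chunks in out.items()}


# Toy assembly and a hypothetical contig-to-chromosome assignment.
assembly = io.StringIO(">contig1\nACGT\nAC\n>contig2\nGGCC\n>contig3\nTTAA\n")
groups = {"contig1": "chr1", "contig2": "chr2", "contig3": "chr1"}
per_group = fasta_per_group(read_fasta(assembly), groups)
```

Writing each value of `per_group` to its own file would give one FASTA file per group. The Hi-C analysis itself never touches the sequences; only the grouping does.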

written 5.6 years ago by Aaron Lun ★ 28k

Our goal is to scaffold our 4000 contigs into 19 scaffolds/chromosomes with the help of Hi-C data, similar to [this](https://www.nature.com/articles/s41438-017-0013-y/figures/1). Unfortunately, LACHESIS does not work on our cluster, so I am looking for alternatives.
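To spell out what I mean by grouping contigs into chromosomes, here is a toy Python sketch of the general idea: contigs that share many Hi-C links are merged until the expected number of groups remains. This is a simplified greedy average-linkage clustering on raw link counts, not LACHESIS's actual implementation; real tools also normalise for contig length or restriction site density. The function name `cluster_contigs` and the link counts are invented.

```python
def cluster_contigs(links, contigs, k):
    """Greedy average-linkage clustering of contigs into k groups."""
    clusters = [{c} for c in contigs]

    def avg_links(a, b):
        # Mean number of Hi-C links over all contig pairs between two clusters.
        total = sum(links.get(frozenset((x, y)), 0) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # Pick the two clusters with the highest average link count.
        i, j = max(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: avg_links(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


# Hypothetical inter-contig Hi-C link counts.
links = {
    frozenset(("c1", "c2")): 50,
    frozenset(("c3", "c4")): 40,
    frozenset(("c1", "c3")): 2,
    frozenset(("c2", "c4")): 1,
}
groups = cluster_contigs(links, ["c1", "c2", "c3", "c4"], k=2)
```

In our case `k` would be 19 and the input would be link counts between the 4000 contigs.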

written 5.6 years ago by mictadlo ▴ 10

AFAIK, all of the packages you listed in your question are designed for analyzing interaction intensities, which is, after all, the purpose of doing Hi-C in the first place. I don't think they have any functions to perform or refine a genome assembly from Hi-C data. In fact, I don't know of many Bioconductor packages for genome assembly; I think most people would consider that to be a heavy-duty "pre-processing" step ("pre" because it occurs before biological interpretation) that is better handled by command-line tools.

written 5.6 years ago by Aaron Lun ★ 28k

I used [HiCExplorer](https://academic.oup.com/nar/article/46/W1/W11/5036837) with the help of [snakePipes](https://www.biorxiv.org/content/early/2018/09/04/407312). Do you think that HiCExplorer's output could help to get chromosome groups?

written 5.6 years ago by mictadlo ▴ 10

Those don't seem to be Bioconductor packages, so you'd have to ask the authors directly.

written 5.6 years ago by Aaron Lun ★ 28k