extract HiC contacts for a given BEDPE set of coordinates
Entering edit mode
gryderart • 0
Last seen 4.1 years ago
Bethesda, MD

I was wondering if you could answer a technical question about computational methods.  I have generated contact maps (.hic format), but I can’t seem to figure out a nice way to extract the contacts for a list of paired region sets.  Example input in BEDPE format:


chr5      1500000             1505000             chr5      1520000             1525000

chr11    1701000             1706000             chr5      1730000             1735000



with the output being something like:

chr5      1500000             1505000             chr5      1520000             1525000              105

chr11    1701000             1706000             chr5      1730000             1735000              78


where the new final column is the number of valid contact pairs between the two regions. 


Are there tools for doing this that I just don’t know about?  The juicer tool “dump” only works one coordinate set at a time, for large regions typically, and gets a whole matrix.






Berkley Gryder, Ph.D.

Genetics Branch


Building 37, Room 2016

37 Convent Dr.

Bethesda, MD 20892

240-760-6898 (lab), 864-906-2506 (cell)



hic diffhic chromatin bedpe • 645 views
Entering edit mode
Aaron Lun ★ 27k
Last seen 8 hours ago
The city by the bay

It's always easier to demonstrate when there's a mock-up data set ready to go:

    rep(paste(c("chr5", "1500000", "1505000", "chr5", "1520000", "1525000"), collapse="\t"), 105),
    rep(paste(c("chr11", "1701000", "1706000", "chr5", "1730000", "1735000"), collapse="\t"), 74)
    ), con="example.bedpe"

In theory, you should be able to use import from rtracklayer to load this in, but it seems that import for BEDPE files fails without strand information. So let's just construct our GInteractions object manually.

pairings <- read.delim("example.bedpe", header=FALSE)
anchor1 <- GRanges(pairings[[1]], IRanges(pairings[[2]], pairings[[3]]))
anchor2 <- GRanges(pairings[[4]], IRanges(pairings[[5]], pairings[[6]]))

gi <- GInteractions(anchor1, anchor2, mode="strict")

The strict mode is important as it ensures that interactions are not considered to be different when the order of anchor regions differs. However, if the order of anchor regions has some meaning, then you should stick with mode="normal".

After that, you can use a variety of strategies to get the number of times each interaction occurs. The easiest is probably:

unique.gi <- unique(gi)
uid <- match(gi, unique.gi)
n <- tabulate(uid)

... where n is the number of times each entry of unique.gi is present in gi. Then you can just do:

unique.gi$Count <- n
to.save <- as.data.frame(unique.gi)

... and use write.table to save whatever you want to a new file. You could also export it with rtracklayer:

export(con="output.bedpe", as(unique.gi, "Pairs"))

... where the last field is the count.


Login before adding your answer.

Traffic: 329 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6