Question: extract HiC contacts for a given BEDPE set of coordinates
0
10 months ago by
Bethesda, MD
gryderart0 wrote:

I was wondering if you could answer a technical question about computational methods.  I have generated contact maps (.hic format), but I can’t seem to figure out a nice way to extract the contacts for a list of paired region sets.  Example input in BEDPE format:

chr5      1500000             1505000             chr5      1520000             1525000

chr11    1701000             1706000             chr5      1730000             1735000

with the output being something like:

chr5      1500000             1505000             chr5      1520000             1525000              105

chr11    1701000             1706000             chr5      1730000             1735000              78

where the new final column is the number of valid contact pairs between the two regions.

Are there tools for doing this that I just don’t know about?  The juicer tool “dump” only works one coordinate set at a time, for large regions typically, and gets a whole matrix.

Thanks!

Berkley

Berkley Gryder, Ph.D.

Genetics Branch

CCR, NCI, NIH

Building 37, Room 2016

37 Convent Dr.

Bethesda, MD 20892

240-760-6898 (lab), 864-906-2506 (cell)

@GryderArt

chromatin diffhic hic bedpe • 192 views
modified 10 months ago by Aaron Lun24k • written 10 months ago by gryderart0
Answer: extract HiC contacts for a given BEDPE set of coordinates
1
10 months ago by
Aaron Lun24k
Cambridge, United Kingdom
Aaron Lun24k wrote:

It's always easier to demonstrate when there's a mock-up data set ready to go:

writeLines(c(
rep(paste(c("chr5", "1500000", "1505000", "chr5", "1520000", "1525000"), collapse="\t"), 105),
rep(paste(c("chr11", "1701000", "1706000", "chr5", "1730000", "1735000"), collapse="\t"), 74)
), con="example.bedpe"
)

In theory, you should be able to use import from rtracklayer to load this in, but it seems that import for BEDPE files fails without strand information. So let's just construct our GInteractions object manually.

pairings <- read.delim("example.bedpe", header=FALSE)
anchor1 <- GRanges(pairings[[1]], IRanges(pairings[[2]], pairings[[3]]))
anchor2 <- GRanges(pairings[[4]], IRanges(pairings[[5]], pairings[[6]]))

library(InteractionSet)
gi <- GInteractions(anchor1, anchor2, mode="strict")

The strict mode is important as it ensures that interactions are not considered to be different when the order of anchor regions differs. However, if the order of anchor regions has some meaning, then you should stick with mode="normal".

After that, you can use a variety of strategies to get the number of times each interaction occurs. The easiest is probably:

unique.gi <- unique(gi)
uid <- match(gi, unique.gi)
n <- tabulate(uid)

... where n is the number of times each entry of unique.gi is present in gi. Then you can just do:

unique.gi\$Count <- n
to.save <- as.data.frame(unique.gi)

... and use write.table to save whatever you want to a new file. You could also export it with rtracklayer:

export(con="output.bedpe", as(unique.gi, "Pairs"))

... where the last field is the count.