Search
Question: extract HiC contacts for a given BEDPE set of coordinates
0
gravatar for gryderart
3 months ago by
gryderart0
Bethesda, MD
gryderart0 wrote:

I was wondering if you could answer a technical question about computational methods.  I have generated contact maps (.hic format), but I can’t seem to figure out a nice way to extract the contacts for a list of paired region sets.  Example input in BEDPE format:

 

chr5      1500000             1505000             chr5      1520000             1525000

chr11    1701000             1706000             chr5      1730000             1735000

 

 

with the output being something like:

chr5      1500000             1505000             chr5      1520000             1525000              105

chr11    1701000             1706000             chr5      1730000             1735000              78

 

where the new final column is the number of valid contact pairs between the two regions. 

 

Are there tools for doing this that I just don’t know about?  The juicer tool “dump” only works one coordinate set at a time, for large regions typically, and gets a whole matrix.

 

Thanks!

Berkley

 

 

Berkley Gryder, Ph.D.

Genetics Branch

CCR, NCI, NIH

Building 37, Room 2016

37 Convent Dr.

Bethesda, MD 20892

240-760-6898 (lab), 864-906-2506 (cell)

@GryderArt

 

ADD COMMENTlink modified 3 months ago by Aaron Lun21k • written 3 months ago by gryderart0
1
gravatar for Aaron Lun
3 months ago by
Aaron Lun21k
Cambridge, United Kingdom
Aaron Lun21k wrote:

It's always easier to demonstrate when there's a mock-up data set ready to go:

writeLines(c(
    rep(paste(c("chr5", "1500000", "1505000", "chr5", "1520000", "1525000"), collapse="\t"), 105),
    rep(paste(c("chr11", "1701000", "1706000", "chr5", "1730000", "1735000"), collapse="\t"), 74)
    ), con="example.bedpe"
)

In theory, you should be able to use import from rtracklayer to load this in, but it seems that import for BEDPE files fails without strand information. So let's just construct our GInteractions object manually.

pairings <- read.delim("example.bedpe", header=FALSE)
anchor1 <- GRanges(pairings[[1]], IRanges(pairings[[2]], pairings[[3]]))
anchor2 <- GRanges(pairings[[4]], IRanges(pairings[[5]], pairings[[6]]))

library(InteractionSet)
gi <- GInteractions(anchor1, anchor2, mode="strict")

The strict mode is important as it ensures that interactions are not considered to be different when the order of anchor regions differs. However, if the order of anchor regions has some meaning, then you should stick with mode="normal".

After that, you can use a variety of strategies to get the number of times each interaction occurs. The easiest is probably:

unique.gi <- unique(gi)
uid <- match(gi, unique.gi)
n <- tabulate(uid)

... where n is the number of times each entry of unique.gi is present in gi. Then you can just do:

unique.gi$Count <- n
to.save <- as.data.frame(unique.gi)

... and use write.table to save whatever you want to a new file. You could also export it with rtracklayer:

export(con="output.bedpe", as(unique.gi, "Pairs"))

... where the last field is the count.

ADD COMMENTlink modified 3 months ago • written 3 months ago by Aaron Lun21k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 365 users visited in the last hour