Question

Loading a precomputed interaction matrix into diffHiC

0

Entering edit mode

Darya Vanichkina ▴ 130

@darya-vanichkina-6050

Last seen 8.4 years ago

Australia/Centenary Institute Universit…

I've got a precomputed HiC matrix from HiC-Pro (both raw and iced), and a bed file with bin coordinates. Briefly, these take the form of

# bed

chr start end binID

chr1 0 10000 1
chr1 10000 20000 2

# raw (so bin1-bin2-#readsInteractingBetweenThem)

bin1 bin2 #reads

2 8165 1
2 17384 1

#iced

2 8165 9.657282
2 17384 7.582952

I've got this for different bin sizes, and for the in silico restriction enzyme digested genome.

Can I feed this into diffHiC/edgeR? How do I create a DGElist object from the iced matrix, and process it? Thanks in advance!

diffhic • 1.8k views

ADD COMMENT • link updated 9.1 years ago by Aaron Lun ★ 28k • written 9.1 years ago by Darya Vanichkina ▴ 130

score 1 · Answer 1 · 2016-05-18

There's a couple of steps to get this into an InteractionSet that can be processed by diffHic. I'm going to take some liberties with your file names, but you should be able to figure it out. First, we load the bin coordinates:

library(rtracklayer)
bin.coords <- import("bin.bed")
bin.coords <- as(bin.coords, "GRanges")

We then construct a InteractionSet object using the raw indices and the imported GRanges.

# assuming raw is a data.frame
library(InteractionSet)
gi <- GInteractions(raw$bin1, raw$bin2, bin.coords, mode="reverse")
iset <- InteractionSet(as.matrix(raw$reads), gi)

Now it gets complicated, depending on whether HiC-Pro reports the same interactions for each sample or not. If it does, then you can simply cbind all of the iset objects together. If it doesn't, then you need to do something like this:

all.gi <- unique(c(gi1, gi2))
all.counts <- matrix(0, ncol=2, nrow=length(all.gi))
all.counts[match(gi1, all.gi), 1] <- assay(iset1)
all.counts[match(gi2, all.gi), 2] <- assay(iset2)
iset.final <- InteractionSet(all.counts, all.gi)

The simpler solution would just be to run preparePairs off the BAM files from HiC-Pro (if available) and then call squareCounts on the resulting HDF5 files. This will generate the InteractionSet object directly from all samples, without the need for any of the hacking around shown above. Or, as an intermediate approach, you can save the restriction fragment assignments from HiC-Pro with savePairs, and then call squareCounts on the HDF5 files that are produced.

Finally, don't use the ICE-adjusted values in diffHic, as it makes no sense to use normalized values as input into edgeR. (See various discussions about this on the support site, e.g., regarding FPKMs and CPMs.) ICE is mostly unnecessary for a differential analysis as genomic biases should largely cancel out when you compare between samples for a particular interaction. If you really, really wanted to use them, then I would suggest computing the log-fold change between the ICE and raw counts, and using that as the offset during GLM fitting for each observation in each sample - see Section 5.3 of the diffHic user's guide for an example.