There's a couple of steps to get this into an InteractionSet
that can be processed by diffHic. I'm going to take some liberties with your file names, but you should be able to figure it out. First, we load the bin coordinates:
library(rtracklayer)
bin.coords <- import("bin.bed")
bin.coords <- as(bin.coords, "GRanges")
We then construct a InteractionSet
object using the raw
indices and the imported GRanges
.
# assuming raw is a data.frame
library(InteractionSet)
gi <- GInteractions(raw$bin1, raw$bin2, bin.coords, mode="reverse")
iset <- InteractionSet(as.matrix(raw$reads), gi)
Now it gets complicated, depending on whether HiC-Pro reports the same interactions for each sample or not. If it does, then you can simply cbind
all of the iset
objects together. If it doesn't, then you need to do something like this:
all.gi <- unique(c(gi1, gi2))
all.counts <- matrix(0, ncol=2, nrow=length(all.gi))
all.counts[match(gi1, all.gi), 1] <- assay(iset1)
all.counts[match(gi2, all.gi), 2] <- assay(iset2)
iset.final <- InteractionSet(all.counts, all.gi)
The simpler solution would just be to run preparePairs
off the BAM files from HiC-Pro (if available) and then call squareCounts
on the resulting HDF5 files. This will generate the InteractionSet
object directly from all samples, without the need for any of the hacking around shown above. Or, as an intermediate approach, you can save the restriction fragment assignments from HiC-Pro with savePairs
, and then call squareCounts
on the HDF5 files that are produced.
Finally, don't use the ICE-adjusted values in diffHic, as it makes no sense to use normalized values as input into edgeR. (See various discussions about this on the support site, e.g., regarding FPKMs and CPMs.) ICE is mostly unnecessary for a differential analysis as genomic biases should largely cancel out when you compare between samples for a particular interaction. If you really, really wanted to use them, then I would suggest computing the log-fold change between the ICE and raw counts, and using that as the offset during GLM fitting for each observation in each sample - see Section 5.3 of the diffHic user's guide for an example.