I have raw count HiC interaction matrices (using restriction enzyme fragments) computed using HiCExplorer tool. I am trying to use them to detect differential contacts using the diffHic
R package. I have the matrices in h5 format but exported txt matrices (through cooler version). I.e. I have the following
- The bed like fragment matrix (char, start, end):
chr2L 0 5663
chr2L 5663 6452
chr2L 6452 7307
- The count matrix (idx1, idx2, counts ; where idx1&idx2 refer to the fragment index in the fragment matrix) :
0 0 1
0 1 1
0 2 6
My intention was to follow the advice from the diffHic vignette (https://bioconductor.org/packages/release/bioc/vignettes/diffHic/inst/doc/diffHicUsersGuide.pdf section 3.6) and load my data into several ContactMatrix (1 per sample) from the InteractionSet but I am stuck at creating such a ContactMatrix.
What I did :
> library(rtracklayer)
> library(tidyverse)
# load fragments (bed like matrix)
> restrSites = rtracklayer::import( file("RestrSites_bins.txt"), format = "bed")
# read in the count matrix (second matrix), increase index number by 1 since R is 1-based
> M <- read_delim("mysecondmatrix.txt", delim = "\t", col_names = c("idx1", "idx2", "cnt") , col_types = > cols( idx1 = col_integer(), idx2 = col_integer(), cnt = col_integer() ) ) %>%
> dplyr::mutate( start=as.integer(idx1+1), end=as.integer(idx2+1) ) %>%
> dplyr::select(start, end, cnt)
# create a GInteractions
> gi <- GInteractions(M$start, M$end, restrSites)
# convert to ContactMatrix --version1 : use boolean to exclude all fragments from "weird" chr
> keep = as.logical( seqnames(regions(gi)) %in% c("chr2L","chr2R","chr3L","chr3R", "chrX", "chr4") )
> cm <- InteractionSet::inflate(gi, keep, keep, fill=M$cnt[keep])
Error in out.mat[(ac2[relevantA] - 1L) * nR + ar1[relevantA]] <- fill[relevantA] :
NAs are not allowed in subscripted assignments
In addition: Warning message:
In (ac2[relevantA] - 1L) * nR : NAs produced by integer overflow
# convert to ContactMatrix --version2: keep all
> cm <- InteractionSet::inflate(gi, NULL, NULL, fill=M$cnt[keep])
Error in out.mat[(ac2[relevantA] - 1L) * nR + ar1[relevantA]] <- fill[relevantA] :
NAs are not allowed in subscripted assignments
In addition: Warning message:
In (ac2[relevantA] - 1L) * nR : NAs produced by integer overflow
if anyone could point me at the mistake or suggest another way to load the data into diffHic package, that'd be great (I can also convert h5 to hic format, bedpe, ...).