How to load counts from existing count matrices in diffHic package
1
0
Entering edit mode
girardot • 0
@girardot-23437
Last seen 3.9 years ago

I have raw count HiC interaction matrices (using restriction enzyme fragments) computed using HiCExplorer tool. I am trying to use them to detect differential contacts using the diffHic R package. I have the matrices in h5 format but exported txt matrices (through cooler version). I.e. I have the following

  1. The bed like fragment matrix (char, start, end):
chr2L   0   5663
chr2L   5663    6452
chr2L   6452    7307
  1. The count matrix (idx1, idx2, counts ; where idx1&idx2 refer to the fragment index in the fragment matrix) :
0   0   1
0   1   1
0   2   6

My intention was to follow the advice from the diffHic vignette (https://bioconductor.org/packages/release/bioc/vignettes/diffHic/inst/doc/diffHicUsersGuide.pdf section 3.6) and load my data into several ContactMatrix (1 per sample) from the InteractionSet but I am stuck at creating such a ContactMatrix.

What I did :

> library(rtracklayer)
> library(tidyverse)
# load fragments (bed like matrix)
> restrSites = rtracklayer::import( file("RestrSites_bins.txt"), format = "bed")

# read in the count matrix (second matrix), increase index number by 1 since R is 1-based
> M <- read_delim("mysecondmatrix.txt", delim = "\t", col_names = c("idx1", "idx2", "cnt") , col_types = > cols( idx1 = col_integer(), idx2 = col_integer(), cnt = col_integer() ) ) %>% 
>   dplyr::mutate( start=as.integer(idx1+1), end=as.integer(idx2+1) ) %>% 
>   dplyr::select(start, end, cnt) 

# create a GInteractions
> gi <- GInteractions(M$start, M$end, restrSites)

# convert to ContactMatrix --version1 : use boolean to exclude all fragments from "weird" chr
> keep = as.logical( seqnames(regions(gi)) %in% c("chr2L","chr2R","chr3L","chr3R", "chrX", "chr4") )
> cm <- InteractionSet::inflate(gi, keep, keep, fill=M$cnt[keep])

Error in out.mat[(ac2[relevantA] - 1L) * nR + ar1[relevantA]] <- fill[relevantA] : 
  NAs are not allowed in subscripted assignments
In addition: Warning message:
In (ac2[relevantA] - 1L) * nR : NAs produced by integer overflow

# convert to ContactMatrix --version2: keep all
> cm <- InteractionSet::inflate(gi, NULL, NULL, fill=M$cnt[keep])

Error in out.mat[(ac2[relevantA] - 1L) * nR + ar1[relevantA]] <- fill[relevantA] : 
  NAs are not allowed in subscripted assignments
In addition: Warning message:
In (ac2[relevantA] - 1L) * nR : NAs produced by integer overflow

if anyone could point me at the mistake or suggest another way to load the data into diffHic package, that'd be great (I can also convert h5 to hic format, bedpe, ...).

diffHic HiC InteractionSet • 795 views
ADD COMMENT
0
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 19 hours ago
The city by the bay

I can't reproduce the error, so you'll have to provide some kind of minimum reproducible example. I'll just mention that keep is subsetting the regions, but M$cnts refers to the counts for interactions. It doesn't make much sense to subset the interactions by a logical vector for the regions, they shouldn't even be of the same length.

ADD COMMENT

Login before adding your answer.

Traffic: 846 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6