Question

saving raw interaction matrices using diffHic

0

Entering edit mode

roladali ▴ 20

@roladali-9193

Last seen 8.5 years ago

I have been trying to follow the user guide for diffHic. I would like to save the raw interaction matrices for each pair of chromosomes to be able to use it for other analysis but dont seem to be able to. I would like a file in the form of:

Bin1 Bin2 Bin3 Bin4 Bin5

Bin1 1 1 0 5 7

Bin2 3 4 1 1 2

Bin3

Bin4

Bin5

I understand that the data is saved in the DIList object but am not sure how to extract it in the format I need.

-- I tried as.matrix() but it does not returned non-integer negative numbers which are not read counts.

Thanks!

diffhic interaction matrix • 1.4k views

ADD COMMENT • link 9.3 years ago roladali ▴ 20

score 0 · Answer 1 · 2015-11-14

By default, as.matrix will return a contact matrix filled with the average log-CPM for each bin pair across all libraries. This is the simplest representation when you have multiple samples, because otherwise you'd need a three-dimensional array to contain all of the data. However, if you just want to get the counts for one sample, you can supply the read counts directly via the fill argument. For example, if you have a DIList object named blah, and you wanted to get the matrix for chromosome A, you could do something like:

as.matrix(blah, anchor="chrA", fill=counts(blah)[,1])

... to fill the matrix with the read counts for sample 1. The next release of diffHic will (hopefully) use a base class that handles this conversion more naturally - see here if you want a sneak peak of what it'll look like.

score 0 · Answer 2 · 2015-11-15

Thanks for the quick response! It works of course but the bin numbers in the rows and columns don't match the expected number of bins; I have used hg19 and a bin size of 1Mb which roughly gives me 250 bins for chr1. However, I get 231.

> str(as.matrix(data, first="chr1", second="chr1", fill=counts(data)[,1]), )
int [1:231, 1:231] 562 816 61 46 14 10 43 19 14 19 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:231] "983" "984" "985" "986" ...
..$ : chr [1:231] "983" "984" "985" "986" ...

I thought perhaps bins that have no interactions with any other bins are removed but that doesnt seem to be the case. What is more surprising is that chr2 yields more bins than chr1 (243 vs 231 respectively). I am not sure what is going on. Can I label the bins with their ranges to avoid ambiguity? and where are the missing bins?

str(as.matrix(data, first="chr2", second="chr2", fill=counts(data)[,1]), )
int [1:243, 1:243] 4334 2566 897 667 320 283 322 234 176 102 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:243] "1356" "1357" "1358" "1359" ...
..$ : chr [1:243] "1356" "1357" "1358" "1359" ...

Thanks!