diffHiC differential HiC analysis starting from Ginteractions object
1
0
Entering edit mode
Vivek.b ▴ 100
@vivekb-7661
Last seen 2.2 years ago
Germany

Hi

I would like to use difHiC to get differential domains using two different cell lines. I have converted my normalized matrix for each of them into GInteraction objects.



$s2 GInteractions object with 6398123 interactions and 1 metadata column: seqnames1 ranges1 seqnames2 ranges2 | norm.freq <Rle> <IRanges> <Rle> <IRanges> | <numeric> [1] chr2L [72559, 74473] --- chr2L [ 72559, 74473] | 1725.8969929 [2] chr2L [72559, 74473] --- chr2L [ 76649, 80158] | 4548.27044772 [3] chr2L [72559, 74473] --- chr2L [ 80158, 84182] | 4166.33078199 [4] chr2L [72559, 74473] --- chr2L [133166, 137699] | 956.928116437 [5] chr2L [72559, 74473] --- chr2L [205336, 207194] | 245.480797633 ... ... ... ... ... ... . ... [6398119] chrX [22215621, 22220243] --- chrX [22255907, 22256827] | 966.649143049 [6398120] chrX [22215621, 22220243] --- chrX [22401281, 22407854] | 499.251088874 [6398121] chrX [22255907, 22256827] --- chrX [22255907, 22256827] | 419.723838666 [6398122] chrX [22255907, 22256827] --- chrX [22401281, 22407854] | 653.823761634 [6398123] chrX [22401281, 22407854] --- chrX [22401281, 22407854] | 1729.85785275 ------- regions: 8456 ranges and 0 metadata columns seqinfo: 5 sequences from an unspecified genome; no seqlengths$c8
GInteractions object with 45016517 interactions and 1 metadata column:
seqnames1          ranges1     seqnames2          ranges2 | norm.freq
<Rle>        <IRanges>         <Rle>        <IRanges> | <numeric>
[1]     chr2L    [9879, 11901] ---     chr2L   [ 9879, 11901] |        20
[2]     chr2L    [9879, 11901] ---     chr2L   [11901, 13158] |       359
[3]     chr2L    [9879, 11901] ---     chr2L   [13158, 14087] |       196
[4]     chr2L    [9879, 11901] ---     chr2L   [14087, 14759] |       202
[5]     chr2L    [9879, 11901] ---     chr2L   [14759, 15546] |        20
...       ...              ... ...       ...              ... .       ...
[45016513]   chrYHet [184819, 190734] ---   chrYHet [184819, 190734] |         2
[45016514]   chrYHet [184819, 190734] ---   chrYHet [198239, 215835] |         3
[45016515]   chrYHet [198239, 215835] ---   chrYHet [198239, 215835] |        59
[45016516]   chrYHet [198239, 215835] ---   chrYHet [333103, 338457] |         1
[45016517]   chrYHet [333103, 338457] ---   chrYHet [333103, 338457] |         4
-------
regions: 47740 ranges and 0 metadata columns
seqinfo: 14 sequences from an unspecified genome; no seqlengths



As you can see, the size of the two objects differ and the bin sizes are also not the same. Therefore I am getting error when trying to create an InteractionSet object out of them.

I want to continue from step 5 of the diffHiC manual. Is it possible to proceed from here on ?

Thanks,

Vivek

diffhic interactionset • 868 views
0
Entering edit mode
Aaron Lun ★ 27k
@alun
Last seen 46 minutes ago
The city by the bay

No. There's at least three reasons that I can think of:

• You have different bin pairs in the two objects. How are you going to match them up? What happens to a bin pair in one object that partially overlaps with multiple bin pairs in the other object - which of the overlapping bin pairs is the correct match? What do you do if you have a bin pair that is present in one object and absent in another - should you set the intensity for the latter to zero? (This won't be correct if the entries of your objects represent called interactions, such that missingness does not imply zero.)
• Your normalized interaction frequencies are not counts. edgeR needs counts.
• You don't seem to have any replicates. edgeR needs replicates.

Also, with regards to the first point, your objects have very different numbers of regions and sequences. This should not occur if you generated them directly from contact matrices of the same genome.

0
Entering edit mode

Hi Aaron

Thanks for the reply. Using restriction frag size produced different bin sizes (since sometime i get a cut, sometimes not. plus they are different cell lines). But I now overcome this by using fixed bin size to create the matrix (not restriction frag length).

I can overcome #3 since I have two replicates for each cell line.

For #2, I am using ICE normalized counts at this moment, although I have RAW counts also for this data. I can also floor the normalized counts to integers, but I don't know what you recommend. I thought the counts for balanced matrices would be better.

My new GInteraction objects (only pasting half, due to char limit) :



$s2_2 GInteractions object with 293659 interactions and 1 metadata column: seqnames1 ranges1 seqnames2 ranges2 | norm.freq <Rle> <IRanges> <Rle> <IRanges> | <numeric> [1] chr2L [5000, 10000] --- chr2L [ 5000, 10000] | 23814.2750234 [2] chr2L [5000, 10000] --- chr2L [10000, 15000] | 8456.43596258 [3] chr2L [5000, 10000] --- chr2L [25000, 30000] | 1276.57303962 [4] chr2L [5000, 10000] --- chr2L [60000, 65000] | 858.910217701 [5] chr2L [5000, 10000] --- chr2L [65000, 70000] | 991.555775811 ... ... ... ... ... ... . ... [293655] chrX [22410000, 22415000] --- chrX [22415000, 22420000] | 6260.66759762 [293656] chrX [22410000, 22415000] --- chrX [22420000, 22422827] | 3311.00613075 [293657] chrX [22415000, 22420000] --- chrX [22415000, 22420000] | 65142.0495447 [293658] chrX [22415000, 22420000] --- chrX [22420000, 22422827] | 2274.53704182 [293659] chrX [22420000, 22422827] --- chrX [22420000, 22422827] | 49047.4898882 ------- regions: 2250 ranges and 0 metadata columns seqinfo: 5 sequences from an unspecified genome; no seqlengths$c8_1
GInteractions object with 17320314 interactions and 1 metadata column:
seqnames1              ranges1     seqnames2              ranges2 |     norm.freq
<Rle>            <IRanges>         <Rle>            <IRanges> |     <numeric>
[1]     chr2L        [5000, 10000] ---     chr2L       [ 5000, 10000] | 84.5360595733
[2]     chr2L        [5000, 10000] ---     chr2L       [15000, 20000] | 177.442529967
[3]     chr2L        [5000, 10000] ---     chr2L       [20000, 25000] | 62.7259423407
[4]     chr2L        [5000, 10000] ---     chr2L       [25000, 30000] | 54.1675658477
[5]     chr2L        [5000, 10000] ---     chr2L       [30000, 35000] | 35.1780907057
...       ...                  ... ...       ...                  ... .           ...
[17320310]      chrX [22405000, 22410000] ---      chrX [22420000, 22422827] | 42.9441380746
[17320311]      chrX [22410000, 22415000] ---      chrX [22410000, 22415000] | 228.990371705
[17320312]      chrX [22410000, 22415000] ---      chrX [22420000, 22422827] | 89.3560095486
[17320313]      chrX [22415000, 22420000] ---      chrX [22415000, 22420000] | 140.273595774
[17320314]      chrX [22420000, 22422827] ---      chrX [22420000, 22422827] | 280.963274035
-------
regions: 23502 ranges and 0 metadata columns
seqinfo: 5 sequences from an unspecified genome; no seqlengths



1
Entering edit mode

Okay, let's say we have two GInteractions objects. To "merge" them, I would first create a common reference:

combined <- unique(c(gi1, gi2))


I would then standardize the regions in all the individual objects to this reference:

replaceRegions(gi1) <- regions(combined)
replaceRegions(gi2) <- regions(combined)

I would match the entries of the individual objects to the reference object:

m1 <- match(gi1, combined)
m2 <- match(gi2, combined)

Then use this to generate my count matrix (assuming unobserved interactions have counts of zero):

counts <- matrix(0, ncol=2, nrow=length(combined))
counts[m1,1] <- gi1$counts counts[m2,2] <- gi2$counts


From this point, it is simple to create an InteractionSet object:

iset <- InteractionSet(counts, combined)
0
Entering edit mode

Thanks Aaron. This works..

0
Entering edit mode

Why do you have different fragment sizes between samples? If they came from the same genome with the same restriction enzyme, then the coordinates of each bin should be the same between samples.

P.S. Use the raw counts. Giving normalized values to edgeR is, in general, a Bad Thing.