Hi
I would like to use diffHic to find differential domains between two different cell lines. I have converted the normalized matrix for each of them into a `GInteractions` object.
```
$s2
GInteractions object with 6398123 interactions and 1 metadata column:
            seqnames1                ranges1     seqnames2                ranges2 |     norm.freq
                <Rle>              <IRanges>         <Rle>              <IRanges> |     <numeric>
        [1]     chr2L         [72559, 74473] ---     chr2L       [ 72559,  74473] |  1725.8969929
        [2]     chr2L         [72559, 74473] ---     chr2L       [ 76649,  80158] | 4548.27044772
        [3]     chr2L         [72559, 74473] ---     chr2L       [ 80158,  84182] | 4166.33078199
        [4]     chr2L         [72559, 74473] ---     chr2L       [133166, 137699] | 956.928116437
        [5]     chr2L         [72559, 74473] ---     chr2L       [205336, 207194] | 245.480797633
        ...       ...                    ... ...       ...                    ... .           ...
  [6398119]      chrX   [22215621, 22220243] ---      chrX   [22255907, 22256827] | 966.649143049
  [6398120]      chrX   [22215621, 22220243] ---      chrX   [22401281, 22407854] | 499.251088874
  [6398121]      chrX   [22255907, 22256827] ---      chrX   [22255907, 22256827] | 419.723838666
  [6398122]      chrX   [22255907, 22256827] ---      chrX   [22401281, 22407854] | 653.823761634
  [6398123]      chrX   [22401281, 22407854] ---      chrX   [22401281, 22407854] | 1729.85785275
  -------
  regions: 8456 ranges and 0 metadata columns
  seqinfo: 5 sequences from an unspecified genome; no seqlengths

$c8
GInteractions object with 45016517 interactions and 1 metadata column:
             seqnames1          ranges1     seqnames2          ranges2 | norm.freq
                 <Rle>        <IRanges>         <Rle>        <IRanges> | <numeric>
         [1]     chr2L    [9879, 11901] ---     chr2L   [ 9879, 11901] |        20
         [2]     chr2L    [9879, 11901] ---     chr2L   [11901, 13158] |       359
         [3]     chr2L    [9879, 11901] ---     chr2L   [13158, 14087] |       196
         [4]     chr2L    [9879, 11901] ---     chr2L   [14087, 14759] |       202
         [5]     chr2L    [9879, 11901] ---     chr2L   [14759, 15546] |        20
         ...       ...              ... ...       ...              ... .       ...
  [45016513]   chrYHet [184819, 190734] ---   chrYHet [184819, 190734] |         2
  [45016514]   chrYHet [184819, 190734] ---   chrYHet [198239, 215835] |         3
  [45016515]   chrYHet [198239, 215835] ---   chrYHet [198239, 215835] |        59
  [45016516]   chrYHet [198239, 215835] ---   chrYHet [333103, 338457] |         1
  [45016517]   chrYHet [333103, 338457] ---   chrYHet [333103, 338457] |         4
  -------
  regions: 47740 ranges and 0 metadata columns
  seqinfo: 14 sequences from an unspecified genome; no seqlengths
```
As you can see, the two objects differ in size and their bins are not the same either. As a result, I get an error when I try to create an `InteractionSet` object from them.
I want to continue from step 5 of the diffHic manual. Is it possible to proceed from here?
Thanks,
Vivek
Hi Aaron
Thanks for the reply. Binning by restriction fragment produced different bin sizes, since a site is sometimes cut and sometimes not, and these are also different cell lines. I have now worked around this by building the matrix with a fixed bin size instead of restriction fragment lengths.
I can overcome #3 since I have two replicates for each cell line.
For #2, I am using ICE-normalized counts at the moment, although I also have the raw counts for this data. I could floor the normalized counts to integers, but I don't know what you recommend. I thought the counts from balanced matrices would be better.
My new `GInteractions` objects (only pasting half, due to the character limit):
```
```
Okay, let's say we have two `GInteractions` objects. To "merge" them, I would first create a common reference:

I would then standardize the regions in all the individual objects to this reference:
I would match the entries of the individual objects to the reference object:
Then use this to generate my count matrix (assuming unobserved interactions have counts of zero):
From this point, it is simple to create an `InteractionSet` object:

Thanks Aaron. This works.
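For anyone following along, the merge steps described above (common reference, standardized regions, matching, zero-filled count matrix, `InteractionSet`) can be sketched roughly as follows. This is only a minimal illustration on toy data, not necessarily the exact code from the answer: the objects `gi1` and `gi2`, the bin coordinates, the `count` metadata column, and the sample names are all made up for the example, and it assumes each object contains no duplicated interactions.

```r
library(InteractionSet)  # also attaches GenomicRanges and SummarizedExperiment

# Toy stand-ins for the two cell lines (hypothetical data), deliberately
# built on different bin sets to mimic the mismatch in the question.
bins1 <- GRanges("chr2L", IRanges(c(1, 101, 201), c(100, 200, 300)))
bins2 <- GRanges("chr2L", IRanges(c(1, 101, 301), c(100, 200, 400)))
gi1 <- GInteractions(c(1, 1, 2), c(1, 2, 3), bins1)
gi2 <- GInteractions(c(1, 2), c(2, 3), bins2)
mcols(gi1)$count <- c(5, 3, 2)  # raw counts (assumed column name)
mcols(gi2)$count <- c(4, 1)

# 1. Common reference: every distinct interaction seen in either object.
combined <- unique(c(gi1, gi2))
mcols(combined) <- NULL  # drop the per-sample counts carried over by c()

# 2. Re-express each object on the reference's (superset) regions.
replaceRegions(gi1) <- regions(combined)
replaceRegions(gi2) <- regions(combined)

# 3. Find the reference row that corresponds to each interaction.
m1 <- match(gi1, combined)
m2 <- match(gi2, combined)

# 4. Count matrix with one column per sample; unobserved pairs stay at zero.
counts <- matrix(0, nrow = length(combined), ncol = 2,
                 dimnames = list(NULL, c("s2", "c8")))
counts[m1, 1] <- mcols(gi1)$count
counts[m2, 2] <- mcols(gi2)$count

# 5. Bundle the counts and the reference interactions together.
iset <- InteractionSet(list(counts = counts), combined)
```

With real data the count column should hold the raw counts rather than ICE-normalized values, and replicates would simply add more columns to the matrix.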
Why do you have different fragment sizes between samples? If they came from the same genome with the same restriction enzyme, then the coordinates of each bin should be the same between samples.
P.S. Use the raw counts. Giving normalized values to edgeR is, in general, a Bad Thing.