Question: diffHiC differential HiC analysis starting from Ginteractions object
0
gravatar for Vivek.b
3.2 years ago by
Vivek.b100
Germany
Vivek.b100 wrote:

Hi 

I would like to use difHiC to get differential domains using two different cell lines. I have converted my normalized matrix for each of them into GInteraction objects.

 

```

$s2
GInteractions object with 6398123 interactions and 1 metadata column:
            seqnames1              ranges1     seqnames2              ranges2 |     norm.freq
                <Rle>            <IRanges>         <Rle>            <IRanges> |     <numeric>
        [1]     chr2L       [72559, 74473] ---     chr2L     [ 72559,  74473] |  1725.8969929
        [2]     chr2L       [72559, 74473] ---     chr2L     [ 76649,  80158] | 4548.27044772
        [3]     chr2L       [72559, 74473] ---     chr2L     [ 80158,  84182] | 4166.33078199
        [4]     chr2L       [72559, 74473] ---     chr2L     [133166, 137699] | 956.928116437
        [5]     chr2L       [72559, 74473] ---     chr2L     [205336, 207194] | 245.480797633
        ...       ...                  ... ...       ...                  ... .           ...
  [6398119]      chrX [22215621, 22220243] ---      chrX [22255907, 22256827] | 966.649143049
  [6398120]      chrX [22215621, 22220243] ---      chrX [22401281, 22407854] | 499.251088874
  [6398121]      chrX [22255907, 22256827] ---      chrX [22255907, 22256827] | 419.723838666
  [6398122]      chrX [22255907, 22256827] ---      chrX [22401281, 22407854] | 653.823761634
  [6398123]      chrX [22401281, 22407854] ---      chrX [22401281, 22407854] | 1729.85785275
  -------
  regions: 8456 ranges and 0 metadata columns
  seqinfo: 5 sequences from an unspecified genome; no seqlengths

$c8
GInteractions object with 45016517 interactions and 1 metadata column:
             seqnames1          ranges1     seqnames2          ranges2 | norm.freq
                 <Rle>        <IRanges>         <Rle>        <IRanges> | <numeric>
         [1]     chr2L    [9879, 11901] ---     chr2L   [ 9879, 11901] |        20
         [2]     chr2L    [9879, 11901] ---     chr2L   [11901, 13158] |       359
         [3]     chr2L    [9879, 11901] ---     chr2L   [13158, 14087] |       196
         [4]     chr2L    [9879, 11901] ---     chr2L   [14087, 14759] |       202
         [5]     chr2L    [9879, 11901] ---     chr2L   [14759, 15546] |        20
         ...       ...              ... ...       ...              ... .       ...
  [45016513]   chrYHet [184819, 190734] ---   chrYHet [184819, 190734] |         2
  [45016514]   chrYHet [184819, 190734] ---   chrYHet [198239, 215835] |         3
  [45016515]   chrYHet [198239, 215835] ---   chrYHet [198239, 215835] |        59
  [45016516]   chrYHet [198239, 215835] ---   chrYHet [333103, 338457] |         1
  [45016517]   chrYHet [333103, 338457] ---   chrYHet [333103, 338457] |         4
  -------
  regions: 47740 ranges and 0 metadata columns
  seqinfo: 14 sequences from an unspecified genome; no seqlengths

```

 

As you can see, the size of the two objects differ and the bin sizes are also not the same. Therefore I am getting error when trying to create an InteractionSet object out of them.

 

I want to continue from step 5 of the diffHiC manual. Is it possible to proceed from here on ?

 

Thanks,

Vivek

diffhic interactionset • 583 views
ADD COMMENTlink modified 3.2 years ago • written 3.2 years ago by Vivek.b100
Answer: diffHiC differential HiC analysis starting from Ginteractions object
0
gravatar for Aaron Lun
3.2 years ago by
Aaron Lun25k
Cambridge, United Kingdom
Aaron Lun25k wrote:

No. There's at least three reasons that I can think of:

  • You have different bin pairs in the two objects. How are you going to match them up? What happens to a bin pair in one object that partially overlaps with multiple bin pairs in the other object - which of the overlapping bin pairs is the correct match? What do you do if you have a bin pair that is present in one object and absent in another - should you set the intensity for the latter to zero? (This won't be correct if the entries of your objects represent called interactions, such that missingness does not imply zero.)
  • Your normalized interaction frequencies are not counts. edgeR needs counts.
  • You don't seem to have any replicates. edgeR needs replicates.

Also, with regards to the first point, your objects have very different numbers of regions and sequences. This should not occur if you generated them directly from contact matrices of the same genome.

ADD COMMENTlink modified 3.2 years ago • written 3.2 years ago by Aaron Lun25k

Hi Aaron

Thanks for the reply. Using restriction frag size produced different bin sizes (since sometime i get a cut, sometimes not. plus they are different cell lines). But I now overcome this by using fixed bin size to create the matrix (not restriction frag length). 

I can overcome #3 since I have two replicates for each cell line.

For #2, I am using ICE normalized counts at this moment, although I have RAW counts also for this data. I can also floor the normalized counts to integers, but I don't know what you recommend. I thought the counts for balanced matrices would be better. 

My new GInteraction objects (only pasting half, due to char limit) :

```

$s2_2
GInteractions object with 293659 interactions and 1 metadata column:
           seqnames1              ranges1     seqnames2              ranges2 |     norm.freq
               <Rle>            <IRanges>         <Rle>            <IRanges> |     <numeric>
       [1]     chr2L        [5000, 10000] ---     chr2L       [ 5000, 10000] | 23814.2750234
       [2]     chr2L        [5000, 10000] ---     chr2L       [10000, 15000] | 8456.43596258
       [3]     chr2L        [5000, 10000] ---     chr2L       [25000, 30000] | 1276.57303962
       [4]     chr2L        [5000, 10000] ---     chr2L       [60000, 65000] | 858.910217701
       [5]     chr2L        [5000, 10000] ---     chr2L       [65000, 70000] | 991.555775811
       ...       ...                  ... ...       ...                  ... .           ...
  [293655]      chrX [22410000, 22415000] ---      chrX [22415000, 22420000] | 6260.66759762
  [293656]      chrX [22410000, 22415000] ---      chrX [22420000, 22422827] | 3311.00613075
  [293657]      chrX [22415000, 22420000] ---      chrX [22415000, 22420000] | 65142.0495447
  [293658]      chrX [22415000, 22420000] ---      chrX [22420000, 22422827] | 2274.53704182
  [293659]      chrX [22420000, 22422827] ---      chrX [22420000, 22422827] | 49047.4898882
  -------
  regions: 2250 ranges and 0 metadata columns
  seqinfo: 5 sequences from an unspecified genome; no seqlengths

$c8_1
GInteractions object with 17320314 interactions and 1 metadata column:
             seqnames1              ranges1     seqnames2              ranges2 |     norm.freq
                 <Rle>            <IRanges>         <Rle>            <IRanges> |     <numeric>
         [1]     chr2L        [5000, 10000] ---     chr2L       [ 5000, 10000] | 84.5360595733
         [2]     chr2L        [5000, 10000] ---     chr2L       [15000, 20000] | 177.442529967
         [3]     chr2L        [5000, 10000] ---     chr2L       [20000, 25000] | 62.7259423407
         [4]     chr2L        [5000, 10000] ---     chr2L       [25000, 30000] | 54.1675658477
         [5]     chr2L        [5000, 10000] ---     chr2L       [30000, 35000] | 35.1780907057
         ...       ...                  ... ...       ...                  ... .           ...
  [17320310]      chrX [22405000, 22410000] ---      chrX [22420000, 22422827] | 42.9441380746
  [17320311]      chrX [22410000, 22415000] ---      chrX [22410000, 22415000] | 228.990371705
  [17320312]      chrX [22410000, 22415000] ---      chrX [22420000, 22422827] | 89.3560095486
  [17320313]      chrX [22415000, 22420000] ---      chrX [22415000, 22420000] | 140.273595774
  [17320314]      chrX [22420000, 22422827] ---      chrX [22420000, 22422827] | 280.963274035
  -------
  regions: 23502 ranges and 0 metadata columns
  seqinfo: 5 sequences from an unspecified genome; no seqlengths

```

ADD REPLYlink modified 3.2 years ago • written 3.2 years ago by Vivek.b100
1

Okay, let's say we have two GInteractions objects. To "merge" them, I would first create a common reference:

combined <- unique(c(gi1, gi2))

I would then standardize the regions in all the individual objects to this reference:

replaceRegions(gi1) <- regions(combined)
replaceRegions(gi2) <- regions(combined)

I would match the entries of the individual objects to the reference object:

m1 <- match(gi1, combined)
m2 <- match(gi2, combined)

Then use this to generate my count matrix (assuming unobserved interactions have counts of zero):

counts <- matrix(0, ncol=2, nrow=length(combined))
counts[m1,1] <- gi1$counts
counts[m2,2] <- gi2$counts

From this point, it is simple to create an InteractionSet object:

iset <- InteractionSet(counts, combined)
ADD REPLYlink modified 3.2 years ago • written 3.2 years ago by Aaron Lun25k

Thanks Aaron. This works..

ADD REPLYlink written 3.2 years ago by Vivek.b100

Why do you have different fragment sizes between samples? If they came from the same genome with the same restriction enzyme, then the coordinates of each bin should be the same between samples.

P.S. Use the raw counts. Giving normalized values to edgeR is, in general, a Bad Thing.

ADD REPLYlink modified 3.2 years ago • written 3.2 years ago by Aaron Lun25k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 401 users visited in the last hour