Inflating interactionSets into a contact matrix
1
1
Entering edit mode
hwick1 ▴ 20
@hwick1-12720
Last seen 7.0 years ago

I have a list of interactionSets of restriction fragment interactions from a diffHiC experiment, which I extracted using extractPatch. I would like to inflate these into contact matrices, where the row and column names are the restriction fragment coordinates, and the matrix values are the number of interactions (counts) between given pair of restriction fragments. It seems that Inflate() is the function to do this, but it's unclear to me how to even set the counts of interactions as the fill data. It would also be nice if I could set the row/column names to reflect the interacting fragment coordinates.

Thanks!


> str(test_list[[1]])
Formal class 'InteractionSet' [package "InteractionSet"] with 6 slots
  ..@ interactions   :Formal class 'ReverseStrictGInteractions' [package "InteractionSet"] with 6 slots
  .. .. ..@ anchor1        : int [1:6] 489974 489974 489975 489978 489982 489982
  .. .. ..@ anchor2        : int [1:6] 489972 489974 489975 489964 489964 489975
  .. .. ..@ regions        :Formal class 'GRanges' [package "GenomicRanges"] with 6 slots
...

> interactions(test_list[[1]])
ReverseStrictGInteractions object with 6 interactions and 0 metadata columns:
      seqnames1              ranges1     seqnames2              ranges2
          <Rle>            <IRanges>         <Rle>            <IRanges>
  [1]     chr10 [20477678, 20481028] ---     chr10 [20472898, 20476524]
  [2]     chr10 [20477678, 20481028] ---     chr10 [20477678, 20481028]
  [3]     chr10 [20481025, 20483029] ---     chr10 [20481025, 20483029]
  [4]     chr10 [20488479, 20489489] ---     chr10 [20455104, 20459519]
  [5]     chr10 [20496729, 20499802] ---     chr10 [20455104, 20459519]
  [6]     chr10 [20496729, 20499802] ---     chr10 [20481025, 20483029]
  -------
  regions: 846225 ranges and 1 metadata column
  seqinfo: 93 sequences from an unspecified genome
> assay(test_list[[1]])
     [,1]
[1,]    1
[2,]    3
[3,]    3
[4,]    1
[5,]    1
[6,]    1

test_mat<-inflate(test_list[[1]], 1:length(test_list[[1]]), 1:length(test_list[[1]]),fill=assay(test_list[[1]]))
str(test_mat)
> as.matrix(test_mat)

6 x 6 Matrix of class "dsyMatrix"

     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]   NA   NA   NA   NA   NA   NA
[2,]   NA   NA   NA   NA   NA   NA
[3,]   NA   NA   NA   NA   NA   NA
[4,]   NA   NA   NA   NA   NA   NA
[5,]   NA   NA   NA   NA   NA   NA
[6,]   NA   NA   NA   NA   NA   NA

> str(test_mat)
Formal class 'ContactMatrix' [package "InteractionSet"] with 5 slots
  ..@ matrix  :Formal class 'dsyMatrix' [package "Matrix"] with 5 slots
  .. .. ..@ x       : num [1:36] NA NA NA NA NA NA NA NA NA NA ...
  .. .. ..@ Dim     : int [1:2] 6 6
  .. .. ..@ Dimnames:List of 2
  .. .. .. ..$ : NULL
  .. .. .. ..$ : NULL
  .. .. ..@ uplo    : chr "U"
  .. .. ..@ factors : list()
  ..@ anchor1 : int [1:6] 1 2 3 4 5 6
  ..@ anchor2 : int [1:6] 1 2 3 4 5 6
  ..@ regions :Formal class 'GRanges' [package "GenomicRanges"] with 6 slots
  .. .. ..@ seqnames       :Formal class 'Rle' [package "S4Vectors"] with 4 slots
  .. .. .. .. ..@ values         : Factor w/ 93 levels "chr1","chr2",..: 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. .. ..@ lengths        : int [1:93] 64395 71483 59636 58925 55089 51311 45517 43187 34643 38016 ...
  .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. ..@ metadata       : list()
  .. .. ..@ ranges         :Formal class 'IRanges' [package "IRanges"] with 6 slots
  .. .. .. .. ..@ start          : int [1:846225] 1 16008 24572 27982 30430 32154 32775 37753 38370 38792 ...
  .. .. .. .. ..@ width          : int [1:846225] 16011 8568 3414 2452 1728 625 4982 621 426 468 ...
  .. .. .. .. ..@ NAMES          : NULL
  .. .. .. .. ..@ elementType    : chr "integer"
  .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. ..@ metadata       : list()
  .. .. ..@ strand         :Formal class 'Rle' [package "S4Vectors"] with 4 slots
  .. .. .. .. ..@ values         : Factor w/ 3 levels "+","-","*": 3
  .. .. .. .. ..@ lengths        : int 846225
  .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. ..@ metadata       : list()
  .. .. ..@ elementMetadata:Formal class 'DataFrame' [package "S4Vectors"] with 6 slots
  .. .. .. .. ..@ rownames       : NULL
  .. .. .. .. ..@ nrows          : int 846225
  .. .. .. .. ..@ listData       :List of 1
  .. .. .. .. .. ..$ nfrags: int [1:846225] 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. .. ..@ elementType    : chr "ANY"
  .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. ..@ metadata       : list()
  .. .. ..@ seqinfo        :Formal class 'Seqinfo' [package "GenomeInfoDb"] with 4 slots
  .. .. .. .. ..@ seqnames   : chr [1:93] "chr1" "chr2" "chr3" "chr4" ...
  .. .. .. .. ..@ seqlengths : int [1:93] 249250621 243199373 198022430 191154276 180915260 171115067 159138663 146364022 141213431 135534747 ...
  .. .. .. .. ..@ is_circular: logi [1:93] NA NA NA NA NA NA ...
  .. .. .. .. ..@ genome     : chr [1:93] NA NA NA NA ...
  .. .. ..@ metadata       : list()
  ..@ metadata: list()

 




 

diffhic hic interactionset contactmatrix inflate • 1.4k views
ADD COMMENT
1
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 1 hour ago
The city by the bay

Consider an InteractionSet object x. When you call inflate(x, ...), the rows and columns arguments refer to the elements of regions(x) you want as your matrix rows and columns, not to elements of x. In the code you've given above, you are constructing a ContactMatrix object from the 1:length(test_list[[1]]) entries in regions(x) (presumably the first 6 regions on chromosome 1). These have no recorded interactions in x, so they show up as NA in the output matrix.

The effect of rows and columns is described in quite some depth in ?inflate, so I would suggest a more careful reading of the documentation rather than just blindly applying the function. In this case, I assume you just want the part of the interaction space corresponding to the particular region on chromosome 10 that you seem to be interested in. There's a number of ways to do this, but the simplest would be to do something like:

bounds <- boundingBox(test_list[[1]])
cm <- inflate(test_list[[1]], first(bounds), second(bounds))

By default, the first column in the first assay is used to fill the matrix, as noted in ?inflate. It is also trivial to set the row and column names as you've requested by taking advantage of the as.character coercion for GenomicRanges objects.

rownames(x) <- as.character(anchors(x, type="row"))
colnames(x) <- as.character(anchors(x, type="column"))
ADD COMMENT

Login before adding your answer.

Traffic: 591 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6