problem recasting InteractionSet into a DGEList
1
1
Entering edit mode
@marco-blanchette-17000
Last seen 5 months ago
US/Santa Cruz/Dovetail Genomics

Dear community,

We are trying to write wrappers for analysis around the popular cool and mcool data files. For example, I would like to use diffHic to analyze differential interactions between experiments and start directly from the processed interaction pair counts stored in these popular data format. I do not want to re-process the pairs form the alignment files. Following the vignettes for the GenomicInterations and diffHic, I am trying to create an InteractionSet object that would be recast into a DGEList for passing to some of the popular edgeR package functions.

I am attaching a script and a toy example that can be download here (1.8Gb) https://rb.gy/moxn6x. The script uses the cooler CLI found here https://github.com/mirnylab/cooler

The script successfully creates the InteractionSet object but something is not write as it errors when recasting into the DGEList. I can't seem to trace down what is missing...

Any suggestions would be greatly appreciated

library(diffHic)
library(GenomicInteractions)

getGI <- function(mcool.f,res) {
    con <- pipe(paste('cooler ls',mcool.f))
    cool.fs <- readLines(con)
    close(con)

    resolutions <- as.integer(unlist(lapply(strsplit(cool.fs,'/'),function(x) x[length(x)])))

    ## Should add a tryCatch for testing if resolution exists in file
    cool.f <- cool.fs[resolutions == res]
    chrs <- read.table(pipe(paste("cooler dump -t chroms",cool.f)))
    colnames(chrs) <- c('chr','length')

    ## TODO: here we will roll over all chromosomes using mclapply
    chr <- chrs$chr[21]

    cmd <- pipe(paste("cooler dump -t pixels -r",chr,"-r2",chr,"--header --join",cool.f))
    interaction <- read.table(cmd,header=TRUE,fill=TRUE)

    gi <- GenomicInteractions(GRanges(interaction$chrom1,IRanges(interaction$start1,interaction$end1)),
                              GRanges(interaction$chrom2,IRanges(interaction$start2,interaction$end2)),
                              interaction$count)
    return(gi)
}

## This will be a series of mcool files
mcool.fs <- c("toy1.mcool","toy2.mcool")


## Get the GenomicInteractions and there counts from mcool files
gi.counts <- lapply(mcool.fs,getGI,res = 32000)

################################################################################
## Reformating the list of GenomicInteractions into an Interactions sets
################################################################################

## First, get the universe of genomic interactions in the list
gi <- unique(do.call(c,gi.counts))

## Just removing the metadata
mcols(gi) <- c() 

## Get a dataframe of counts for each interaction pairs
counts <- do.call(cbind,lapply(gi.counts,function(x){

    ## Initialize a vector of counts 0 of the lenght of all interaction
    counts <- as.integer(rep(0,length(gi)))

    ## Find the interactions in the common set intersecting with the current GI
    mm <- findMatches(gi,x)

    ## Replace in the vector of counts for the universe, replace the intersection with the current counts
    counts[queryHits(mm)] <- x$counts[subjectHits(mm)]

    ## Return
    return(counts)
}))

## Asign the name of the Column to the sample
colnames(counts) <- mcool.fs

## Get the library size
lib.data <- DataFrame(lib.sizes=colSums(counts))

## Create the interaction set
data <- InteractionSet(counts,
                       gi,
                       colData=lib.data)


################################################################################
## Trying to follow the diffHiC vignette
################################################################################
library(edgeR)

## Casting teh InteractionSet as a DGEList for using countCPM
data.DGEL <- asDGEList(data)

## Throwing an error... Can't find why...
diffHic GenomicInteractions edgeR HiC cooler • 1.9k views
ADD COMMENT
0
Entering edit mode

here is the full error text and the session info

Error:

 > data.DGEL <- asDGEList(data)                                                                                                                                                                                                                                       
 Error in assay(object, i = assay.id, withDimnames = FALSE) :
   'assay(<InteractionSet>, i="character", ...)' invalid subscript 'i'
 'counts' not in names(assays(<InteractionSet>))
 In addition: Warning message:
 In .local(object, ...) :
   library sizes not found in 'totals', setting to NULL

sessionInfo():

> sessionInfo()
 R version 4.0.2 (2020-06-22)
 Platform: x86_64-pc-linux-gnu (64-bit)
 Running under: Ubuntu 18.04.5 LTS

 Matrix products: default
 BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

 locale:
  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C
 [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] parallel  stats4    stats     graphics  grDevices utils     datasets
 [8] methods   base

 other attached packages:
  [1] edgeR_3.30.3                limma_3.44.3
  [3] GenomicInteractions_1.22.0  diffHic_1.20.0
  [5] InteractionSet_1.16.0       SummarizedExperiment_1.18.2
  [7] DelayedArray_0.14.1         matrixStats_0.56.0
  [9] Biobase_2.48.0              GenomicRanges_1.40.0
 [11] GenomeInfoDb_1.24.2         IRanges_2.22.2
 [13] S4Vectors_0.26.1            BiocGenerics_0.34.0

 loaded via a namespace (and not attached):
  [1] ProtGenerics_1.20.0      bitops_1.0-6             bit64_4.0.2
  [4] RColorBrewer_1.1-2       progress_1.2.2           httr_1.4.2
  [7] backports_1.1.9          tools_4.0.2              R6_2.4.1
 [10] rpart_4.1-15             lazyeval_0.2.2           Hmisc_4.4-1
 [13] DBI_1.1.0                Gviz_1.32.0              colorspace_1.4-1
 [16] nnet_7.3-14              tidyselect_1.1.0         gridExtra_2.3
 [19] prettyunits_1.1.1        bit_4.0.4                curl_4.3
 [22] compiler_4.0.2           htmlTable_2.0.1          csaw_1.22.1
 [25] rtracklayer_1.48.0       checkmate_2.0.0          scales_1.1.1
 [28] askpass_1.1              rappdirs_0.3.1           stringr_1.4.0
 [31] digest_0.6.25            Rsamtools_2.4.0          foreign_0.8-80
 [34] XVector_0.28.0           dichromat_2.0-0          htmltools_0.5.0
 [37] base64enc_0.1-3          jpeg_0.1-8.1             pkgconfig_2.0.3
 [40] ensembldb_2.12.1         dbplyr_1.4.4             BSgenome_1.56.0
 [43] htmlwidgets_1.5.1        rlang_0.4.7              rstudioapi_0.11
 [46] RSQLite_2.2.0            generics_0.0.2           BiocParallel_1.22.0
 [49] dplyr_1.0.2              VariantAnnotation_1.34.0 RCurl_1.98-1.2
 [52] magrittr_1.5             GenomeInfoDbData_1.2.3   Formula_1.2-3
 [55] Matrix_1.2-18            Rcpp_1.0.5               munsell_0.5.0
 [58] Rhdf5lib_1.10.1          lifecycle_0.2.0          stringi_1.4.6
 [61] zlibbioc_1.34.0          rhdf5_2.32.2             BiocFileCache_1.12.1
 [64] grid_4.0.2               blob_1.2.1               crayon_1.3.4
 [67] lattice_0.20-41          Biostrings_2.56.0        splines_4.0.2
 [70] GenomicFeatures_1.40.1   hms_0.5.3                locfit_1.5-9.4
 [73] knitr_1.29               pillar_1.4.6             igraph_1.2.5
 [76] biomaRt_2.44.1           XML_3.99-0.5             glue_1.4.1
 [79] biovizBase_1.36.0        latticeExtra_0.6-29      data.table_1.13.0
 [82] png_0.1-7                vctrs_0.3.2              gtable_0.3.0
 [85] openssl_1.4.2            purrr_0.3.4              assertthat_0.2.1
 [88] ggplot2_3.3.2            xfun_0.16                AnnotationFilter_1.12.0
 [91] survival_3.2-3           tibble_3.0.3             GenomicAlignments_1.24.0
 [94] AnnotationDbi_1.50.3     memoise_1.1.0            cluster_2.1.0
 [97] Rhtslib_1.20.0           ellipsis_0.3.1
ADD REPLY
0
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 51 minutes ago
The city by the bay

The error and warning messages are quite informative here.

 Error in assay(object, i = assay.id, withDimnames = FALSE) :
   'assay(<InteractionSet>, i="character", ...)' invalid subscript 'i'
 'counts' not in names(assays(<InteractionSet>))

I would say this is pretty clear; the object doesn't have an assay named "counts". You have a variable named counts, but this is not quite the same thing; the variable name doesn't count for anything when you construct the InteractionSet. The solution is pretty simple, just set the name:

assayNames(data) <- "counts"

Or, during InteractionSet construction:

data <- InteractionSet(list(counts=counts),
                       gi,
                       colData=lib.data)

Or if you don't want to rename anything, you could just set assay.id=1 in the asDGEList() call.

The warning comes from the fact that your library sizes are stored in lib.sizes but asDGEList is looking in totals, probably for various historical reasons that I have long forgotten.

 In addition: Warning message:
 In .local(object, ...) :
   library sizes not found in 'totals', setting to NULL

You can also tell asDGEList() to pull the library sizes from lib.sizes with:

asDGEList(data, lib.sizes=data$lib.sizes)
ADD COMMENT
0
Entering edit mode

Hey Aaron, thanks a lot! Is this something worth adding to your package? Ie getting a InteractionSet straight from .mcool, .cool or .hic?

ADD REPLY
0
Entering edit mode

I think that would be an excellent idea for a small package on its own. diffHic has a lot of technical debt that I need to eventually wade through and clean up - if you can make your own light-weight package (with a mind towards submitting it to Bioconductor), it would probably be a win-win for both of us. Who knows, if I can rely on your hypothetical package to make the ISets, I might be able to finally get rid of my BAM file-parsing code in diffHic.

ADD REPLY
0
Entering edit mode

If you don't want to make your own package, this could also fit well in GenomicInteractions - we already have some import functions but not for these newer formats since they didn't exist when we created the package and I've never got round to adding them. (Although, that would not really be 'lightweight', there's definitely a good argument for a standalone package)

ADD REPLY
0
Entering edit mode

Hi Aaron,

The dovetail team and I have a first draft of a package reading .cool and .mcool HDF5 contact matrix. I wrote it using the native HDF5 R package. if you want to take a look at the coolR package, install it from this repos: https://github.com/dovetail-genomics/coolR. You first need to install the packages InteractionSet and rhdf5 from bioconductor (once it's in BioC, the dependencies will be automatically taken care of):

BiocManager::install("InteractionSet")
BiocManager::install("rhdf5")

Then you can install it from GitHub using devtools:

install.packages('devtools')
devtools::install_github("git@github.com:dovetail-genomics/coolR.git")

Then loading data and using it with EdgeR and diffHiC:

library(diffHic)
library(edgeR)
library(coolR)

## You need ~1.25Gb of RAM per thread                                                                                                                                                                     
cores <- detecCores()

## Reading the top level resolution in 2 multi-dimentional contact matrix                                                                                                                                 
data <- cool2IntSet(c("./toy1.mcool","./toy2.mcool"),
                    res=1000,
                    cores=cores)

## Filtering low represted  pairs                                                                                                                                                                         
keep <- aveLogCPM(asDGEList(data)) > 0

data <- data[keep,]

The plan is to submit it to BioConductor but right now the documentation is quite sparse, open it here, we are working on it but I wanted to share it with you in case you think we should add functionalities or modify something.

Let us know,

Best, Marco

ADD REPLY
0
Entering edit mode

Excellent, this looks like a good start - I'll take my comments to the GH repo.

ADD REPLY

Login before adding your answer.

Traffic: 769 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6