Dear community,
We are trying to write wrappers for analysis around the popular cool and mcool data files. For example, I would like to use diffHic to analyze differential interactions between experiments and start directly from the processed interaction pair counts stored in these popular data format. I do not want to re-process the pairs form the alignment files. Following the vignettes for the GenomicInterations and diffHic, I am trying to create an InteractionSet object that would be recast into a DGEList for passing to some of the popular edgeR package functions.
I am attaching a script and a toy example that can be download here (1.8Gb) https://rb.gy/moxn6x. The script uses the cooler CLI found here https://github.com/mirnylab/cooler
The script successfully creates the InteractionSet object but something is not write as it errors when recasting into the DGEList. I can't seem to trace down what is missing...
Any suggestions would be greatly appreciated
library(diffHic)
library(GenomicInteractions)
getGI <- function(mcool.f,res) {
con <- pipe(paste('cooler ls',mcool.f))
cool.fs <- readLines(con)
close(con)
resolutions <- as.integer(unlist(lapply(strsplit(cool.fs,'/'),function(x) x[length(x)])))
## Should add a tryCatch for testing if resolution exists in file
cool.f <- cool.fs[resolutions == res]
chrs <- read.table(pipe(paste("cooler dump -t chroms",cool.f)))
colnames(chrs) <- c('chr','length')
## TODO: here we will roll over all chromosomes using mclapply
chr <- chrs$chr[21]
cmd <- pipe(paste("cooler dump -t pixels -r",chr,"-r2",chr,"--header --join",cool.f))
interaction <- read.table(cmd,header=TRUE,fill=TRUE)
gi <- GenomicInteractions(GRanges(interaction$chrom1,IRanges(interaction$start1,interaction$end1)),
GRanges(interaction$chrom2,IRanges(interaction$start2,interaction$end2)),
interaction$count)
return(gi)
}
## This will be a series of mcool files
mcool.fs <- c("toy1.mcool","toy2.mcool")
## Get the GenomicInteractions and there counts from mcool files
gi.counts <- lapply(mcool.fs,getGI,res = 32000)
################################################################################
## Reformating the list of GenomicInteractions into an Interactions sets
################################################################################
## First, get the universe of genomic interactions in the list
gi <- unique(do.call(c,gi.counts))
## Just removing the metadata
mcols(gi) <- c()
## Get a dataframe of counts for each interaction pairs
counts <- do.call(cbind,lapply(gi.counts,function(x){
## Initialize a vector of counts 0 of the lenght of all interaction
counts <- as.integer(rep(0,length(gi)))
## Find the interactions in the common set intersecting with the current GI
mm <- findMatches(gi,x)
## Replace in the vector of counts for the universe, replace the intersection with the current counts
counts[queryHits(mm)] <- x$counts[subjectHits(mm)]
## Return
return(counts)
}))
## Asign the name of the Column to the sample
colnames(counts) <- mcool.fs
## Get the library size
lib.data <- DataFrame(lib.sizes=colSums(counts))
## Create the interaction set
data <- InteractionSet(counts,
gi,
colData=lib.data)
################################################################################
## Trying to follow the diffHiC vignette
################################################################################
library(edgeR)
## Casting teh InteractionSet as a DGEList for using countCPM
data.DGEL <- asDGEList(data)
## Throwing an error... Can't find why...
here is the full error text and the session info
Error:
sessionInfo():