diffHiC statistical test to check differences between conditions
1
0
Entering edit mode
lazaris • 0
@lazaris-10999
Last seen 6.6 years ago

Dear Aaron,

your diffHiC package looks ideal for my case as I want to detect the

chromatin interaction differences between two conditions and I have

more than one technical replicates for each condition. However, I have

already the corrected matrices (with ICE and other methods) and I would

essentially like to use only the part of your pipeline that sets the contrasts

and performs the statistical tests. Would it be possible to have your

help on this?

Harris

diffHiC statistical test • 646 views
0
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 10 hours ago
The city by the bay

The "part of your pipeline that sets the contrasts and performs the statistical tests" is basically just edgeR. Most of the work that diffHic does is to process the data to get the read pair counts for each bin pair in each sample. So, if you've already got a matrix of counts (where each bin pair is a row, and each column is a sample - see below), you can just go straight to testing for differences between conditions with edgeR.

However, from your original post, it seems that you have normalized intensities rather than counts. If you want to use edgeR (or DESeq2, or limma + voom), you need the raw counts. Information about normalization is introduced via GLM offsets, rather than being applied directly to the counts. To this end, you could use an approach similar to the one described in Section 5.3 of the user's guide, where the log-transformed correction factor (required to get from the count to the normalized intensity in the first place) is used as the offset.

With all that said, though, is ICE really necessary for detecting differences? Remember that you're comparing the same bin pair between samples, such that the genomic biases that are removed by ICE would largely cancel out anyway. More important for a differential analysis are the biases between samples, e.g., due to variable cross-linking/ligation efficiency. I would suggest working from the raw counts and using the normOffsets function to compute offsets that eliminate library-specific biases. Alternatively, use normalizeCNV if your genomic biases are different between samples (e.g., due to copy number variations).

Finally, I guess that, even if you did have counts, you would have a 2D interaction count matrix from each sample (where each row and column represents a genomic region) rather than the matrix of counts per bin pair per sample that edgeR expects. To get from one to another, you can easily convert each of your 2D count matrices to ContactMatrix objects, and then do something like the below to obtain a single matrix for all bin pairs/samples. Assuming you have two libraries (represented by cm1 and cm2):

# Select interesting interactions; in this case, non-zero in at least one library
to.keep <- as.matrix(cm1)!=0 | as.matrix(cm2)!=0
iset1 <- deflate(cm1, extract=to.keep)
iset2 <- deflate(cm2, extract=to.keep)
data <- cbind(iset1, iset2)
interactions(data) <- as(interactions(data), "ReverseStrictGInteractions")

The data object can then be run through normOffsets and asDGEList - read the diffHic user's guide for more details.