Question: diffHic ICE normalization PCA with NAs
7 months ago by
inzirio10
inzirio10 wrote:

Hi,

I'm working with Hi-C data and trying to compute PCA loadings from the normalized contact matrix produced by correctedContact in the diffHic package, but the NAs produced during normalization prevent me from doing so.

Is there any way to avoid the NAs, or to ignore them during the PCA computation? I saw the na.omit option with prcomp, but it removes too many rows of my contact matrix.

Any help is appreciated.

Thanks! Dario

Answer: diffHic ICE normalization PCA with NAs
7 months ago by
Aaron Lun
Cambridge, United Kingdom
Aaron Lun wrote:

I'll level with you here. I barely remember writing this function, and what memories I do have are filled with suffering and despair. I think I wrote it because I thought the iterative correction approach might be useful for differential analyses, but it never was, and now I'm stuck with maintaining something that I never use.

Anyway, to try to answer your immediate question: if you want to get rid of the NAs, you could try turning off the filtering by setting ignore.low=0. Also possibly turn off the winsorizing by setting winsor.high=0. If that doesn't work... I don't really know where else the NAs could be coming from.
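
If the NAs cannot be avoided, one possible explanation for na.omit removing so many rows is that whole bins were filtered out. This is an assumption about the data, not something confirmed in this thread. In that case each filtered bin puts an NA into every row of the matrix, so dropping rows that contain any NA drops nearly everything. A minimal sketch of the alternative, removing the affected bins from both rows and columns, is below. It is written in plain Python for illustration rather than R, and the toy matrix and the helper name drop_empty_bins are hypothetical.

```python
import math


def drop_empty_bins(matrix):
    """Remove bins whose entire row is NaN, dropping the matching column too.

    Returns the indices of the retained bins and the reduced square matrix.
    """
    n = len(matrix)
    keep = [i for i in range(n)
            if not all(math.isnan(v) for v in matrix[i])]
    reduced = [[matrix[i][j] for j in keep] for i in keep]
    return keep, reduced


nan = float("nan")

# Toy symmetric 4x4 contact matrix (made-up values) in which bin 2
# was filtered out, so its whole row and column are NaN.
contacts = [
    [5.0, 2.0, nan, 1.0],
    [2.0, 6.0, nan, 3.0],
    [nan, nan, nan, nan],
    [1.0, 3.0, nan, 4.0],
]

kept_bins, reduced = drop_empty_bins(contacts)
```

Here every row of the toy matrix contains at least one NaN, so a row-wise omission would leave nothing, whereas removing bin 2 symmetrically leaves a complete 3x3 matrix. Any NAs that remain after this step would have a different origin and need separate handling.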

A comment on your analysis: what are you trying to do? If you're doing a PCA on the samples, it would be much better to do between-sample normalization (e.g., csaw::normOffsets). There's no need to expose yourself to the difficulties of trying to remove biases in coverage, distance between anchors, etc. These cancel out (or are greatly reduced) when you're comparing between samples for the same interaction.

If you're doing a PCA on the contact matrix itself... I suppose you're looking for A/B compartments? The more relevant normalization would be to get rid of the bias with respect to the distance between anchors (i.e., the distance from the diagonal of the contact matrix), which would improve your signal for the relevant long-range intra-compartment contacts. ICE doesn't really help in that regard, hence the dist.correct option that I added when I was still enthusiastic about this kind of analysis.

Besides, I'm not convinced that intra-chromosomal interactions exhibit biases that can be expressed as products of the biases of the interacting regions. (You can do a little thought experiment where you double a chromosome's copy number; do the intra-chromosomal contacts increase by 2? Or by 4?) I could believe factorizability for inter-chromosomal contacts, but that's because it's mostly random noise anyway.
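
To make the distance correction mentioned above concrete, here is a minimal sketch of a common A/B compartment recipe: divide each entry by the average of its diagonal (observed over expected), correlate the rows, and take the leading eigenvector. It is plain Python on a made-up toy matrix, and it is not a description of what dist.correct does internally. All function names are hypothetical, and it assumes the matrix is square, symmetric and already free of NAs.

```python
import math


def observed_over_expected(matrix):
    """Divide each entry by the mean of its diagonal (same anchor distance)."""
    n = len(matrix)
    expected = []
    for d in range(n):
        diagonal = [matrix[i][i + d] for i in range(n - d)]
        expected.append(sum(diagonal) / len(diagonal))
    return [[matrix[i][j] / expected[abs(i - j)]
             if expected[abs(i - j)] > 0 else 0.0
             for j in range(n)] for i in range(n)]


def row_correlations(matrix):
    """Pearson correlation between every pair of rows (rows must not be constant)."""
    n = len(matrix)
    centred = []
    for row in matrix:
        mean = sum(row) / n
        centred.append([v - mean for v in row])
    norms = [math.sqrt(sum(v * v for v in row)) for row in centred]
    return [[sum(a * b for a, b in zip(centred[i], centred[j]))
             / (norms[i] * norms[j])
             for j in range(n)] for i in range(n)]


def leading_eigenvector(matrix, iterations=500):
    """Approximate the dominant eigenvector by power iteration."""
    n = len(matrix)
    vec = [1.0 + 0.1 * i for i in range(n)]  # arbitrary non-zero start
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vec[j] for j in range(n))
                   for i in range(n)]
        norm = math.sqrt(sum(v * v for v in product))
        vec = [v / norm for v in product]
    return vec


# Toy data: contacts decay with distance and are 3x stronger when both
# bins carry the same (made-up) compartment label.
labels = "AAABBBAABB"
n = len(labels)
contacts = [[100.0 / (abs(i - j) + 1)
             * (3.0 if labels[i] == labels[j] else 1.0)
             for j in range(n)] for i in range(n)]

oe = observed_over_expected(contacts)
corr = row_correlations(oe)
pc1 = leading_eigenvector(corr)
```

The signs of the entries in pc1 are what one would compare against known features to assign compartments. The sign itself is arbitrary, and on real data the first component does not always correspond to compartments, so it is worth checking the result rather than trusting it blindly.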

Hi Aaron,

Yes, I'm doing the PCA for the A/B compartments...