diffHic ICE normalization PCA with NAs
inzirio


I'm working with Hi-C data and trying to compute PCA loadings from the normalized contact matrix returned by correctedContact in the diffHic package, but I can't because of the NAs produced during the normalization.

Is there any way to avoid the NAs, or to ignore them when computing the PCA? I saw the na.omit option for prcomp, but it removes too many rows of my contact matrix.
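
A quick illustration of why na.omit() throws away so much, assuming (not verified against diffHic's actual output) that the NAs fill the entire row and column of each bin filtered out during normalization. The matrix is simulated, and m, keep and m.clean are illustrative names only.

```r
# Simulated 6 x 6 symmetric "normalized" matrix; bin 3 plays the role of a bin
# that was filtered out during normalization, so its row and column are NA.
set.seed(1)
n <- 6
m <- matrix(runif(n * n), n, n)
m <- (m + t(m)) / 2
m[3, ] <- NA
m[, 3] <- NA

# na.omit() drops every row containing at least one NA. Column 3 is NA in
# every row, so no rows survive.
nrow(na.omit(m))
# [1] 0

# Assumption: NAs only occur as whole filtered bins. Drop those bins from
# rows AND columns, so the matrix stays square and NA-free.
keep <- rowSums(!is.na(m)) > 0
m.clean <- m[keep, keep]
anyNA(m.clean)
# [1] FALSE

pca <- prcomp(m.clean)
which(keep)  # original bin indices of the rows in pca$x
```

Keeping the indices from which(keep) matters when mapping the scores back to genomic positions, because the removed bins leave gaps.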

Any help is appreciated.

Thanks! Dario

Aaron Lun

I'll level with you here. I barely remember writing this function, and what memories I do have are filled with suffering and despair. I think I wrote it because I thought the iterative correction approach might be useful for differential analyses, but it never was, and now I'm stuck with maintaining something that I never use.

Anyway, to try to answer your immediate question: if you want to get rid of the NAs, you could try turning off the filtering by setting ignore.low=0. You could also turn off winsorizing by setting winsor.high=0. If that doesn't work... I don't really know where else the NAs could be coming from.
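
To see how filtering can produce NAs, here is a bare-bones iterative correction on a simulated dense matrix. It is a simplified sketch, not diffHic's correctedContact(): ice_sketch() is a hypothetical helper, its ignore.low argument only borrows the name (here it means the fraction of lowest-coverage bins to discard), and winsorizing is left out. Bins discarded by the filter never receive a bias estimate, so every contact involving them comes out as NA.

```r
# Hypothetical helper, NOT diffHic::correctedContact(): bare-bones iterative
# correction (ICE) on a dense symmetric count matrix.
ice_sketch <- function(counts, iterations = 50, ignore.low = 0.02) {
  n <- nrow(counts)
  coverage <- rowSums(counts)
  # Flag the lowest-coverage fraction of bins; ignore.low = 0 flags none.
  drop <- rank(coverage, ties.method = "first") <= floor(ignore.low * n)
  x <- counts
  x[drop, ] <- 0
  x[, drop] <- 0
  bias <- rep(1, n)
  bias[drop] <- NA              # filtered bins never get a bias estimate
  for (i in seq_len(iterations)) {
    s <- rowSums(x)
    s <- s / mean(s[!drop])     # rescale so retained bins average 1
    s[drop] <- 1                # avoid 0/0 on the zeroed-out bins
    x <- x / outer(s, s)        # divide rows and columns by the same factor
    bias <- bias * s            # accumulate the per-bin bias
  }
  # Any contact touching a filtered bin is divided by NA, hence NA.
  list(bias = bias, corrected = counts / outer(bias, bias))
}

set.seed(42)
n <- 10
counts <- matrix(rpois(n * n, 20), n, n)
counts <- counts + t(counts)    # make it symmetric like a contact matrix

with.filter <- ice_sketch(counts, ignore.low = 0.2)  # discards 2 of 10 bins
no.filter <- ice_sketch(counts, ignore.low = 0)
sum(is.na(with.filter$bias))
anyNA(no.filter$corrected)
```

In this sketch, setting ignore.low = 0 removes the NAs, but very sparse bins then tend to get their counts scaled up a lot, which amplifies their noise.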

A comment on your analysis: what are you trying to do? If you're doing a PCA on the samples, it would be much better to do between-sample normalization (e.g., csaw::normOffsets). There's no need to expose yourself to the difficulties of trying to remove biases in coverage, distance between anchors, etc. These cancel out (or are greatly reduced) when you're comparing between samples for the same interaction.
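
A toy illustration of the cancellation argument, assuming each interaction's bias (coverage, anchor distance, etc.) multiplies its expected count by the same factor in every sample. All numbers are made up, and plain library-size scaling is used as a crude stand-in for csaw::normOffsets, which offers more robust options.

```r
# Hypothetical expected counts for 4 interactions in 2 samples.
bias <- c(0.5, 2, 10, 40)               # per-interaction bias, shared by both samples
lib.size <- c(s1 = 1e6, s2 = 2e6)       # sequencing depth per sample
abundance <- cbind(s1 = c(1, 1, 3, 1),  # "true" interaction strengths;
                   s2 = c(1, 2, 3, 1))  # only interaction 2 changes
expected <- sweep(abundance * bias, 2, lib.size, `*`)

# Scale each sample by its library size, then compare samples
# interaction by interaction.
normalized <- sweep(expected, 2, lib.size, `/`)
log2(normalized[, "s2"] / normalized[, "s1"])
# [1] 0 1 0 0   (the 80-fold range in 'bias' has cancelled)

# Comparing different interactions within one sample still carries the bias.
normalized[, "s1"]
# [1]  0.5  2.0 30.0 40.0
```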

If you're doing a PCA on the contact matrix itself... I suppose you're looking for A/B compartments? The more relevant normalization would be to get rid of the bias with respect to the distance between anchors (i.e., the distance from the diagonal of the contact matrix), which would improve your signal for the relevant long-range intra-compartment contacts. ICE doesn't really help in that regard, hence the dist.correct option that I added when I was still enthusiastic about this kind of analysis.
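
A sketch of distance correction followed by compartment calling, using the generic observed/expected, correlation, first-eigenvector recipe. This is not diffHic's dist.correct implementation; the data are simulated, and the block size, decay curve and compartment effect are made-up assumptions.

```r
set.seed(7)
n <- 40
comp <- rep(c(1, -1), each = 10, length.out = n)  # hypothetical A/B labels

# Simulated contacts: strong decay with distance from the diagonal, plus a
# +/-50% effect for same- vs different-compartment pairs.
d <- abs(outer(seq_len(n), seq_len(n), "-"))
lambda <- (1000 / (d + 1)) * (1 + 0.5 * outer(comp, comp))
counts <- matrix(rpois(n * n, lambda), n, n)
counts[lower.tri(counts)] <- t(counts)[lower.tri(counts)]  # symmetrize

# Distance correction: divide each diagonal band by its mean (observed/expected).
oe <- counts
for (k in 0:(n - 1)) {
  band <- which(d == k)  # all cells k bins away from the diagonal
  oe[band] <- counts[band] / mean(counts[band])
}

# Correlate bin profiles, then take the first eigenvector as the A/B track.
pc1 <- eigen(cor(oe), symmetric = TRUE)$vectors[, 1]
sign(pc1)  # the sign is arbitrary and must be oriented separately
```

Without the observed/expected step, the distance decay dominates the matrix; after it, what remains is mostly the compartment pattern.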

Besides, I'm not convinced that intra-chromosomal interactions exhibit biases that can be expressed as products of the biases of the interacting regions. (You can do a little thought experiment where you double a chromosome's copy number; do the intra-chromosomal contacts increase by 2? Or by 4?) I could believe factorizability for inter-chromosomal contacts, but that's because it's mostly random noise anyway.
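
One way to spell out the thought experiment, assuming the factorizable model that ICE relies on, with b_i the per-bin bias and T_ij a bias-free contact level (symbols introduced here for illustration), and assuming most cis contacts occur within a single copy of the chromosome:

```latex
% Factorizable model: expected count between bins i and j
E[c_{ij}] = b_i \, b_j \, T_{ij}

% Double the copy number of the chromosome carrying bins i and j:
% the model doubles both biases, so it predicts a 4-fold increase
E[c_{ij}] \to (2 b_i)(2 b_j) \, T_{ij} = 4 \, b_i b_j T_{ij}

% If each copy mostly contacts itself, each copy contributes its own
% b_i b_j T_{ij} and the total only doubles
E[c_{ij}] \to 2 \, b_i b_j T_{ij}
```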


Hi Aaron,

thanks for your reply!!

Yes, I'm doing the PCA for the A/B compartments...

I'll do some more tests based on your comments!

Thanks again, dario

