diffHic ICE normalization PCA with NAs
inzirio ▴ 10

Hi,

I'm working with Hi-C data and trying to compute PCA loadings from the normalized contact matrix returned by correctedContact in the diffHic package, but the NAs produced during normalization prevent me from doing so.

Is there any way to avoid the NAs, or to ignore them during the PCA computation? I saw the na.omit option for prcomp, but it removes too many rows from my contact matrix.

Any help is appreciated.

Thanks! Dario

diffHic normalization pca NAs HI-C • 1.2k views
Aaron Lun ★ 28k

I'll level with you here. I barely remember writing this function, and what memories I do have are filled with suffering and despair. I think I wrote it because I thought the iterative correction approach might be useful for differential analyses, but it never was, and now I'm stuck with maintaining something that I never use.

Anyway, to try to answer your immediate question: if you want to get rid of the NAs, you could try turning off the filtering by setting ignore.low=0. Also possibly turn off the winsorizing by setting winsor.high=0. If that doesn't work... I don't really know where else the NAs could be coming from.

A comment on your analysis: what are you trying to do? If you're doing a PCA on the samples, it would be much better to do between-sample normalization (e.g., csaw::normOffsets). There's no need to expose yourself to the difficulties of trying to remove biases in coverage, distance between anchors, etc. These cancel out (or are greatly reduced) when you're comparing between samples for the same interaction.

If you're doing a PCA on the contact matrix itself... I suppose you're looking for A/B compartments? The more relevant normalization would be to get rid of the bias with respect to the distance between anchors (i.e., the distance from the diagonal of the contact matrix), which would improve your signal for the relevant long-range intra-compartment contacts. ICE doesn't really help in that regard, hence the dist.correct option that I added when I was still enthusiastic about this kind of analysis.
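To make the distance-bias idea concrete, here is a minimal sketch of a generic observed/expected correction: each entry is divided by the mean of all entries at the same distance from the diagonal. This is not the code behind diffHic's dist.correct option; the function name `distance_normalize` and the toy matrix are made up for illustration, and the matrix is assumed to be symmetric. Missing entries are skipped when averaging and stay missing in the output.

```python
import math

def distance_normalize(mat):
    """Divide each entry by the mean of its diagonal (same anchor distance).

    Hypothetical helper for illustration; assumes a square, symmetric matrix.
    NaN entries are ignored when averaging and remain NaN in the result.
    """
    n = len(mat)
    expected = []
    for d in range(n):
        # Collect the non-missing values on the d-th upper diagonal.
        vals = [mat[i][i + d] for i in range(n - d)
                if not math.isnan(mat[i][i + d])]
        expected.append(sum(vals) / len(vals) if vals else math.nan)

    out = [[math.nan] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            e = expected[abs(i - j)]
            x = mat[i][j]
            if not math.isnan(x) and not math.isnan(e) and e > 0:
                out[i][j] = x / e
    return out

nan = math.nan
# Toy symmetric matrix: counts drop with distance, one bin pair is missing.
toy = [
    [8.0, 2.0, 2.0, 1.0],
    [2.0, 8.0, 4.0, nan],
    [2.0, 4.0, 8.0, 6.0],
    [1.0, nan, 6.0, 8.0],
]
oe = distance_normalize(toy)
```

Values above 1 in `oe` are contacts that are stronger than expected for their distance, and values below 1 are weaker than expected. A compartment analysis would typically work from a matrix like this rather than from the raw or ICE-corrected counts.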

Besides, I'm not convinced that intra-chromosomal interactions exhibit biases that can be expressed as products of the biases of the interacting regions. (You can do a little thought experiment where you double a chromosome's copy number; do the intra-chromosomal contacts increase by 2? Or by 4?) I could believe factorizability for inter-chromosomal contacts, but that's because it's mostly random noise anyway.
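The two answers in that thought experiment correspond to two different assumptions, which a few lines of arithmetic can spell out. This is only an illustration: the variable names are made up, and the "cis-only" case assumes that each chromosome copy contacts only itself.

```python
copies = 2                 # copy number after doubling
bias_i = bias_j = 1.0      # per-bin biases at single copy (arbitrary units)

baseline = bias_i * bias_j

# Factorizable model: each bin's bias scales with copy number,
# so the expected count scales with the product of the two biases.
factorized = (copies * bias_i) * (copies * bias_j)
factorized_fold = factorized / baseline

# Cis-only assumption: each copy contributes its own set of
# intra-chromosomal contacts, so counts scale linearly with copy number.
cis_only = copies * baseline
cis_only_fold = cis_only / baseline

print(factorized_fold, cis_only_fold)
```

If intra-chromosomal contacts behave more like the second case, a product-of-biases correction would overcorrect them.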


Hi Aaron,

thanks for your reply!!

Yes I'm doing the PCA for the A/B compartments...

I'll do some more tests based on your comments!

Thanks again, Dario
