Question

normLGF yields non-symmetric matrices

enrique.vidal ▴ 10

@enriquevidal-12787

Last seen 6.9 years ago

Hi!

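The HiCNorm implementation provided in normLGF differs from the original one. In HiTC, the covariate matrices are scaled by the column-specific standard deviation:

https://github.com/Bioconductor-mirror/HiTC/blob/master/R/normalize_hiC.R#L405

while the original code scales by the overall standard deviation: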

len_m<-(len_m-mean(c(len_m)))/sd(c(len_m))
gcc_m<-(gcc_m-mean(c(gcc_m)))/sd(c(gcc_m))

This leads to non-symmetrical matrices.

Any ideas why? Is it desirable to break symmetry?

HiTC • 1.4k views

ADD COMMENT • link updated 7.0 years ago by Nicolas Servant ▴ 260 • written 7.0 years ago by enrique.vidal ▴ 10

score 1 · Answer 1 · 2017-04-07

Nicolas Servant ▴ 260

@nicolas-servant-1466

Last seen 22 months ago

France

Hi,

I am not sure I see why this would lead to non-symmetric matrices.

But indeed, this is not expected. Normalized matrices should also be symmetric.

Best

ADD COMMENT • link 7.0 years ago Nicolas Servant ▴ 260

score 1 · Answer 2 · 2017-04-07

Indeed! I also checked the original code from Hu et al. 2012, and it is the same.

http://www.people.fas.harvard.edu/~junliu/HiCNorm/

So I agree that the matrix is no longer symmetric, but the normalized values at (i,j) and (j,i) remain very close, so I do not think this is a real issue.

Would you have any idea how to change that?

Otherwise, you can transform it into a symmetric matrix using:

> forceSymmetric(hiC_LGF)
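
As a rough illustration of what forcing symmetry does, here is a minimal sketch on a plain R matrix using `forceSymmetric()` from the Matrix package. The toy matrix `m` and the averaging alternative are assumptions made for this example only; how the call behaves on an actual HiTC object is not shown here.

```r
library(Matrix)  # provides forceSymmetric() for plain matrices

# Toy non-symmetric matrix standing in for a normalized map (illustrative only).
# matrix() fills column by column, so m[1, 2] is 3 and m[2, 1] is 2.
m <- matrix(c(1, 2,
              3, 4), nrow = 2)

# Option 1: copy the upper triangle onto the lower one.
# The lower-triangle values are discarded.
sym_upper <- as.matrix(forceSymmetric(m, uplo = "U"))

# Option 2 (an alternative, not part of forceSymmetric): average (i, j) and (j, i).
sym_avg <- (m + t(m)) / 2

isSymmetric(sym_upper)
isSymmetric(sym_avg)
```

Copying one triangle keeps one of the two values unchanged, while averaging uses both; if the two values are already very close, the choice should matter little.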

And I will try to contact the authors of the method.

Best

score 1 · Answer 3 · 2017-04-07

I've checked the original scripts following the link you provided, and it seems they are scaling by the overall sd, not the column-specific sd.

I guess changing the lines

    len_m<-(len_m-mean(len_m, na.rm=TRUE))/apply(len_m, 2, sd, na.rm=TRUE)
    gcc_m<-(gcc_m-mean(gcc_m, na.rm=TRUE))/apply(gcc_m, 2, sd, na.rm=TRUE)

by

    len_m<-(len_m-mean(len_m, na.rm=TRUE))/sd(len_m, na.rm=TRUE)
    gcc_m<-(gcc_m-mean(gcc_m, na.rm=TRUE))/sd(gcc_m, na.rm=TRUE)

in the normLGF definition would do the trick (which I've already done in my local version of the package).

I agree with you that the differences at the "cell" level (x_{i,j}) could be minor. However, I don't know what the advantage of scaling by the column-specific sd instead of the overall sd would be.
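
To get a feel for how large the cell-level differences are, one could measure the largest gap between entries (i,j) and (j,i) under each scaling. This is only a sketch: `cov_m` is a random symmetric matrix standing in for `len_m` or `gcc_m`, and `asym()` is a hypothetical helper, so the numbers say nothing about real Hi-C covariates.

```r
set.seed(1)  # fixed seed so the sketch is reproducible

# Random symmetric matrix used as a stand-in covariate matrix (an assumption).
cov_m <- matrix(rnorm(100), 10)
cov_m <- (cov_m + t(cov_m)) / 2

# Overall scaling, as in the proposed replacement lines.
overall <- (cov_m - mean(cov_m, na.rm = TRUE)) / sd(cov_m, na.rm = TRUE)

# Scaling by the vector of column sds, as in the lines being replaced.
by_col <- (cov_m - mean(cov_m, na.rm = TRUE)) / apply(cov_m, 2, sd, na.rm = TRUE)

# Hypothetical helper: largest absolute difference between x[i, j] and x[j, i].
asym <- function(x) max(abs(x - t(x)))

asym(overall)  # zero when the matrix is exactly symmetric
asym(by_col)   # positive when symmetry has been broken
```

Running the same check on the real covariate matrices would show whether the asymmetry is negligible in practice.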

In any case, thanks for your quick responses.

:)

score 0 · Answer 4 · 2017-04-07

enrique.vidal ▴ 10

@enriquevidal-12787

Last seen 6.9 years ago

I guess that if you divide each column by a different number, the matrix is no longer symmetric.

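set.seed(42)  # fixed seed so the example is reproducible

# Build a random symmetric 10 x 10 matrix
a <- matrix(rnorm(100), 10)
a <- (a + t(a)) / 2

# TRUE only if x is identical to its symmetrized version
check_sim <- function(x) identical(x, (x + t(x)) / 2)

check_sim(a)

# Scaling by the overall sd keeps the matrix symmetric
b <- (a - mean(a)) / sd(a)
check_sim(b)

# Dividing by the vector of column sds (recycled element-wise) breaks the symmetry
bb <- (a - mean(a)) / apply(a, 2, sd)
check_sim(bb)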

ADD COMMENT • link 7.0 years ago enrique.vidal ▴ 10

score 0 · Answer 5 · 2017-04-07

Nicolas Servant ▴ 260

@nicolas-servant-1466

Last seen 22 months ago

France

Thank you very much for your suggestion!

I will try to update the package for the next release.

Best

ADD COMMENT • link 7.0 years ago Nicolas Servant ▴ 260