Question: normLGF yields non-symmetric matrices
0
15 months ago by
enrique.vidal10 wrote:

Hi!

The HiCNorm implementation provided in normLGF differs from the original one. The HiTC version is at

https://github.com/Bioconductor-mirror/HiTC/blob/master/R/normalize_hiC.R#L405

while the original code is

len_m<-(len_m-mean(c(len_m)))/sd(c(len_m))
gcc_m<-(gcc_m-mean(c(gcc_m)))/sd(c(gcc_m))

This leads to non-symmetrical matrices.

Any ideas why? Is it desirable to break symmetry?

1
15 months ago by
France
Nicolas Servant wrote:

Hi,

I'm not sure I see why this leads to non-symmetrical matrices.

But indeed, this is not expected. Normalized matrices should also be symmetric.

Best

1
15 months ago by
France
Nicolas Servant wrote:

Indeed! I also checked the original code from Hu et al. 2012, and it is the same.

http://www.people.fas.harvard.edu/~junliu/HiCNorm/

So I agree that the matrix is no longer symmetric, but the normalized values at (i,j) and (j,i) remain very close, so I do not think this is a real issue.

Would you have any idea how to change that?

Otherwise, you can transform it into a symmetric matrix using:

> forceSymmetric(hiC_LGF)
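
For readers weighing this workaround, here is a minimal sketch on a toy matrix. It uses forceSymmetric from the Matrix package on a plain matrix; whether the method behaves identically on the hiC_LGF object above is an assumption, and the values in m are made up for illustration. The point is that forceSymmetric keeps one triangle (the upper one by default) and mirrors it, whereas averaging the two triangles uses both values.

```r
# Sketch only: a toy matrix stands in for a normalized contact map.
library(Matrix)  # provides forceSymmetric()

# Hypothetical, slightly asymmetric values
m <- matrix(c(1.0, 2.0, 3.0,
              2.2, 5.0, 6.0,
              3.4, 6.6, 9.0), nrow = 3, byrow = TRUE)

# Option 1: keep the upper triangle and mirror it (uplo defaults to "U"
# when the input is not already symmetric); the lower triangle is discarded.
sym_upper <- as.matrix(forceSymmetric(m))

# Option 2: average each (i, j) / (j, i) pair so both values contribute.
sym_mean <- (m + t(m)) / 2

isSymmetric(sym_upper)
isSymmetric(sym_mean)
```

Which option is preferable depends on whether one triangle is considered more trustworthy than the other; when neither is, averaging is the more neutral choice.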

And I will try to contact the authors of the method.

Best

1
15 months ago by
enrique.vidal10 wrote:

I've checked the original scripts following the link you provided, and it seems they are scaling by the overall sd, not the column-specific sd.

I guess replacing the lines

    len_m<-(len_m-mean(len_m, na.rm=TRUE))/apply(len_m, 2, sd, na.rm=TRUE)
    gcc_m<-(gcc_m-mean(gcc_m, na.rm=TRUE))/apply(gcc_m, 2, sd, na.rm=TRUE)

with

    len_m<-(len_m-mean(len_m, na.rm=TRUE))/sd(len_m, na.rm=TRUE)
    gcc_m<-(gcc_m-mean(gcc_m, na.rm=TRUE))/sd(gcc_m, na.rm=TRUE)

in the normLGF definition would do the trick (which I've already done in my local version of the package).

I agree with you that the differences at the "cell" level (x_{i,j}) could be minor. However, I don't know what the advantage is of scaling by the column-specific sd instead of the overall sd.
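
To make the size of the cell-level difference concrete, here is a short worked formula. It assumes the covariate matrix x is symmetric before standardization, writes m for its overall mean, s for its overall sd and s_k for the sd of column k (these symbols are introduced here for illustration), and relies on R recycling a length-n vector down the columns of an n x n matrix, so that entry (i,j) is divided by s_i.

```latex
% Overall sd: symmetry is preserved
z_{i,j} = \frac{x_{i,j} - m}{s}
\quad\Longrightarrow\quad
z_{i,j} = z_{j,i}

% Column-specific sd, as applied by R's recycling
z'_{i,j} = \frac{x_{i,j} - m}{s_i},
\qquad
z'_{j,i} = \frac{x_{j,i} - m}{s_j} = \frac{x_{i,j} - m}{s_j}

% Gap between mirrored entries
z'_{i,j} - z'_{j,i} = (x_{i,j} - m)\left(\frac{1}{s_i} - \frac{1}{s_j}\right)
```

Under these assumptions the gap vanishes only when s_i = s_j or x_{i,j} = m, and it stays small whenever the column sds are close to each other, which is consistent with the mirrored values being very close without being equal.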

In any case, thanks for your quick responses.

:)

0
15 months ago by
enrique.vidal10 wrote:

I guess if you divide each column by a different number, then the matrix is no longer symmetric.

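set.seed(1)  # make the example reproducible
# Build a random symmetric matrix
a <- matrix(rnorm(100), 10)
a <- (a + t(a)) / 2

# TRUE only if x equals its own symmetrized version
check_sim <- function(x) identical(x, (x + t(x)) / 2)

check_sim(a)

# Scaling by the overall sd preserves symmetry
b <- (a - mean(a)) / sd(a)
check_sim(b)

# Scaling by the column-specific sds does not
bb <- (a - mean(a)) / apply(a, 2, sd)
check_sim(bb)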
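
As a follow-up sketch (not part of HiTC; the names col_sd and by_row are introduced here for illustration), the snippet below checks how R actually applies the vector of sds and how large the resulting asymmetry is. Because a length-10 vector is recycled down each column of a 10 x 10 matrix, entry (i, j) ends up divided by the sd of column i, which is the same as dividing row i by that value.

```r
set.seed(1)
a <- matrix(rnorm(100), 10)
a <- (a + t(a)) / 2              # exactly symmetric input

col_sd <- apply(a, 2, sd)        # one sd per column

# Division as written in the thread: the vector is recycled down the columns,
# so entry (i, j) is divided by col_sd[i].
bb <- (a - mean(a)) / col_sd

# The same operation written explicitly as a row-wise sweep.
by_row <- sweep(a - mean(a), 1, col_sd, "/")
all.equal(bb, by_row)

# Largest gap between mirrored entries after column-specific scaling.
max(abs(bb - t(bb)))

# With the overall sd the gap is exactly zero.
b <- (a - mean(a)) / sd(a)
max(abs(b - t(b)))
```

For a symmetric input the sd of column i equals the sd of row i, so the row-versus-column distinction does not change the conclusion: any scaling that differs along one axis only breaks symmetry.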

0
15 months ago by
France
Nicolas Servant wrote:

Thank you very much for your suggestion!

I will try to update the package for the next release.

Best