Question: normLGF yields non-symmetric matrices
7 months ago by
enrique.vidal10 wrote:

Hi!

The HiCNorm implementation provided in normLGF differs from the original one. The normLGF code is at

https://github.com/Bioconductor-mirror/HiTC/blob/master/R/normalize_hiC.R#L405

while the original code is

len_m<-(len_m-mean(c(len_m)))/sd(c(len_m))
gcc_m<-(gcc_m-mean(c(gcc_m)))/sd(c(gcc_m))

This leads to non-symmetrical matrices.

Any ideas why? Is it desirable to break symmetry?

7 months ago by
France
Nicolas Servant220 wrote:

Hi,

I am not sure I see why this leads to non-symmetrical matrices.

But indeed, this is not expected. Normalized matrices should also be symmetric.

Best

7 months ago by
France
Nicolas Servant220 wrote:

Indeed! I also checked the original code from Hu et al. 2012, and it is the same.

http://www.people.fas.harvard.edu/~junliu/HiCNorm/

So I agree that the matrix is no longer symmetric, but the normalized values at (i,j) and (j,i) remain very close, so I do not think that this is a real issue.

Do you have any idea how to change that?

Otherwise, you can transform it into a symmetric matrix using:

> forceSymmetric(hiC_LGF)
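For readers weighing this workaround, here is a minimal sketch on a plain base-R matrix (not an HiTC object) of two ways to restore symmetry. It assumes the `forceSymmetric()` generic from the Matrix package, which by default keeps the upper triangle and mirrors it; whether the method applied to `hiC_LGF` behaves identically is an assumption. The toy matrix and the names `m`, `s_upper` and `s_avg` are illustrative only.

```r
library(Matrix)  # provides forceSymmetric()

set.seed(1)
a <- matrix(rnorm(16), 4)
a <- (a + t(a)) / 2                      # symmetric toy matrix

# Scaling by the vector of column sds, as in the normLGF lines discussed here
m <- (a - mean(a)) / apply(a, 2, sd)     # no longer symmetric

# Option 1: mirror one triangle (the upper one by default)
s_upper <- as.matrix(forceSymmetric(m))

# Option 2: average each pair of mirrored cells (i,j) and (j,i)
s_avg <- (m + t(m)) / 2

isSymmetric(s_upper)
isSymmetric(s_avg)
```

Both results are symmetric, but they are not equal to each other: the first discards the lower-triangle values, while the second keeps information from both triangles.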

And I will try to contact the authors of the method.

Best

 

7 months ago by
enrique.vidal10 wrote:

I've checked the original scripts following the link you provided, and it seems they scale by the overall sd, not the column-specific sd.

I guess replacing the lines

    len_m<-(len_m-mean(len_m, na.rm=TRUE))/apply(len_m, 2, sd, na.rm=TRUE)
    gcc_m<-(gcc_m-mean(gcc_m, na.rm=TRUE))/apply(gcc_m, 2, sd, na.rm=TRUE)

with

    len_m<-(len_m-mean(len_m, na.rm=TRUE))/sd(len_m, na.rm=TRUE)
    gcc_m<-(gcc_m-mean(gcc_m, na.rm=TRUE))/sd(gcc_m, na.rm=TRUE)


in the normLGF definition would do the trick (which I have already done in my local version of the package).

I agree with you that the differences at the "cell" level (x_{i,j}) could be minor. However, I don't know what the advantage is of scaling by the column-specific sd instead of the overall sd.
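One way to gauge how large the cell-level differences are is to look at the largest absolute gap between mirrored cells. The sketch below does this on simulated data only; `x` is a hypothetical stand-in for `len_m` or `gcc_m`, and `asym` is a helper name introduced here, so the magnitude on real Hi-C feature matrices may well differ.

```r
set.seed(42)
x <- matrix(rnorm(100), 10)
x <- (x + t(x)) / 2                      # symmetric stand-in for len_m or gcc_m

centered <- x - mean(x)
global_scaled <- centered / sd(x)               # overall sd
column_scaled <- centered / apply(x, 2, sd)     # vector of column-specific sds

# Largest absolute difference between cells (i,j) and (j,i); 0 means symmetric
asym <- function(m) max(abs(m - t(m)))

asym(global_scaled)
asym(column_scaled)
```

With the overall sd every cell is divided by the same number, so the gap is exactly zero; with the vector of column sds it is positive.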

In any case, thanks for your quick responses.

:)

 

 

7 months ago by
enrique.vidal10 wrote:

I guess if you divide each column by a different number, then the matrix is no longer symmetric.

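# Build a symmetric 10 x 10 test matrix
a <- matrix(rnorm(100), 10)
a <- (a + t(a)) / 2

# TRUE if x is identical to its own symmetrized version
check_sim <- function(x) identical(x, (x + t(x)) / 2)

check_sim(a)   # TRUE

# Scaling by the overall sd keeps the matrix symmetric
b <- (a - mean(a))/ sd(a)
check_sim(b)   # TRUE

# Scaling by the vector of column sds breaks the symmetry
bb <- (a - mean(a))/apply(a, 2, sd)
check_sim(bb)  # FALSE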
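A side note on what the division by `apply(a, 2, sd)` does in R: the length-10 vector is recycled down the columns of the matrix, so cell (i,j) is divided by the i-th sd, i.e. the scaling is applied per row rather than per column. The sketch below checks this against explicit `sweep()` calls; the names `sds`, `by_recycling`, `by_row` and `by_col` are illustrative. Either way the result is not symmetric, so the conclusion above is unchanged.

```r
set.seed(7)
a <- matrix(rnorm(100), 10)
a <- (a + t(a)) / 2
sds <- apply(a, 2, sd)              # one sd per column (equal to the row sds, since a is symmetric)

by_recycling <- a / sds             # vector recycled down each column: row i divided by sds[i]
by_row <- sweep(a, 1, sds, "/")     # explicit row-wise division
by_col <- sweep(a, 2, sds, "/")     # explicit column-wise division

all.equal(by_recycling, by_row)
isSymmetric(by_row)
isSymmetric(by_col)
```

For a symmetric input the row-wise and column-wise results are transposes of each other, which is another way to see that neither can be symmetric unless all the sds coincide.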

 

7 months ago by
France
Nicolas Servant220 wrote:

Thank you very much for your suggestion!

I will try to update the package for the next release.

Best
