Patrick Cahan
Any ideas on how to calculate the significance, or rather the
probability, of getting a given similarity score by chance?
/pc
Patrick Cahan
202.994.8922
pcahan1@gwu.edu
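One possible way to attach a chance probability to a score, sketched below under stated assumptions: treat every item as equally likely to belong to a set, shuffle one membership vector many times (this keeps its number of 1s fixed), and count how often the shuffled cosine is at least as large as the observed one. For 0/1 vectors with fixed counts of 1s, the cosine depends only on the overlap `sum(a * b)`, so the same p-value can also be computed exactly from the hypergeometric distribution with `phyper()`. The vectors are the ones from the quoted reply; `cosine()`, `n_perm` and the seed are illustrative choices, not something proposed in this thread.

```r
# Cosine similarity: dot product divided by the product of Euclidean norms
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

a <- c(1, 1, 0, 0, 1, 0)
b <- c(0, 1, 1, 0, 1, 1)
observed <- cosine(a, b)

# Permutation null (assumption: all positions are exchangeable).
# sample(b) shuffles b, so its number of 1s stays the same.
set.seed(1)
n_perm <- 10000
null_scores <- replicate(n_perm, cosine(a, sample(b)))
# Add-one correction so the estimated p-value is never exactly zero
p_perm <- (sum(null_scores >= observed) + 1) / (n_perm + 1)

# Exact version for 0/1 vectors: with the counts of 1s fixed, the cosine is
# an increasing function of the overlap, which is hypergeometric under the null
n_items <- length(a)
k_a <- sum(a)
k_b <- sum(b)
overlap <- sum(a * b)
# P(overlap >= observed overlap)
p_exact <- phyper(overlap - 1, k_a, n_items - k_a, k_b, lower.tail = FALSE)
```

With only six items the null distribution is very coarse (the overlap can only be 1, 2 or 3), and the observed overlap of 2 gives a p-value of 0.8, i.e. no evidence of more similarity than chance. If the equal-probability assumption does not fit the data, the shuffling step would need a more realistic null, and testing many pairs calls for a multiple-testing correction.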
> You can then create a distance matrix from this by calculating all
> pairwise combinations of the length-normalized cosine between the
> vectors:
> > a <- c(1,1,0,0,1,0)
> > b <- c(0,1,1,0,1,1)
> > x <- a %*% b / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
> > x
>           [,1]
> [1,] 0.5773503
>
> x is a measure of the similarity between vectors a and b. This is
> a standard procedure in text/document comparison. Since one wants
> to create a distance matrix, one still needs to somehow "invert"
> this matrix so that high similarity gets small values (for example,
> by using 1 - x as the distance)!
>
> Once you have your matrix M of cosines (a symmetric matrix, inverted
> as above), convert it via as.dist(M) and pass the result to the
> hclust routine.
> I'd be interested in the outcome (does it make sense?) if you'd like
> to share it. You should only try this if you have *many* sets to
> test, so that a real Venn approach becomes too complex.
>
> good luck and let me know how it goes,
> regards,
>
> Arne
>
> --
> Arne Muller, Ph.D.
> Toxicogenomics, Aventis Pharma
> arne dot muller domain=aventis com
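The procedure described in the quoted reply (pairwise cosines, invert to distances, `as.dist`, `hclust`) can be sketched as follows for many sets at once. This is only an illustration under assumptions: the four sets are made up, each row is one set and each column one item (1 = member), the inversion uses 1 - cosine, and average linkage is an arbitrary choice rather than something specified in the thread. `tcrossprod()` computes all pairwise dot products in one call instead of looping over pairs.

```r
# Hypothetical membership matrix: rows are sets, columns are items
sets <- rbind(
  setA = c(1, 1, 0, 0, 1, 0),
  setB = c(0, 1, 1, 0, 1, 1),
  setC = c(1, 1, 0, 0, 1, 1),
  setD = c(0, 0, 1, 1, 0, 1)
)

# All pairwise dot products: entry [i, j] is sum(sets[i, ] * sets[j, ])
dots <- tcrossprod(sets)
# Euclidean norm of each set's vector
norms <- sqrt(rowSums(sets^2))

# Symmetric cosine similarity matrix M (1 on the diagonal)
M <- dots / outer(norms, norms)

# "Invert" so that high similarity becomes a small distance
D <- as.dist(1 - M)

# Hierarchical clustering of the sets (average linkage chosen for illustration)
hc <- hclust(D, method = "average")

# Cut the tree into two groups; plot(hc) would draw the dendrogram
groups <- cutree(hc, k = 2)
print(groups)
```

For these made-up sets, setA and setC are merged first (they share three items), and cutting the tree into two groups separates setD from the other three. With real data, the same code applies to a larger membership matrix, and the choice of inversion and linkage method can change the resulting tree.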