RE: venn diagram
@patrick-cahan-702
Any ideas on how to calculate the significance, or rather the probability of getting a given similarity score by chance?

/pc

Patrick Cahan
202.994.8922
pcahan1@gwu.edu

> You can then create a distance matrix from this by calculating all
> pairwise combinations of the length-normalized cosine between the
> vectors:
>
> a <- c(1,1,0,0,1,0)
> b <- c(0,1,1,0,1,1)
> x <- a %*% b / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
> x
>           [,1]
> [1,] 0.5773503
>
> x is a measure of the similarity between vectors a and b. This is
> a standard procedure in text/document comparison. Since one wants to
> create a distance matrix, one still needs to "invert" this matrix so
> that high similarity gets small values (for example, 1 - M).
>
> Once you have your matrix M of cosines (a symmetric matrix), convert
> it via as.dist(1 - M) and pass it to the hclust routine.
> I'd be interested in the outcome (does it make sense?), if you're
> interested. You should only try it if you've got *many* sets to
> test, so that a real Venn approach gets too complex.
>
> Good luck and let me know how it goes,
> regards,
>
> Arne
>
> --
> Arne Muller, Ph.D.
> Toxicogenomics, Aventis Pharma
> arne dot muller domain=aventis com
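A minimal R sketch of the clustering procedure in the quoted message, assuming each gene set is stored as a 0/1 membership vector over the same list of genes. The set names `setA` to `setD` and the toy data are hypothetical. Here the dot product is divided by the Euclidean norms of the vectors, which is what a cosine requires, and the similarity matrix is turned into a distance with `1 - M` before `as.dist()` and `hclust()`.

```r
# Hypothetical toy data: one row per gene on the chip, one column per
# gene set; 1 = gene is in the set, 0 = it is not.
sets <- cbind(
  setA = c(1, 1, 0, 0, 1, 0),
  setB = c(0, 1, 1, 0, 1, 1),
  setC = c(1, 1, 0, 0, 1, 1),
  setD = c(0, 0, 1, 1, 0, 1)
)

# Cosine of two vectors: dot product divided by the product of their
# Euclidean norms (not by the number of elements).
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# All pairwise cosines at once: crossprod(sets) is t(sets) %*% sets, and
# dividing by the outer product of the column norms normalises each entry.
norms <- sqrt(colSums(sets^2))
M <- crossprod(sets) / outer(norms, norms)

# "Invert" similarity into distance (identical sets -> 0, disjoint -> 1),
# keep the lower triangle with as.dist() and cluster.
d  <- as.dist(1 - M)
hc <- hclust(d, method = "average")

print(round(M, 3))
print(hc$labels[hc$order])
```

For the two vectors in the quoted message, `cosine(c(1,1,0,0,1,0), c(0,1,1,0,1,1))` is 2 / (sqrt(3) * 2) = 1/sqrt(3), about 0.577. Average linkage is an arbitrary choice in this sketch; `hclust()` also accepts methods such as "complete" and "single", and `plot(hc)` draws the dendrogram.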
@arnemulleraventiscom-466
Hi,

the only way I can think of is to generate pairs of random sets of the same sizes as the real set pairs and run the vector comparison (the cosine calculation from my earlier message) on each pair; do this 10,000 times or so. Then estimate the parameters of the resulting null distribution (maybe it's even normally distributed) and see where the real score falls in it. I'd sample directly from the entire population of genes on the chip.

regards,

Arne

--
Arne Muller, Ph.D.
Toxicogenomics, Aventis Pharma
arne dot muller domain=aventis com

> -----Original Message-----
> From: bioconductor-bounces@stat.math.ethz.ch
> [mailto:bioconductor-bounces@stat.math.ethz.ch] On Behalf Of Patrick Cahan
> Sent: 28 April 2004 15:02
> To: bioconductor@stat.math.ethz.ch
> Subject: [BioC] RE: venn diagram
>
> Any ideas on how to calculate the significance, or rather the
> probability of getting a given similarity score by chance?

_______________________________________________
Bioconductor mailing list
Bioconductor@stat.math.ethz.ch
https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
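A minimal R sketch of the resampling approach described above, assuming the genes on the chip are the universe to sample from. The values `n_genes`, `n1`, `n2` and `k_obs` (chip size, the two real set sizes, and their observed overlap) are hypothetical placeholders. Besides fitting a normal distribution to the simulated scores, it reports the empirical p-value directly, which does not assume normality. Because the set sizes are fixed, the cosine of two 0/1 vectors equals overlap / sqrt(n1 * n2), a monotone function of the overlap, so the exact null distribution of the overlap is hypergeometric; `phyper()` gives that exact tail probability for comparison.

```r
set.seed(1)

# Hypothetical numbers: genes on the chip, sizes of the two real sets,
# and how many genes they share.
n_genes <- 12000
n1      <- 300
n2      <- 250
k_obs   <- 12

# Cosine of two 0/1 membership vectors reduces to overlap / sqrt(n1 * n2).
cos_obs <- k_obs / sqrt(n1 * n2)

# Draw random set pairs of the same sizes from the whole chip and record
# the cosine of each pair.
n_perm <- 10000
cos_null <- replicate(n_perm, {
  s1 <- sample.int(n_genes, n1)
  s2 <- sample.int(n_genes, n2)
  length(intersect(s1, s2)) / sqrt(n1 * n2)
})

# Empirical p-value: fraction of random pairs at least as similar as the
# observed pair (+1 in numerator and denominator keeps it above zero).
p_perm <- (sum(cos_null >= cos_obs) + 1) / (n_perm + 1)

# Exact tail probability P(overlap >= k_obs) under random sampling:
# hypergeometric with n1 "marked" genes out of n_genes, n2 genes drawn.
p_exact <- phyper(k_obs - 1, n1, n_genes - n1, n2, lower.tail = FALSE)

# Normal approximation fitted to the simulated scores, for comparison.
p_norm <- pnorm(cos_obs, mean(cos_null), sd(cos_null), lower.tail = FALSE)

print(c(permutation = p_perm, hypergeometric = p_exact, normal = p_norm))
```

With these placeholder numbers the expected overlap by chance is n1 * n2 / n_genes = 6.25 genes. The permutation and hypergeometric p-values should agree up to Monte Carlo error, since drawing the second set uniformly at random given the first is exactly the hypergeometric setup; the normal fit can drift from both in the tail because the overlap is a discrete count.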