RE: venn diagram


Patrick Cahan ▴ 20

@patrick-cahan-702

Last seen 10.6 years ago

Any ideas on how to calculate the significance, or rather the probability, of getting a given similarity score by chance?

/pc

Patrick Cahan
202.994.8922
pcahan1@gwu.edu

> You can then create a distance matrix from this by calculating all
> pairwise combinations of the length-normalized cosine between the
> vectors:
>
> > a <- c(1,1,0,0,1,0)
> > b <- c(0,1,1,0,1,1)
> > x <- a%*%b / (length(a) * length(b))
> > x
>            [,1]
> [1,] 0.05555556
>
> x is a measure of the similarity between vectors a and b. This is
> a standard procedure in text/document comparison. Since
> one wants to create a distance matrix, one still needs to somehow
> "invert" this matrix so that high similarity gets small values!
>
> Once you have your matrix M of cosines (this is a symmetric
> matrix), you convert it via as.dist(M) and pass it to the hclust
> routine.
> I'd be interested in the outcome (does it make sense?) - if you're
> interested. You should only try it if you've got *many* sets to
> test, so that a real Venn approach gets too complex.
>
> good luck and let me know how it goes,
> regards,
>
> Arne
>
> --
> Arne Muller, Ph.D.
> Toxicogenomics, Aventis Pharma
> arne dot muller domain=aventis com
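The steps in the quoted message (pairwise similarities, inversion to a distance, then as.dist and hclust) might be sketched as follows. This is only an illustration under stated assumptions: the third set `c` and the object names are made up, the similarity used here is the standard cosine (dot product divided by the Euclidean norms) rather than the length(a) * length(b) scaling in the quoted code, and `1 - similarity` is just one possible way to "invert" the matrix.

```r
# Hypothetical membership vectors: one row per set, one column per gene
# (1 = gene is in the set). Rows a and b are the vectors from the quoted
# message; row c is invented for illustration.
sets <- rbind(
  a = c(1, 1, 0, 0, 1, 0),
  b = c(0, 1, 1, 0, 1, 1),
  c = c(1, 1, 0, 0, 0, 0)
)

# Pairwise dot products: entry [i, j] counts the genes shared by sets i and j.
dot <- sets %*% t(sets)

# Standard cosine similarity (an assumption, see the note above).
# An empty set would have norm 0 and give NaN, so it should be dropped first.
norms <- sqrt(diag(dot))
cos_sim <- dot / (norms %o% norms)

# Assumed inversion: identical sets get distance 0, disjoint sets distance 1.
d <- as.dist(1 - cos_sim)

# Hierarchical clustering of the sets; average linkage is an arbitrary choice.
hc <- hclust(d, method = "average")
```

Calling `plot(hc)` afterwards draws the dendrogram of the sets.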


Updated 20.9 years ago by Arne.Muller@aventis.com ▴ 620 • written 20.9 years ago by Patrick Cahan ▴ 20


Arne.Muller@aventis.com ▴ 620

@arnemulleraventiscom-466

Last seen 10.6 years ago

Hi,

the only way I can think of is to generate pairs of random sets of the same size as the real set pairs and run the vector comparison (as in my earlier message, quoted above) on each pair. Do this 10,000 times or so, then estimate the parameters of the resulting distribution (maybe it's even normally distributed). I'd sample directly from the entire population of genes on the chip.

regards,

Arne

--
Arne Muller, Ph.D.
Toxicogenomics, Aventis Pharma
arne dot muller domain=aventis com

> -----Original Message-----
> From: bioconductor-bounces@stat.math.ethz.ch
> [mailto:bioconductor-bounces@stat.math.ethz.ch]On Behalf Of Patrick Cahan
> Sent: 28 April 2004 15:02
> To: bioconductor@stat.math.ethz.ch
> Subject: [BioC] RE: venn diagram
>
> Any ideas on how to calculate the significance or rather the
> probability of getting a given similarity score by chance?
>
> /pc
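The resampling idea in this reply could be sketched roughly as below. All numbers are hypothetical placeholders (chip size, set sizes, and an observed overlap of 10 genes), and the helper names `similarity` and `random_set` are invented for illustration; the score is the scaled dot product from the quoted message.

```r
set.seed(1)           # for reproducibility of this sketch only

n_genes <- 1000       # hypothetical number of genes on the chip
size_a  <- 50         # hypothetical size of the first real set
size_b  <- 80         # hypothetical size of the second real set
n_sim   <- 10000      # number of random set pairs, as suggested above

# Similarity as in the quoted message: dot product over length(a) * length(b).
similarity <- function(a, b) sum(a * b) / (length(a) * length(b))

# Draw a random set of the given size from all genes, as a 0/1 vector.
random_set <- function(size, n) {
  v <- integer(n)
  v[sample.int(n, size)] <- 1L
  v
}

# Null distribution: scores for random pairs with the same sizes as the real pair.
null_scores <- replicate(
  n_sim,
  similarity(random_set(size_a, n_genes), random_set(size_b, n_genes))
)

# Hypothetical observed score, corresponding to 10 shared genes.
observed <- 10 / n_genes^2

# Empirical p-value: fraction of random pairs scoring at least as high
# (the +1 terms keep the estimate away from exactly zero).
p_emp <- (sum(null_scores >= observed) + 1) / (n_sim + 1)

# Parameters of the null distribution, should a normal approximation be tried.
null_mean <- mean(null_scores)
null_sd   <- sd(null_scores)
```

Because the set sizes are fixed, the score depends only on the number of shared genes, which follows a hypergeometric distribution under random sampling; `phyper(9, size_a, n_genes - size_a, size_b, lower.tail = FALSE)` should therefore give a value close to `p_emp` and can serve as a sanity check.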

Written 20.9 years ago by Arne.Muller@aventis.com ▴ 620
