understanding ACME
1
0
Entering edit mode
Ramon Diaz ★ 1.1k
@ramon-diaz-159
Last seen 10.2 years ago
hits=-2.6 tests?YES_00 X-USF-Spam-Flag: NO Dear All, I am trying to understand how the ACME package (by Sean Davis) works, but I think there is something I am missing about the way the p-values are computed. It seems when I try to do the chi-square test myself, I always overestimate the p-value. The following is a complete example: silly.dat <- c(2, 1, 5, 3, 6, 4) dummy.data <- new("aGFF", data = matrix(silly.dat, ncol = 1), annotation = data.frame(Chromosome = 1, Location = c(1, 10, 20, 1000, 1200, 1300)), samples = data.frame(SampleID = 1)) ### So we have cbind(silly.dat, Position = c(1, 10, 20, 1000, 1200, 1300)) silly.dat Position [1,] 2 1 [2,] 1 10 [3,] 5 20 [4,] 3 1000 [5,] 6 1200 [6,] 4 1300 do.aGFF.calc(dummy.data, window = 110, thresh = 0.90) ### Cutpoints: ### [1] 5.5 ### Thus, all values except 6 (the fifth value) are below the threshold ## Within the window for fifth value we have: ## only value of 6 do.aGFF.calc(dummy.data, window = 10, thresh = 0.90)@vals[5] ## 0.0877 chisq.test(x = c(0, 1), p = as.vector(table(silly.dat > 5.5)/6)) ## 0.02535 ## value of 6 and 4 do.aGFF.calc(dummy.data, window = 202, thresh = 0.90)@vals[5] ## 0.3458 chisq.test(x = c(1, 1), p = as.vector(table(silly.dat > 5.5)/6)) ## 0.2059 ## values 6, 4, 3 do.aGFF.calc(dummy.data, window = 402, thresh = 0.90)@vals[5] ## 0.571 chisq.test(x = c(2, 1), p = as.vector(table(silly.dat > 5.5)/6)) ## 0.4386 Where am I computing the chisq in the wrong way? Thanks, R. -- Ram?n D?az-Uriarte Statistical Computing Team Centro Nacional de Investigaciones Oncol?gicas (CNIO) (Spanish National Cancer Center) Melchor Fern?ndez Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electr?nico, y ...{{dropped:3}}
Annotation Cancer ACME Annotation Cancer ACME • 1.4k views
ADD COMMENT
0
Entering edit mode
@sean-davis-490
Last seen 3 months ago
United States
On Feb 4, 2008 1:50 PM, Ramon Diaz-Uriarte <rdiaz at="" cnio.es=""> wrote: > hits=-2.6 tests?YES_00 > X-USF-Spam-Flag: NO > > Dear All, > > I am trying to understand how the ACME package (by Sean Davis) works, but I > think there is something I am missing about the way the p-values are > computed. It seems when I try to do the chi-square test myself, I always > overestimate the p-value. > > > The following is a complete example: > > silly.dat <- c(2, 1, 5, 3, 6, 4) > dummy.data <- new("aGFF", data = matrix(silly.dat, ncol = 1), > annotation = data.frame(Chromosome = 1, > Location = c(1, 10, 20, 1000, 1200, 1300)), > samples = data.frame(SampleID = 1)) > > ### So we have > cbind(silly.dat, Position = c(1, 10, 20, 1000, 1200, 1300)) > silly.dat Position > [1,] 2 1 > [2,] 1 10 > [3,] 5 20 > [4,] 3 1000 > [5,] 6 1200 > [6,] 4 1300 > > > do.aGFF.calc(dummy.data, window = 110, thresh = 0.90) > ### Cutpoints: > ### [1] 5.5 > ### Thus, all values except 6 (the fifth value) are below the threshold > > > ## Within the window for fifth value we have: > ## only value of 6 > do.aGFF.calc(dummy.data, window = 10, thresh = 0.90)@vals[5] ## 0.0877 > chisq.test(x = c(0, 1), p = as.vector(table(silly.dat > 5.5)/6)) ## 0.02535 > > ## value of 6 and 4 > do.aGFF.calc(dummy.data, window = 202, thresh = 0.90)@vals[5] ## 0.3458 > chisq.test(x = c(1, 1), p = as.vector(table(silly.dat > 5.5)/6)) ## 0.2059 > > ## values 6, 4, 3 > do.aGFF.calc(dummy.data, window = 402, thresh = 0.90)@vals[5] ## 0.571 > chisq.test(x = c(2, 1), p = as.vector(table(silly.dat > 5.5)/6)) ## 0.4386 > > > Where am I computing the chisq in the wrong way? Hi, Ramon. ACME is simply calculating the chi-square on a 2x2 table where the cells have the values defined like so: a=number of probes on the array above the threshold b=number of probes total on the array NOT above the threshold c=number of probes in the window above the threshold d=number of probes in the window NOT above the threshold So, building on your example above in which a=1,b=5,c=1, and d varies as below: ## d=0 chisq.test(x=matrix(c(1,5,1,0),nc=2),correct=FALSE) ## 0.08767 ## d=1 chisq.test(x=matrix(c(1,5,1,1),nc=2),correct=FALSE) ## 0.3458 ## d=2 chisq.test(x=matrix(c(1,5,1,2),nc=2),correct=FALSE) ## 0.5708 Does this explanation help? Sean
ADD COMMENT
0
Entering edit mode
Hi Sean, Oh, yes, it does help. I misunderstood because I thought that the chi-square was comparing expected with observed, taking the expected from the proportions of above/below the threshold from the complete data set (i.e., as if doing a chi-square test against a know, theoretical, proportion). Thanks for your explanation. Best, R. On Feb 4, 2008 8:44 PM, Sean Davis <sdavis2 at="" mail.nih.gov=""> wrote: > On Feb 4, 2008 1:50 PM, Ramon Diaz-Uriarte <rdiaz at="" cnio.es=""> wrote: > > hits=-2.6 tests?YES_00 > > X-USF-Spam-Flag: NO > > > > > Dear All, > > > > I am trying to understand how the ACME package (by Sean Davis) works, but I > > think there is something I am missing about the way the p-values are > > computed. It seems when I try to do the chi-square test myself, I always > > overestimate the p-value. > > > > > > The following is a complete example: > > > > silly.dat <- c(2, 1, 5, 3, 6, 4) > > dummy.data <- new("aGFF", data = matrix(silly.dat, ncol = 1), > > annotation = data.frame(Chromosome = 1, > > Location = c(1, 10, 20, 1000, 1200, 1300)), > > samples = data.frame(SampleID = 1)) > > > > ### So we have > > cbind(silly.dat, Position = c(1, 10, 20, 1000, 1200, 1300)) > > silly.dat Position > > [1,] 2 1 > > [2,] 1 10 > > [3,] 5 20 > > [4,] 3 1000 > > [5,] 6 1200 > > [6,] 4 1300 > > > > > > do.aGFF.calc(dummy.data, window = 110, thresh = 0.90) > > ### Cutpoints: > > ### [1] 5.5 > > ### Thus, all values except 6 (the fifth value) are below the threshold > > > > > > ## Within the window for fifth value we have: > > ## only value of 6 > > do.aGFF.calc(dummy.data, window = 10, thresh = 0.90)@vals[5] ## 0.0877 > > chisq.test(x = c(0, 1), p = as.vector(table(silly.dat > 5.5)/6)) ## 0.02535 > > > > ## value of 6 and 4 > > do.aGFF.calc(dummy.data, window = 202, thresh = 0.90)@vals[5] ## 0.3458 > > chisq.test(x = c(1, 1), p = as.vector(table(silly.dat > 5.5)/6)) ## 0.2059 > > > > ## values 6, 4, 3 > > do.aGFF.calc(dummy.data, window = 402, thresh = 0.90)@vals[5] ## 0.571 > > chisq.test(x = c(2, 1), p = as.vector(table(silly.dat > 5.5)/6)) ## 0.4386 > > > > > > Where am I computing the chisq in the wrong way? > > Hi, Ramon. ACME is simply calculating the chi-square on a 2x2 table > where the cells have the values defined like so: > > a=number of probes on the array above the threshold > b=number of probes total on the array NOT above the threshold > c=number of probes in the window above the threshold > d=number of probes in the window NOT above the threshold > > So, building on your example above in which a=1,b=5,c=1, and d varies as below: > > ## d=0 > chisq.test(x=matrix(c(1,5,1,0),nc=2),correct=FALSE) ## 0.08767 > > ## d=1 > chisq.test(x=matrix(c(1,5,1,1),nc=2),correct=FALSE) ## 0.3458 > > ## d=2 > chisq.test(x=matrix(c(1,5,1,2),nc=2),correct=FALSE) ## 0.5708 > > Does this explanation help? > > Sean > -- Ramon Diaz-Uriarte Statistical Computing Team Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) http://ligarto.org/rdiaz
ADD REPLY

Login before adding your answer.

Traffic: 734 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6