Question: multtest problem...
0
14.4 years ago by
hi, i used the mt.teststat function from the package multtest and wonder why the p values that the function calculates differ from those i get using the wilcox.test or t.test function. i tried it with the golub data set (data(golub)) and wilcox.test(x=golub[,golub.cl==0],y=golub[,golub.cl==1]) returns a p-value = 0.8987 while teststat<-mt.teststat(golub,golub.cl,test="wilcoxon") teststat[1] = 1.75419 i thought that the mt.teststat funciton uses simple t tests or wilcox test to calculate p values... perhaps i am wrong with the class labels? from the help i understood, that if i have two groups in my samples i submit as classlabels for examples c(0,0,0,1,1,1) and this means that the first 3 columns of my data correspond to group 0 and the other 3 to group 1. is this correct? thanks, jo
multtest • 824 views
modified 14.4 years ago by kpollard110 • written 14.4 years ago by Dipl.-Ing. Johannes Rainer430
0
14.4 years ago by
Sean Davis21k
United States
Sean Davis21k wrote:
On Apr 29, 2005, at 5:58 AM, Dipl.-Ing. Johannes Rainer wrote: > hi, > > i used the mt.teststat function from the package multtest and wonder > why the p values that the function calculates differ from those i get > using the wilcox.test or t.test function. > i tried it with the golub data set (data(golub)) and > wilcox.test(x=golub[,golub.cl==0],y=golub[,golub.cl==1]) > > returns a p-value = 0.8987 > > while teststat<-mt.teststat(golub,golub.cl,test="wilcoxon") > > teststat[1] = 1.75419 > > i thought that the mt.teststat funciton uses simple t tests or wilcox > test to calculate p values... > Looking at the help for mt.teststat (by typing help(mt.teststat)), you can see part way down: Value: For 'mt.teststat', a vector of test statistics for each row (gene). The test statistic is NOT a p-value. You have to convert it to a p-value using some other function (usually something like pnorm, pt, etc). Finally, you will need to apply mt.rawp2adjp to get corrected p-values. I would suggest reading the mt.teststat vignette for more information and worked examples, including how to calculate p-values from mt.teststat output. Hope this helps, Sean
:) thanks for the help Quoting Sean Davis <sdavis2@mail.nih.gov>: > > On Apr 29, 2005, at 5:58 AM, Dipl.-Ing. Johannes Rainer wrote: > >> hi, >> >> i used the mt.teststat function from the package multtest and wonder >> why the p values that the function calculates differ from those i >> get using the wilcox.test or t.test function. >> i tried it with the golub data set (data(golub)) and >> wilcox.test(x=golub[,golub.cl==0],y=golub[,golub.cl==1]) >> >> returns a p-value = 0.8987 >> >> while teststat<-mt.teststat(golub,golub.cl,test="wilcoxon") >> >> teststat[1] = 1.75419 >> >> i thought that the mt.teststat funciton uses simple t tests or >> wilcox test to calculate p values... >> > > Looking at the help for mt.teststat (by typing help(mt.teststat)), > you can see part way down: > > Value: > > For 'mt.teststat', a vector of test statistics for each row > (gene). > > The test statistic is NOT a p-value. You have to convert it to a > p-value using some other function (usually something like pnorm, pt, > etc). Finally, you will need to apply mt.rawp2adjp to get corrected > p-values. I would suggest reading the mt.teststat vignette for more > information and worked examples, including how to calculate p-values > from mt.teststat output. > > Hope this helps, > Sean > >
0
14.4 years ago by
EMBL European Molecular Biology Laboratory
Wolfgang Huber13k wrote:
Hi Jo, your mail is not very clear about the distinction between test statistics and p-value, and perhaps this is where your confusion comes from? There is no reason why the two numbers you state in your mail should be equal, they are different things. Have a look as well at the function rowttests in the develop version of "genefilter", for a fast and userfriendly approach to doing t-tests for each row of an exprSet. Best Wolfgang Dipl.-Ing. Johannes Rainer wrote: > hi, > > i used the mt.teststat function from the package multtest and wonder why > the p values that the function calculates differ from those i get using > the wilcox.test or t.test function. > i tried it with the golub data set (data(golub)) and > wilcox.test(x=golub[,golub.cl==0],y=golub[,golub.cl==1]) > > returns a p-value = 0.8987 > > while teststat<-mt.teststat(golub,golub.cl,test="wilcoxon") > > teststat[1] = 1.75419 > > i thought that the mt.teststat funciton uses simple t tests or wilcox > test to calculate p values... > perhaps i am wrong with the class labels? > > from the help i understood, that if i have two groups in my samples i > submit as classlabels for examples c(0,0,0,1,1,1) and this means that > the first 3 columns of my data correspond to group 0 and the other 3 to > group 1. is this correct? > > thanks, jo > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor -- Best regards Wolfgang ------------------------------------- Wolfgang Huber European Bioinformatics Institute European Molecular Biology Laboratory Cambridge CB10 1SD England Phone: +44 1223 494642 Fax: +44 1223 494486 Http: www.ebi.ac.uk/huber
Hi. I'm having problem with R and gcrma, i.e. using gcrma (12 arrays, hgu133a) R keep exiting with segmentation fault. Any hint? Info: Debian platform i386-pc-linux-gnu arch i386 os linux-gnu system i386, linux-gnu status major 2 minor 1.0 year 2005 month 04 day 18 language R Bioconductor devel
Dear bioconductor mailing list, I have some pairs of genes and I would like to measure their "distances" on the GO molecular function graph. So I wrote this lines of code: distances<-numeric(100) for (i in 1:100){ goanal<-simLL(as.character(hmapr$V1[i]),as.character(hmapr$V2[i]),"MF ") print(goanal$sim) distances[i]<-goanal$sim rm(goanal) } where hmapr$V1 and hmapr$V2 are the vectors in which I have the locus link identifiers of my genes. When I run the script, in a few time I obtain "Error: invalid subscript type". I noticed that there are problems with the pair of llID where the script stops. Is there a way to bypass the problem, to jump to the next pair of genes after putting NaN in the distances vector? Thanks a lot, Maria Maria Persico MINT database, Cesareni Group Universita' di Tor Vergata, via della Ricerca Scientifica 00133 Roma, Italy Tel: +39 0672594315 FAX: +39 0672594766 e-mail: maria@cbm.bio.uniroma2.it
Hi, As Seth said you can use try (or some of the other features of the exception handling system). Would it be possible to supply me with an example where it fails - it might be better to fix the code so that it handles these cases more gracefully, thanks, Robert On Apr 30, 2005, at 4:14 AM, Maria Persico wrote: > Dear bioconductor mailing list, > > I have some pairs of genes and I would like to measure their > "distances" > on the GO molecular function graph. > > So I wrote this lines of code: > > distances<-numeric(100) > for (i in 1:100){ > > goanal<- > simLL(as.character(hmapr$V1[i]),as.character(hmapr$V2[i]),"MF") > print(goanal$sim) > distances[i]<-goanal$sim > rm(goanal) > } > > where hmapr$V1 and hmapr$V2 are the vectors in which I have the locus > link > identifiers of my genes. > > When I run the script, in a few time I obtain > "Error: invalid subscript type". > > I noticed that there are problems with the pair of llID where the > script > stops. > > Is there a way to bypass the problem, to jump to the next pair > of genes after putting NaN in the distances vector? > > Thanks a lot, > > Maria > > > > > > Maria Persico > MINT database, Cesareni Group > Universita' di Tor Vergata, via della Ricerca Scientifica > 00133 Roma, Italy > Tel: +39 0672594315 > FAX: +39 0672594766 > e-mail: maria@cbm.bio.uniroma2.it > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > > +--------------------------------------------------------------------- -- ----------------+ | Robert Gentleman phone: (206) 667-7700 | | Head, Program in Computational Biology fax: (206) 667-1319 | | Division of Public Health Sciences office: M2-B865 | | Fred Hutchinson Cancer Research Center | | email: rgentlem@fhcrc.org | +--------------------------------------------------------------------- -- ----------------+
Dear Seth, dear Robert, thanks for your answers. I have still problems because I couldn't reinitialize goanal(sorry, I'm a beginner): indeed.... > goanal [1] "Error: invalid subscript type\n" attr(,"class") [1] "try-error" > str(goanal) Class 'try-error' chr "Error: invalid subscript type " this is the code that doesn't work: > distances<-numeric(100) > for (i in 1:100){ + goanal <- try(simLL(as.character(hmapr$V1[i]),as.character(hmapr$V2[i]),"MF")) + if (inherits("gonal", "try-error")) { + goanal <- as.list() + #goanal[["sim"]] <- NA + #print(goanal$sim) + distances[i]<-NA#goanal$sim + + } + else{ + goanal<-simLL(as.character(hmapr$V1[i]),as.character(hmapr$V2[i])," MF") + print(goanal$sim) + distances[i]<-goanal$sim + + } + + } Some example where the function fails: (I'm working with human genes) The last pair for which I have a distance is simLL("2935","8894") The pair where simLL fails: > simLL("10554","10486") Error: invalid subscript type another examples of failures: > simLL("81608","55339") Error: invalid subscript type > simLL("50999","10972") Error: invalid subscript type > simLL("50999","10972") Error: invalid subscript type Thanks, Maria Maria Persico MINT database, Cesareni Group Universita' di Tor Vergata, via della Ricerca Scientifica 00133 Roma, Italy Tel: +39 0672594315 FAX: +39 0672594766 e-mail: maria@cbm.bio.uniroma2.it On Sat, 30 Apr 2005, Robert Gentleman wrote: > Hi, > As Seth said you can use try (or some of the other features of the > exception handling system). > Would it be possible to supply me with an example where it fails - it > might be better to fix the code so that it handles these cases more > gracefully, > > thanks, > Robert > > On Apr 30, 2005, at 4:14 AM, Maria Persico wrote: > > > Dear bioconductor mailing list, > > > > I have some pairs of genes and I would like to measure their > > "distances" > > on the GO molecular function graph. > > > > So I wrote this lines of code: > > > > distances<-numeric(100) > > for (i in 1:100){ > > > > goanal<- > > simLL(as.character(hmapr$V1[i]),as.character(hmapr$V2[i]),"MF") > > print(goanal$sim) > > distances[i]<-goanal$sim > > rm(goanal) > > } > > > > where hmapr$V1 and hmapr$V2 are the vectors in which I have the locus > > link > > identifiers of my genes. > > > > When I run the script, in a few time I obtain > > "Error: invalid subscript type". > > > > I noticed that there are problems with the pair of llID where the > > script > > stops. > > > > Is there a way to bypass the problem, to jump to the next pair > > of genes after putting NaN in the distances vector? > > > > Thanks a lot, > > > > Maria > > > > > > > > > > > > Maria Persico > > MINT database, Cesareni Group > > Universita' di Tor Vergata, via della Ricerca Scientifica > > 00133 Roma, Italy > > Tel: +39 0672594315 > > FAX: +39 0672594766 > > e-mail: maria@cbm.bio.uniroma2.it > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@stat.math.ethz.ch > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > > +------------------------------------------------------------------- ---- > ----------------+ > | Robert Gentleman phone: (206) 667-7700 > | > | Head, Program in Computational Biology fax: (206) 667-1319 | > | Division of Public Health Sciences office: M2-B865 > | > | Fred Hutchinson Cancer Research Center > | > | email: rgentlem@fhcrc.org > | > +------------------------------------------------------------------- ---- > ----------------+ > >
0
14.4 years ago by
kpollard110
United States
kpollard110 wrote:
Jo, The statistic is the same between multtest and t.test (for t-tests) or wilcox.test (for wilcoxen tests), but the p-value is computed differently. The functions in t.test and wilcox.test use tabled distributions to get a p-value for each test. These assume a certain distribution for the test statistics and also treat each test independently. The multtest package provides methods that use the bootstrap (MTP function) or permutations to estimate the joint null distribution of the test statistics and then compute p-values from these. These are empirical null distributions which differ from those used in t.test and wilcox.test in two important ways: (i) they do not assume a particular parametric form for the test statistics distribution, and (ii) they account for correlations between test statistics by using the *joint* null distribution. For these reasons, the p-values computed can be very different from those using tabled, marginal distributions. Hope this helps, Katie > ------------------------------ > > Message: 18 > Date: Fri, 29 Apr 2005 11:58:26 +0200 > From: "Dipl.-Ing. Johannes Rainer" <johannes.rainer@tugraz.at> > Subject: [BioC] multtest problem... > To: "bioconductor@stat.math.ethz.ch" <bioconductor@stat.math.ethz.ch> > Message-ID: <20050429115826.v1sqjl58kk80c88g@webmail.tugraz.at> > Content-Type: text/plain; charset=ISO-8859-1; format="flowed" > > hi, > > i used the mt.teststat function from the package multtest and wonder > why the p values that the function calculates differ from those i get > using the wilcox.test or t.test function. > i tried it with the golub data set (data(golub)) and > wilcox.test(x=golub[,golub.cl==0],y=golub[,golub.cl==1]) > > returns a p-value = 0.8987 > > while teststat<-mt.teststat(golub,golub.cl,test="wilcoxon") > > teststat[1] = 1.75419 > > i thought that the mt.teststat funciton uses simple t tests or wilcox > test to calculate p values... > perhaps i am wrong with the class labels? > > from the help i understood, that if i have two groups in my samples i > submit as classlabels for examples c(0,0,0,1,1,1) and this means that > the first 3 columns of my data correspond to group 0 and the other 3 to > group 1. is this correct? > > thanks, jo