multtest problem...
3
0
Entering edit mode
@dipl-ing-johannes-rainer-846
Last seen 10.2 years ago
hi, i used the mt.teststat function from the package multtest and wonder why the p values that the function calculates differ from those i get using the wilcox.test or t.test function. i tried it with the golub data set (data(golub)) and wilcox.test(x=golub[,golub.cl==0],y=golub[,golub.cl==1]) returns a p-value = 0.8987 while teststat<-mt.teststat(golub,golub.cl,test="wilcoxon") teststat[1] = 1.75419 i thought that the mt.teststat funciton uses simple t tests or wilcox test to calculate p values... perhaps i am wrong with the class labels? from the help i understood, that if i have two groups in my samples i submit as classlabels for examples c(0,0,0,1,1,1) and this means that the first 3 columns of my data correspond to group 0 and the other 3 to group 1. is this correct? thanks, jo
multtest multtest • 2.3k views
ADD COMMENT
0
Entering edit mode
@sean-davis-490
Last seen 3 months ago
United States
On Apr 29, 2005, at 5:58 AM, Dipl.-Ing. Johannes Rainer wrote: > hi, > > i used the mt.teststat function from the package multtest and wonder > why the p values that the function calculates differ from those i get > using the wilcox.test or t.test function. > i tried it with the golub data set (data(golub)) and > wilcox.test(x=golub[,golub.cl==0],y=golub[,golub.cl==1]) > > returns a p-value = 0.8987 > > while teststat<-mt.teststat(golub,golub.cl,test="wilcoxon") > > teststat[1] = 1.75419 > > i thought that the mt.teststat funciton uses simple t tests or wilcox > test to calculate p values... > Looking at the help for mt.teststat (by typing help(mt.teststat)), you can see part way down: Value: For 'mt.teststat', a vector of test statistics for each row (gene). The test statistic is NOT a p-value. You have to convert it to a p-value using some other function (usually something like pnorm, pt, etc). Finally, you will need to apply mt.rawp2adjp to get corrected p-values. I would suggest reading the mt.teststat vignette for more information and worked examples, including how to calculate p-values from mt.teststat output. Hope this helps, Sean
ADD COMMENT
0
Entering edit mode
:) thanks for the help Quoting Sean Davis <sdavis2@mail.nih.gov>: > > On Apr 29, 2005, at 5:58 AM, Dipl.-Ing. Johannes Rainer wrote: > >> hi, >> >> i used the mt.teststat function from the package multtest and wonder >> why the p values that the function calculates differ from those i >> get using the wilcox.test or t.test function. >> i tried it with the golub data set (data(golub)) and >> wilcox.test(x=golub[,golub.cl==0],y=golub[,golub.cl==1]) >> >> returns a p-value = 0.8987 >> >> while teststat<-mt.teststat(golub,golub.cl,test="wilcoxon") >> >> teststat[1] = 1.75419 >> >> i thought that the mt.teststat funciton uses simple t tests or >> wilcox test to calculate p values... >> > > Looking at the help for mt.teststat (by typing help(mt.teststat)), > you can see part way down: > > Value: > > For 'mt.teststat', a vector of test statistics for each row > (gene). > > The test statistic is NOT a p-value. You have to convert it to a > p-value using some other function (usually something like pnorm, pt, > etc). Finally, you will need to apply mt.rawp2adjp to get corrected > p-values. I would suggest reading the mt.teststat vignette for more > information and worked examples, including how to calculate p-values > from mt.teststat output. > > Hope this helps, > Sean > >
ADD REPLY
0
Entering edit mode
@wolfgang-huber-3550
Last seen 12 weeks ago
EMBL European Molecular Biology Laborat…
Hi Jo, your mail is not very clear about the distinction between test statistics and p-value, and perhaps this is where your confusion comes from? There is no reason why the two numbers you state in your mail should be equal, they are different things. Have a look as well at the function rowttests in the develop version of "genefilter", for a fast and userfriendly approach to doing t-tests for each row of an exprSet. Best Wolfgang Dipl.-Ing. Johannes Rainer wrote: > hi, > > i used the mt.teststat function from the package multtest and wonder why > the p values that the function calculates differ from those i get using > the wilcox.test or t.test function. > i tried it with the golub data set (data(golub)) and > wilcox.test(x=golub[,golub.cl==0],y=golub[,golub.cl==1]) > > returns a p-value = 0.8987 > > while teststat<-mt.teststat(golub,golub.cl,test="wilcoxon") > > teststat[1] = 1.75419 > > i thought that the mt.teststat funciton uses simple t tests or wilcox > test to calculate p values... > perhaps i am wrong with the class labels? > > from the help i understood, that if i have two groups in my samples i > submit as classlabels for examples c(0,0,0,1,1,1) and this means that > the first 3 columns of my data correspond to group 0 and the other 3 to > group 1. is this correct? > > thanks, jo > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor -- Best regards Wolfgang ------------------------------------- Wolfgang Huber European Bioinformatics Institute European Molecular Biology Laboratory Cambridge CB10 1SD England Phone: +44 1223 494642 Fax: +44 1223 494486 Http: www.ebi.ac.uk/huber
ADD COMMENT
0
Entering edit mode
Hi. I'm having problem with R and gcrma, i.e. using gcrma (12 arrays, hgu133a) R keep exiting with segmentation fault. Any hint? Info: Debian platform i386-pc-linux-gnu arch i386 os linux-gnu system i386, linux-gnu status major 2 minor 1.0 year 2005 month 04 day 18 language R Bioconductor devel
ADD REPLY
0
Entering edit mode
Hi, Could you give us more information about the version of gcrma you used? You can use sessionInfo() to get the information. Moreover, if the 12 arrays are available online, could you also provide the link for us to download the arrays and test it? I try to run 4 hgu133a arrays in gcrma, and there is no error occured. Here is what I did: #################################################################### > library(gcrma) Loading required package: affy Loading required package: Biobase Loading required package: tools Welcome to Bioconductor Vignettes contain introductory material. To view, simply type: openVignette() For details on reading vignettes, see the openVignette help page. Loading required package: reposTools Loading required package: matchprobes > data <- ReadAffy() > eset <- gcrma(data) > data AffyBatch object size of arrays=712x712 features (15851 kb) cdf=HG-U133A (22283 affyids) number of samples=4 number of genes=22283 annotation=hgu133a > eset Expression Set (exprSet) with 22283 genes 4 samples phenoData object with 1 variables and 4 cases varLabels sample: arbitrary numbering > sessionInfo() R version 2.1.0, 2005-04-20, x86_64-unknown-linux-gnu attached base packages: [1] "splines" "tools" "methods" "stats" "graphics" "grDevices" [7] "utils" "datasets" "base" other attached packages: hgu133aprobe hgu133acdf gcrma matchprobes affy reposTools "1.0" "1.4.3" "1.1.3" "1.0.12" "1.5.8-1" "1.5.2" Biobase "1.5.0" #################################################################### HTH, Ting-Yuan On Fri, 29 Apr 2005, Stefano Calza wrote: > Hi. > > I'm having problem with R and gcrma, i.e. using gcrma (12 arrays, hgu133a) R keep exiting with segmentation fault. > > Any hint? > > Info: > > Debian > platform i386-pc-linux-gnu > arch i386 > os linux-gnu > system i386, linux-gnu > status > major 2 > minor 1.0 > year 2005 > month 04 > day 18 > language R > > Bioconductor devel > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor >
ADD REPLY
0
Entering edit mode
Dear bioconductor mailing list, I have some pairs of genes and I would like to measure their "distances" on the GO molecular function graph. So I wrote this lines of code: distances<-numeric(100) for (i in 1:100){ goanal<-simLL(as.character(hmapr$V1[i]),as.character(hmapr$V2[i]),"MF ") print(goanal$sim) distances[i]<-goanal$sim rm(goanal) } where hmapr$V1 and hmapr$V2 are the vectors in which I have the locus link identifiers of my genes. When I run the script, in a few time I obtain "Error: invalid subscript type". I noticed that there are problems with the pair of llID where the script stops. Is there a way to bypass the problem, to jump to the next pair of genes after putting NaN in the distances vector? Thanks a lot, Maria Maria Persico MINT database, Cesareni Group Universita' di Tor Vergata, via della Ricerca Scientifica 00133 Roma, Italy Tel: +39 0672594315 FAX: +39 0672594766 e-mail: maria@cbm.bio.uniroma2.it
ADD REPLY
0
Entering edit mode
Hi, As Seth said you can use try (or some of the other features of the exception handling system). Would it be possible to supply me with an example where it fails - it might be better to fix the code so that it handles these cases more gracefully, thanks, Robert On Apr 30, 2005, at 4:14 AM, Maria Persico wrote: > Dear bioconductor mailing list, > > I have some pairs of genes and I would like to measure their > "distances" > on the GO molecular function graph. > > So I wrote this lines of code: > > distances<-numeric(100) > for (i in 1:100){ > > goanal<- > simLL(as.character(hmapr$V1[i]),as.character(hmapr$V2[i]),"MF") > print(goanal$sim) > distances[i]<-goanal$sim > rm(goanal) > } > > where hmapr$V1 and hmapr$V2 are the vectors in which I have the locus > link > identifiers of my genes. > > When I run the script, in a few time I obtain > "Error: invalid subscript type". > > I noticed that there are problems with the pair of llID where the > script > stops. > > Is there a way to bypass the problem, to jump to the next pair > of genes after putting NaN in the distances vector? > > Thanks a lot, > > Maria > > > > > > Maria Persico > MINT database, Cesareni Group > Universita' di Tor Vergata, via della Ricerca Scientifica > 00133 Roma, Italy > Tel: +39 0672594315 > FAX: +39 0672594766 > e-mail: maria@cbm.bio.uniroma2.it > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > > +--------------------------------------------------------------------- -- ----------------+ | Robert Gentleman phone: (206) 667-7700 | | Head, Program in Computational Biology fax: (206) 667-1319 | | Division of Public Health Sciences office: M2-B865 | | Fred Hutchinson Cancer Research Center | | email: rgentlem@fhcrc.org | +--------------------------------------------------------------------- -- ----------------+
ADD REPLY
0
Entering edit mode
Dear Seth, dear Robert, thanks for your answers. I have still problems because I couldn't reinitialize goanal(sorry, I'm a beginner): indeed.... > goanal [1] "Error: invalid subscript type\n" attr(,"class") [1] "try-error" > str(goanal) Class 'try-error' chr "Error: invalid subscript type " this is the code that doesn't work: > distances<-numeric(100) > for (i in 1:100){ + goanal <- try(simLL(as.character(hmapr$V1[i]),as.character(hmapr$V2[i]),"MF")) + if (inherits("gonal", "try-error")) { + goanal <- as.list() + #goanal[["sim"]] <- NA + #print(goanal$sim) + distances[i]<-NA#goanal$sim + + } + else{ + goanal<-simLL(as.character(hmapr$V1[i]),as.character(hmapr$V2[i])," MF") + print(goanal$sim) + distances[i]<-goanal$sim + + } + + } Some example where the function fails: (I'm working with human genes) The last pair for which I have a distance is simLL("2935","8894") The pair where simLL fails: > simLL("10554","10486") Error: invalid subscript type another examples of failures: > simLL("81608","55339") Error: invalid subscript type > simLL("50999","10972") Error: invalid subscript type > simLL("50999","10972") Error: invalid subscript type Thanks, Maria Maria Persico MINT database, Cesareni Group Universita' di Tor Vergata, via della Ricerca Scientifica 00133 Roma, Italy Tel: +39 0672594315 FAX: +39 0672594766 e-mail: maria@cbm.bio.uniroma2.it On Sat, 30 Apr 2005, Robert Gentleman wrote: > Hi, > As Seth said you can use try (or some of the other features of the > exception handling system). > Would it be possible to supply me with an example where it fails - it > might be better to fix the code so that it handles these cases more > gracefully, > > thanks, > Robert > > On Apr 30, 2005, at 4:14 AM, Maria Persico wrote: > > > Dear bioconductor mailing list, > > > > I have some pairs of genes and I would like to measure their > > "distances" > > on the GO molecular function graph. > > > > So I wrote this lines of code: > > > > distances<-numeric(100) > > for (i in 1:100){ > > > > goanal<- > > simLL(as.character(hmapr$V1[i]),as.character(hmapr$V2[i]),"MF") > > print(goanal$sim) > > distances[i]<-goanal$sim > > rm(goanal) > > } > > > > where hmapr$V1 and hmapr$V2 are the vectors in which I have the locus > > link > > identifiers of my genes. > > > > When I run the script, in a few time I obtain > > "Error: invalid subscript type". > > > > I noticed that there are problems with the pair of llID where the > > script > > stops. > > > > Is there a way to bypass the problem, to jump to the next pair > > of genes after putting NaN in the distances vector? > > > > Thanks a lot, > > > > Maria > > > > > > > > > > > > Maria Persico > > MINT database, Cesareni Group > > Universita' di Tor Vergata, via della Ricerca Scientifica > > 00133 Roma, Italy > > Tel: +39 0672594315 > > FAX: +39 0672594766 > > e-mail: maria@cbm.bio.uniroma2.it > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@stat.math.ethz.ch > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > > +------------------------------------------------------------------- ---- > ----------------+ > | Robert Gentleman phone: (206) 667-7700 > | > | Head, Program in Computational Biology fax: (206) 667-1319 | > | Division of Public Health Sciences office: M2-B865 > | > | Fred Hutchinson Cancer Research Center > | > | email: rgentlem@fhcrc.org > | > +------------------------------------------------------------------- ---- > ----------------+ > >
ADD REPLY
0
Entering edit mode
kpollard ▴ 110
@kpollard-7578
Last seen 9.5 years ago
United States
Jo, The statistic is the same between multtest and t.test (for t-tests) or wilcox.test (for wilcoxen tests), but the p-value is computed differently. The functions in t.test and wilcox.test use tabled distributions to get a p-value for each test. These assume a certain distribution for the test statistics and also treat each test independently. The multtest package provides methods that use the bootstrap (MTP function) or permutations to estimate the joint null distribution of the test statistics and then compute p-values from these. These are empirical null distributions which differ from those used in t.test and wilcox.test in two important ways: (i) they do not assume a particular parametric form for the test statistics distribution, and (ii) they account for correlations between test statistics by using the *joint* null distribution. For these reasons, the p-values computed can be very different from those using tabled, marginal distributions. Hope this helps, Katie > ------------------------------ > > Message: 18 > Date: Fri, 29 Apr 2005 11:58:26 +0200 > From: "Dipl.-Ing. Johannes Rainer" <johannes.rainer@tugraz.at> > Subject: [BioC] multtest problem... > To: "bioconductor@stat.math.ethz.ch" <bioconductor@stat.math.ethz.ch> > Message-ID: <20050429115826.v1sqjl58kk80c88g@webmail.tugraz.at> > Content-Type: text/plain; charset=ISO-8859-1; format="flowed" > > hi, > > i used the mt.teststat function from the package multtest and wonder > why the p values that the function calculates differ from those i get > using the wilcox.test or t.test function. > i tried it with the golub data set (data(golub)) and > wilcox.test(x=golub[,golub.cl==0],y=golub[,golub.cl==1]) > > returns a p-value = 0.8987 > > while teststat<-mt.teststat(golub,golub.cl,test="wilcoxon") > > teststat[1] = 1.75419 > > i thought that the mt.teststat funciton uses simple t tests or wilcox > test to calculate p values... > perhaps i am wrong with the class labels? > > from the help i understood, that if i have two groups in my samples i > submit as classlabels for examples c(0,0,0,1,1,1) and this means that > the first 3 columns of my data correspond to group 0 and the other 3 to > group 1. is this correct? > > thanks, jo
ADD COMMENT

Login before adding your answer.

Traffic: 561 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6