multtest problem...

0

Entering edit mode

Dipl.-Ing. Johannes Rainer ▴ 430

@dipl-ing-johannes-rainer-846

Last seen 11.3 years ago

hi, i used the mt.teststat function from the package multtest and wonder why the p values that the function calculates differ from those i get using the wilcox.test or t.test function. i tried it with the golub data set (data(golub)) and wilcox.test(x=golub[,golub.cl==0],y=golub[,golub.cl==1]) returns a p-value = 0.8987 while teststat<-mt.teststat(golub,golub.cl,test="wilcoxon") teststat[1] = 1.75419 i thought that the mt.teststat funciton uses simple t tests or wilcox test to calculate p values... perhaps i am wrong with the class labels? from the help i understood, that if i have two groups in my samples i submit as classlabels for examples c(0,0,0,1,1,1) and this means that the first 3 columns of my data correspond to group 0 and the other 3 to group 1. is this correct? thanks, jo

multtest multtest • 2.8k views

ADD COMMENT • link updated 20.6 years ago by kpollard ▴ 110 • written 20.6 years ago by Dipl.-Ing. Johannes Rainer ▴ 430

0

Entering edit mode

Sean Davis 21k

@sean-davis-490

Last seen 9 months ago

United States

On Apr 29, 2005, at 5:58 AM, Dipl.-Ing. Johannes Rainer wrote: > hi, > > i used the mt.teststat function from the package multtest and wonder > why the p values that the function calculates differ from those i get > using the wilcox.test or t.test function. > i tried it with the golub data set (data(golub)) and > wilcox.test(x=golub[,golub.cl==0],y=golub[,golub.cl==1]) > > returns a p-value = 0.8987 > > while teststat<-mt.teststat(golub,golub.cl,test="wilcoxon") > > teststat[1] = 1.75419 > > i thought that the mt.teststat funciton uses simple t tests or wilcox > test to calculate p values... > Looking at the help for mt.teststat (by typing help(mt.teststat)), you can see part way down: Value: For 'mt.teststat', a vector of test statistics for each row (gene). The test statistic is NOT a p-value. You have to convert it to a p-value using some other function (usually something like pnorm, pt, etc). Finally, you will need to apply mt.rawp2adjp to get corrected p-values. I would suggest reading the mt.teststat vignette for more information and worked examples, including how to calculate p-values from mt.teststat output. Hope this helps, Sean

ADD COMMENT • link 20.6 years ago Sean Davis 21k

0

Entering edit mode

:) thanks for the help Quoting Sean Davis <sdavis2@mail.nih.gov>: > > On Apr 29, 2005, at 5:58 AM, Dipl.-Ing. Johannes Rainer wrote: > >> hi, >> >> i used the mt.teststat function from the package multtest and wonder >> why the p values that the function calculates differ from those i >> get using the wilcox.test or t.test function. >> i tried it with the golub data set (data(golub)) and >> wilcox.test(x=golub[,golub.cl==0],y=golub[,golub.cl==1]) >> >> returns a p-value = 0.8987 >> >> while teststat<-mt.teststat(golub,golub.cl,test="wilcoxon") >> >> teststat[1] = 1.75419 >> >> i thought that the mt.teststat funciton uses simple t tests or >> wilcox test to calculate p values... >> > > Looking at the help for mt.teststat (by typing help(mt.teststat)), > you can see part way down: > > Value: > > For 'mt.teststat', a vector of test statistics for each row > (gene). > > The test statistic is NOT a p-value. You have to convert it to a > p-value using some other function (usually something like pnorm, pt, > etc). Finally, you will need to apply mt.rawp2adjp to get corrected > p-values. I would suggest reading the mt.teststat vignette for more > information and worked examples, including how to calculate p-values > from mt.teststat output. > > Hope this helps, > Sean > >

ADD REPLY • link 20.6 years ago Dipl.-Ing. Johannes Rainer ▴ 430

0

Entering edit mode

Wolfgang Huber ★ 13k

@wolfgang-huber-3550

Last seen 11 weeks ago

EMBL European Molecular Biology Laborat…

Hi Jo, your mail is not very clear about the distinction between test statistics and p-value, and perhaps this is where your confusion comes from? There is no reason why the two numbers you state in your mail should be equal, they are different things. Have a look as well at the function rowttests in the develop version of "genefilter", for a fast and userfriendly approach to doing t-tests for each row of an exprSet. Best Wolfgang Dipl.-Ing. Johannes Rainer wrote: > hi, > > i used the mt.teststat function from the package multtest and wonder why > the p values that the function calculates differ from those i get using > the wilcox.test or t.test function. > i tried it with the golub data set (data(golub)) and > wilcox.test(x=golub[,golub.cl==0],y=golub[,golub.cl==1]) > > returns a p-value = 0.8987 > > while teststat<-mt.teststat(golub,golub.cl,test="wilcoxon") > > teststat[1] = 1.75419 > > i thought that the mt.teststat funciton uses simple t tests or wilcox > test to calculate p values... > perhaps i am wrong with the class labels? > > from the help i understood, that if i have two groups in my samples i > submit as classlabels for examples c(0,0,0,1,1,1) and this means that > the first 3 columns of my data correspond to group 0 and the other 3 to > group 1. is this correct? > > thanks, jo > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor -- Best regards Wolfgang ------------------------------------- Wolfgang Huber European Bioinformatics Institute European Molecular Biology Laboratory Cambridge CB10 1SD England Phone: +44 1223 494642 Fax: +44 1223 494486 Http: www.ebi.ac.uk/huber

ADD COMMENT • link 20.6 years ago Wolfgang Huber ★ 13k

0

Entering edit mode

Hi. I'm having problem with R and gcrma, i.e. using gcrma (12 arrays, hgu133a) R keep exiting with segmentation fault. Any hint? Info: Debian platform i386-pc-linux-gnu arch i386 os linux-gnu system i386, linux-gnu status major 2 minor 1.0 year 2005 month 04 day 18 language R Bioconductor devel

ADD REPLY • link 20.6 years ago stecalza@tiscali.it ▴ 290

0

Entering edit mode

Hi, Could you give us more information about the version of gcrma you used? You can use sessionInfo() to get the information. Moreover, if the 12 arrays are available online, could you also provide the link for us to download the arrays and test it? I try to run 4 hgu133a arrays in gcrma, and there is no error occured. Here is what I did: #################################################################### > library(gcrma) Loading required package: affy Loading required package: Biobase Loading required package: tools Welcome to Bioconductor Vignettes contain introductory material. To view, simply type: openVignette() For details on reading vignettes, see the openVignette help page. Loading required package: reposTools Loading required package: matchprobes > data <- ReadAffy() > eset <- gcrma(data) > data AffyBatch object size of arrays=712x712 features (15851 kb) cdf=HG-U133A (22283 affyids) number of samples=4 number of genes=22283 annotation=hgu133a > eset Expression Set (exprSet) with 22283 genes 4 samples phenoData object with 1 variables and 4 cases varLabels sample: arbitrary numbering > sessionInfo() R version 2.1.0, 2005-04-20, x86_64-unknown-linux-gnu attached base packages: [1] "splines" "tools" "methods" "stats" "graphics" "grDevices" [7] "utils" "datasets" "base" other attached packages: hgu133aprobe hgu133acdf gcrma matchprobes affy reposTools "1.0" "1.4.3" "1.1.3" "1.0.12" "1.5.8-1" "1.5.2" Biobase "1.5.0" #################################################################### HTH, Ting-Yuan On Fri, 29 Apr 2005, Stefano Calza wrote: > Hi. > > I'm having problem with R and gcrma, i.e. using gcrma (12 arrays, hgu133a) R keep exiting with segmentation fault. > > Any hint? > > Info: > > Debian > platform i386-pc-linux-gnu > arch i386 > os linux-gnu > system i386, linux-gnu > status > major 2 > minor 1.0 > year 2005 > month 04 > day 18 > language R > > Bioconductor devel > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor >

ADD REPLY • link 20.6 years ago Ting-Yuan Liu FHCRC ▴ 780

0

Entering edit mode

Dear bioconductor mailing list, I have some pairs of genes and I would like to measure their "distances" on the GO molecular function graph. So I wrote this lines of code: distances<-numeric(100) for (i in 1:100){ goanal<-simLL(as.character(hmapr$V1[i]),as.character(hmapr$V2[i]),"MF ") print(goanal$sim) distances[i]<-goanal$sim rm(goanal) } where hmapr$V1 and hmapr$V2 are the vectors in which I have the locus link identifiers of my genes. When I run the script, in a few time I obtain "Error: invalid subscript type". I noticed that there are problems with the pair of llID where the script stops. Is there a way to bypass the problem, to jump to the next pair of genes after putting NaN in the distances vector? Thanks a lot, Maria Maria Persico MINT database, Cesareni Group Universita' di Tor Vergata, via della Ricerca Scientifica 00133 Roma, Italy Tel: +39 0672594315 FAX: +39 0672594766 e-mail: maria@cbm.bio.uniroma2.it

ADD REPLY • link 20.6 years ago Maria Persico ▴ 100

0

Entering edit mode

Hi, As Seth said you can use try (or some of the other features of the exception handling system). Would it be possible to supply me with an example where it fails - it might be better to fix the code so that it handles these cases more gracefully, thanks, Robert On Apr 30, 2005, at 4:14 AM, Maria Persico wrote: > Dear bioconductor mailing list, > > I have some pairs of genes and I would like to measure their > "distances" > on the GO molecular function graph. > > So I wrote this lines of code: > > distances<-numeric(100) > for (i in 1:100){ > > goanal<- > simLL(as.character(hmapr$V1[i]),as.character(hmapr$V2[i]),"MF") > print(goanal$sim) > distances[i]<-goanal$sim > rm(goanal) > } > > where hmapr$V1 and hmapr$V2 are the vectors in which I have the locus > link > identifiers of my genes. > > When I run the script, in a few time I obtain > "Error: invalid subscript type". > > I noticed that there are problems with the pair of llID where the > script > stops. > > Is there a way to bypass the problem, to jump to the next pair > of genes after putting NaN in the distances vector? > > Thanks a lot, > > Maria > > > > > > Maria Persico > MINT database, Cesareni Group > Universita' di Tor Vergata, via della Ricerca Scientifica > 00133 Roma, Italy > Tel: +39 0672594315 > FAX: +39 0672594766 > e-mail: maria@cbm.bio.uniroma2.it > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > > +--------------------------------------------------------------------- -- ----------------+ | Robert Gentleman phone: (206) 667-7700 | | Head, Program in Computational Biology fax: (206) 667-1319 | | Division of Public Health Sciences office: M2-B865 | | Fred Hutchinson Cancer Research Center | | email: rgentlem@fhcrc.org | +--------------------------------------------------------------------- -- ----------------+

ADD REPLY • link 20.6 years ago rgentleman ★ 5.5k

0

Entering edit mode

Dear Seth, dear Robert, thanks for your answers. I have still problems because I couldn't reinitialize goanal(sorry, I'm a beginner): indeed.... > goanal [1] "Error: invalid subscript type\n" attr(,"class") [1] "try-error" > str(goanal) Class 'try-error' chr "Error: invalid subscript type " this is the code that doesn't work: > distances<-numeric(100) > for (i in 1:100){ + goanal <- try(simLL(as.character(hmapr$V1[i]),as.character(hmapr$V2[i]),"MF")) + if (inherits("gonal", "try-error")) { + goanal <- as.list() + #goanal[["sim"]] <- NA + #print(goanal$sim) + distances[i]<-NA#goanal$sim + + } + else{ + goanal<-simLL(as.character(hmapr$V1[i]),as.character(hmapr$V2[i])," MF") + print(goanal$sim) + distances[i]<-goanal$sim + + } + + } Some example where the function fails: (I'm working with human genes) The last pair for which I have a distance is simLL("2935","8894") The pair where simLL fails: > simLL("10554","10486") Error: invalid subscript type another examples of failures: > simLL("81608","55339") Error: invalid subscript type > simLL("50999","10972") Error: invalid subscript type > simLL("50999","10972") Error: invalid subscript type Thanks, Maria Maria Persico MINT database, Cesareni Group Universita' di Tor Vergata, via della Ricerca Scientifica 00133 Roma, Italy Tel: +39 0672594315 FAX: +39 0672594766 e-mail: maria@cbm.bio.uniroma2.it On Sat, 30 Apr 2005, Robert Gentleman wrote: > Hi, > As Seth said you can use try (or some of the other features of the > exception handling system). > Would it be possible to supply me with an example where it fails - it > might be better to fix the code so that it handles these cases more > gracefully, > > thanks, > Robert > > On Apr 30, 2005, at 4:14 AM, Maria Persico wrote: > > > Dear bioconductor mailing list, > > > > I have some pairs of genes and I would like to measure their > > "distances" > > on the GO molecular function graph. > > > > So I wrote this lines of code: > > > > distances<-numeric(100) > > for (i in 1:100){ > > > > goanal<- > > simLL(as.character(hmapr$V1[i]),as.character(hmapr$V2[i]),"MF") > > print(goanal$sim) > > distances[i]<-goanal$sim > > rm(goanal) > > } > > > > where hmapr$V1 and hmapr$V2 are the vectors in which I have the locus > > link > > identifiers of my genes. > > > > When I run the script, in a few time I obtain > > "Error: invalid subscript type". > > > > I noticed that there are problems with the pair of llID where the > > script > > stops. > > > > Is there a way to bypass the problem, to jump to the next pair > > of genes after putting NaN in the distances vector? > > > > Thanks a lot, > > > > Maria > > > > > > > > > > > > Maria Persico > > MINT database, Cesareni Group > > Universita' di Tor Vergata, via della Ricerca Scientifica > > 00133 Roma, Italy > > Tel: +39 0672594315 > > FAX: +39 0672594766 > > e-mail: maria@cbm.bio.uniroma2.it > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@stat.math.ethz.ch > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > > +------------------------------------------------------------------- ---- > ----------------+ > | Robert Gentleman phone: (206) 667-7700 > | > | Head, Program in Computational Biology fax: (206) 667-1319 | > | Division of Public Health Sciences office: M2-B865 > | > | Fred Hutchinson Cancer Research Center > | > | email: rgentlem@fhcrc.org > | > +------------------------------------------------------------------- ---- > ----------------+ > >

ADD REPLY • link 20.6 years ago Maria Persico ▴ 100

0

Entering edit mode

kpollard ▴ 110

@kpollard-7578

Last seen 10.5 years ago

United States

Jo, The statistic is the same between multtest and t.test (for t-tests) or wilcox.test (for wilcoxen tests), but the p-value is computed differently. The functions in t.test and wilcox.test use tabled distributions to get a p-value for each test. These assume a certain distribution for the test statistics and also treat each test independently. The multtest package provides methods that use the bootstrap (MTP function) or permutations to estimate the joint null distribution of the test statistics and then compute p-values from these. These are empirical null distributions which differ from those used in t.test and wilcox.test in two important ways: (i) they do not assume a particular parametric form for the test statistics distribution, and (ii) they account for correlations between test statistics by using the *joint* null distribution. For these reasons, the p-values computed can be very different from those using tabled, marginal distributions. Hope this helps, Katie > ------------------------------ > > Message: 18 > Date: Fri, 29 Apr 2005 11:58:26 +0200 > From: "Dipl.-Ing. Johannes Rainer" <johannes.rainer@tugraz.at> > Subject: [BioC] multtest problem... > To: "bioconductor@stat.math.ethz.ch" <bioconductor@stat.math.ethz.ch> > Message-ID: <20050429115826.v1sqjl58kk80c88g@webmail.tugraz.at> > Content-Type: text/plain; charset=ISO-8859-1; format="flowed" > > hi, > > i used the mt.teststat function from the package multtest and wonder > why the p values that the function calculates differ from those i get > using the wilcox.test or t.test function. > i tried it with the golub data set (data(golub)) and > wilcox.test(x=golub[,golub.cl==0],y=golub[,golub.cl==1]) > > returns a p-value = 0.8987 > > while teststat<-mt.teststat(golub,golub.cl,test="wilcoxon") > > teststat[1] = 1.75419 > > i thought that the mt.teststat funciton uses simple t tests or wilcox > test to calculate p values... > perhaps i am wrong with the class labels? > > from the help i understood, that if i have two groups in my samples i > submit as classlabels for examples c(0,0,0,1,1,1) and this means that > the first 3 columns of my data correspond to group 0 and the other 3 to > group 1. is this correct? > > thanks, jo

ADD COMMENT • link 20.6 years ago kpollard ▴ 110

Login before adding your answer.