hi,
i used the mt.teststat function from the package multtest and wonder
why the p values that the function calculates differ from those i get
using the wilcox.test or t.test function.
i tried it with the golub data set (data(golub)) and
wilcox.test(x=golub[,golub.cl==0],y=golub[,golub.cl==1])
returns a p-value = 0.8987
while teststat<-mt.teststat(golub,golub.cl,test="wilcoxon")
teststat[1] = 1.75419
i thought that the mt.teststat funciton uses simple t tests or wilcox
test to calculate p values...
perhaps i am wrong with the class labels?
from the help i understood, that if i have two groups in my samples i
submit as classlabels for examples c(0,0,0,1,1,1) and this means that
the first 3 columns of my data correspond to group 0 and the other 3
to
group 1. is this correct?
thanks, jo
On Apr 29, 2005, at 5:58 AM, Dipl.-Ing. Johannes Rainer wrote:
> hi,
>
> i used the mt.teststat function from the package multtest and wonder
> why the p values that the function calculates differ from those i
get
> using the wilcox.test or t.test function.
> i tried it with the golub data set (data(golub)) and
> wilcox.test(x=golub[,golub.cl==0],y=golub[,golub.cl==1])
>
> returns a p-value = 0.8987
>
> while teststat<-mt.teststat(golub,golub.cl,test="wilcoxon")
>
> teststat[1] = 1.75419
>
> i thought that the mt.teststat funciton uses simple t tests or
wilcox
> test to calculate p values...
>
Looking at the help for mt.teststat (by typing help(mt.teststat)), you
can see part way down:
Value:
For 'mt.teststat', a vector of test statistics for each row
(gene).
The test statistic is NOT a p-value. You have to convert it to a
p-value using some other function (usually something like pnorm, pt,
etc). Finally, you will need to apply mt.rawp2adjp to get corrected
p-values. I would suggest reading the mt.teststat vignette for more
information and worked examples, including how to calculate p-values
from mt.teststat output.
Hope this helps,
Sean
:) thanks for the help
Quoting Sean Davis <sdavis2@mail.nih.gov>:
>
> On Apr 29, 2005, at 5:58 AM, Dipl.-Ing. Johannes Rainer wrote:
>
>> hi,
>>
>> i used the mt.teststat function from the package multtest and
wonder
>> why the p values that the function calculates differ from those i
>> get using the wilcox.test or t.test function.
>> i tried it with the golub data set (data(golub)) and
>> wilcox.test(x=golub[,golub.cl==0],y=golub[,golub.cl==1])
>>
>> returns a p-value = 0.8987
>>
>> while teststat<-mt.teststat(golub,golub.cl,test="wilcoxon")
>>
>> teststat[1] = 1.75419
>>
>> i thought that the mt.teststat funciton uses simple t tests or
>> wilcox test to calculate p values...
>>
>
> Looking at the help for mt.teststat (by typing help(mt.teststat)),
> you can see part way down:
>
> Value:
>
> For 'mt.teststat', a vector of test statistics for each row
> (gene).
>
> The test statistic is NOT a p-value. You have to convert it to a
> p-value using some other function (usually something like pnorm, pt,
> etc). Finally, you will need to apply mt.rawp2adjp to get corrected
> p-values. I would suggest reading the mt.teststat vignette for more
> information and worked examples, including how to calculate p-values
> from mt.teststat output.
>
> Hope this helps,
> Sean
>
>
Hi Jo,
your mail is not very clear about the distinction between test
statistics and p-value, and perhaps this is where your confusion comes
from?
There is no reason why the two numbers you state in your mail should
be
equal, they are different things.
Have a look as well at the function rowttests in the develop version
of
"genefilter", for a fast and userfriendly approach to doing t-tests
for
each row of an exprSet.
Best
Wolfgang
Dipl.-Ing. Johannes Rainer wrote:
> hi,
>
> i used the mt.teststat function from the package multtest and wonder
why
> the p values that the function calculates differ from those i get
using
> the wilcox.test or t.test function.
> i tried it with the golub data set (data(golub)) and
> wilcox.test(x=golub[,golub.cl==0],y=golub[,golub.cl==1])
>
> returns a p-value = 0.8987
>
> while teststat<-mt.teststat(golub,golub.cl,test="wilcoxon")
>
> teststat[1] = 1.75419
>
> i thought that the mt.teststat funciton uses simple t tests or
wilcox
> test to calculate p values...
> perhaps i am wrong with the class labels?
>
> from the help i understood, that if i have two groups in my samples
i
> submit as classlabels for examples c(0,0,0,1,1,1) and this means
that
> the first 3 columns of my data correspond to group 0 and the other 3
to
> group 1. is this correct?
>
> thanks, jo
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
--
Best regards
Wolfgang
-------------------------------------
Wolfgang Huber
European Bioinformatics Institute
European Molecular Biology Laboratory
Cambridge CB10 1SD
England
Phone: +44 1223 494642
Fax: +44 1223 494486
Http: www.ebi.ac.uk/huber
Hi.
I'm having problem with R and gcrma, i.e. using gcrma (12 arrays,
hgu133a) R keep exiting with segmentation fault.
Any hint?
Info:
Debian
platform i386-pc-linux-gnu
arch i386
os linux-gnu
system i386, linux-gnu
status
major 2
minor 1.0
year 2005
month 04
day 18
language R
Bioconductor devel
Hi,
Could you give us more information about the version of gcrma you
used?
You can use sessionInfo() to get the information. Moreover, if the 12
arrays are available online, could you also provide the link for us to
download the arrays and test it?
I try to run 4 hgu133a arrays in gcrma, and there is no error occured.
Here is what I did:
####################################################################
> library(gcrma)
Loading required package: affy
Loading required package: Biobase
Loading required package: tools
Welcome to Bioconductor
Vignettes contain introductory material. To view,
simply type: openVignette()
For details on reading vignettes, see
the openVignette help page.
Loading required package: reposTools
Loading required package: matchprobes
> data <- ReadAffy()
> eset <- gcrma(data)
> data
AffyBatch object
size of arrays=712x712 features (15851 kb)
cdf=HG-U133A (22283 affyids)
number of samples=4
number of genes=22283
annotation=hgu133a
> eset
Expression Set (exprSet) with
22283 genes
4 samples
phenoData object with 1 variables and 4 cases
varLabels
sample: arbitrary numbering
> sessionInfo()
R version 2.1.0, 2005-04-20, x86_64-unknown-linux-gnu
attached base packages:
[1] "splines" "tools" "methods" "stats" "graphics"
"grDevices"
[7] "utils" "datasets" "base"
other attached packages:
hgu133aprobe hgu133acdf gcrma matchprobes affy
reposTools
"1.0" "1.4.3" "1.1.3" "1.0.12" "1.5.8-1"
"1.5.2"
Biobase
"1.5.0"
####################################################################
HTH,
Ting-Yuan
On Fri, 29 Apr 2005, Stefano Calza wrote:
> Hi.
>
> I'm having problem with R and gcrma, i.e. using gcrma (12 arrays,
hgu133a) R keep exiting with segmentation fault.
>
> Any hint?
>
> Info:
>
> Debian
> platform i386-pc-linux-gnu
> arch i386
> os linux-gnu
> system i386, linux-gnu
> status
> major 2
> minor 1.0
> year 2005
> month 04
> day 18
> language R
>
> Bioconductor devel
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
>
Dear bioconductor mailing list,
I have some pairs of genes and I would like to measure their
"distances"
on the GO molecular function graph.
So I wrote this lines of code:
distances<-numeric(100)
for (i in 1:100){
goanal<-simLL(as.character(hmapr$V1[i]),as.character(hmapr$V2[i]),"MF
")
print(goanal$sim)
distances[i]<-goanal$sim
rm(goanal)
}
where hmapr$V1 and hmapr$V2 are the vectors in which I have the locus
link
identifiers of my genes.
When I run the script, in a few time I obtain
"Error: invalid subscript type".
I noticed that there are problems with the pair of llID where the
script
stops.
Is there a way to bypass the problem, to jump to the next pair
of genes after putting NaN in the distances vector?
Thanks a lot,
Maria
Maria Persico
MINT database, Cesareni Group
Universita' di Tor Vergata, via della Ricerca Scientifica
00133 Roma, Italy
Tel: +39 0672594315
FAX: +39 0672594766
e-mail: maria@cbm.bio.uniroma2.it
Hi,
As Seth said you can use try (or some of the other features of the
exception handling system).
Would it be possible to supply me with an example where it fails - it
might be better to fix the code so that it handles these cases more
gracefully,
thanks,
Robert
On Apr 30, 2005, at 4:14 AM, Maria Persico wrote:
> Dear bioconductor mailing list,
>
> I have some pairs of genes and I would like to measure their
> "distances"
> on the GO molecular function graph.
>
> So I wrote this lines of code:
>
> distances<-numeric(100)
> for (i in 1:100){
>
> goanal<-
> simLL(as.character(hmapr$V1[i]),as.character(hmapr$V2[i]),"MF")
> print(goanal$sim)
> distances[i]<-goanal$sim
> rm(goanal)
> }
>
> where hmapr$V1 and hmapr$V2 are the vectors in which I have the
locus
> link
> identifiers of my genes.
>
> When I run the script, in a few time I obtain
> "Error: invalid subscript type".
>
> I noticed that there are problems with the pair of llID where the
> script
> stops.
>
> Is there a way to bypass the problem, to jump to the next pair
> of genes after putting NaN in the distances vector?
>
> Thanks a lot,
>
> Maria
>
>
>
>
>
> Maria Persico
> MINT database, Cesareni Group
> Universita' di Tor Vergata, via della Ricerca Scientifica
> 00133 Roma, Italy
> Tel: +39 0672594315
> FAX: +39 0672594766
> e-mail: maria@cbm.bio.uniroma2.it
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
>
>
+---------------------------------------------------------------------
--
----------------+
| Robert Gentleman phone: (206) 667-7700
|
| Head, Program in Computational Biology fax: (206) 667-1319 |
| Division of Public Health Sciences office: M2-B865
|
| Fred Hutchinson Cancer Research Center
|
| email: rgentlem@fhcrc.org
|
+---------------------------------------------------------------------
--
----------------+
Dear Seth, dear Robert,
thanks for your answers.
I have still problems because I couldn't reinitialize goanal(sorry,
I'm a
beginner):
indeed....
> goanal
[1] "Error: invalid subscript type\n"
attr(,"class")
[1] "try-error"
> str(goanal)
Class 'try-error' chr "Error: invalid subscript type
"
this is the code that doesn't work:
> distances<-numeric(100)
> for (i in 1:100){
+ goanal <-
try(simLL(as.character(hmapr$V1[i]),as.character(hmapr$V2[i]),"MF"))
+ if (inherits("gonal", "try-error")) {
+ goanal <- as.list()
+ #goanal[["sim"]] <- NA
+ #print(goanal$sim)
+ distances[i]<-NA#goanal$sim
+
+ }
+ else{
+ goanal<-simLL(as.character(hmapr$V1[i]),as.character(hmapr$V2[i]),"
MF")
+ print(goanal$sim)
+ distances[i]<-goanal$sim
+
+ }
+
+ }
Some example where the function fails:
(I'm working with human genes)
The last pair for which I have a distance is simLL("2935","8894")
The pair where simLL fails:
> simLL("10554","10486")
Error: invalid subscript type
another examples of failures:
> simLL("81608","55339")
Error: invalid subscript type
> simLL("50999","10972")
Error: invalid subscript type
> simLL("50999","10972")
Error: invalid subscript type
Thanks,
Maria
Maria Persico
MINT database, Cesareni Group
Universita' di Tor Vergata, via della Ricerca Scientifica
00133 Roma, Italy
Tel: +39 0672594315
FAX: +39 0672594766
e-mail: maria@cbm.bio.uniroma2.it
On Sat, 30 Apr 2005, Robert Gentleman wrote:
> Hi,
> As Seth said you can use try (or some of the other features of
the
> exception handling system).
> Would it be possible to supply me with an example where it fails -
it
> might be better to fix the code so that it handles these cases more
> gracefully,
>
> thanks,
> Robert
>
> On Apr 30, 2005, at 4:14 AM, Maria Persico wrote:
>
> > Dear bioconductor mailing list,
> >
> > I have some pairs of genes and I would like to measure their
> > "distances"
> > on the GO molecular function graph.
> >
> > So I wrote this lines of code:
> >
> > distances<-numeric(100)
> > for (i in 1:100){
> >
> > goanal<-
> > simLL(as.character(hmapr$V1[i]),as.character(hmapr$V2[i]),"MF")
> > print(goanal$sim)
> > distances[i]<-goanal$sim
> > rm(goanal)
> > }
> >
> > where hmapr$V1 and hmapr$V2 are the vectors in which I have the
locus
> > link
> > identifiers of my genes.
> >
> > When I run the script, in a few time I obtain
> > "Error: invalid subscript type".
> >
> > I noticed that there are problems with the pair of llID where the
> > script
> > stops.
> >
> > Is there a way to bypass the problem, to jump to the next pair
> > of genes after putting NaN in the distances vector?
> >
> > Thanks a lot,
> >
> > Maria
> >
> >
> >
> >
> >
> > Maria Persico
> > MINT database, Cesareni Group
> > Universita' di Tor Vergata, via della Ricerca Scientifica
> > 00133 Roma, Italy
> > Tel: +39 0672594315
> > FAX: +39 0672594766
> > e-mail: maria@cbm.bio.uniroma2.it
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor@stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> >
> >
> +-------------------------------------------------------------------
----
> ----------------+
> | Robert Gentleman phone: (206) 667-7700
> |
> | Head, Program in Computational Biology fax: (206) 667-1319 |
> | Division of Public Health Sciences office: M2-B865
> |
> | Fred Hutchinson Cancer Research Center
> |
> | email: rgentlem@fhcrc.org
> |
> +-------------------------------------------------------------------
----
> ----------------+
>
>
Jo,
The statistic is the same between multtest and t.test (for t-tests) or
wilcox.test (for wilcoxen tests), but the p-value is computed
differently.
The functions in t.test and wilcox.test use tabled distributions to
get a
p-value for each test. These assume a certain distribution for the
test
statistics and also treat each test independently. The multtest
package
provides methods that use the bootstrap (MTP function) or permutations
to
estimate the joint null distribution of the test statistics and then
compute p-values from these. These are empirical null distributions
which
differ from those used in t.test and wilcox.test in two important
ways:
(i) they do not assume a particular parametric form for the test
statistics distribution, and (ii) they account for correlations
between
test statistics by using the *joint* null distribution. For these
reasons,
the p-values computed can be very different from those using tabled,
marginal distributions.
Hope this helps,
Katie
> ------------------------------
>
> Message: 18
> Date: Fri, 29 Apr 2005 11:58:26 +0200
> From: "Dipl.-Ing. Johannes Rainer" <johannes.rainer@tugraz.at>
> Subject: [BioC] multtest problem...
> To: "bioconductor@stat.math.ethz.ch"
<bioconductor@stat.math.ethz.ch>
> Message-ID: <20050429115826.v1sqjl58kk80c88g@webmail.tugraz.at>
> Content-Type: text/plain; charset=ISO-8859-1;
format="flowed"
>
> hi,
>
> i used the mt.teststat function from the package multtest and wonder
> why the p values that the function calculates differ from those i
get
> using the wilcox.test or t.test function.
> i tried it with the golub data set (data(golub)) and
> wilcox.test(x=golub[,golub.cl==0],y=golub[,golub.cl==1])
>
> returns a p-value = 0.8987
>
> while teststat<-mt.teststat(golub,golub.cl,test="wilcoxon")
>
> teststat[1] = 1.75419
>
> i thought that the mt.teststat funciton uses simple t tests or
wilcox
> test to calculate p values...
> perhaps i am wrong with the class labels?
>
> from the help i understood, that if i have two groups in my samples
i
> submit as classlabels for examples c(0,0,0,1,1,1) and this means
that
> the first 3 columns of my data correspond to group 0 and the other 3
to
> group 1. is this correct?
>
> thanks, jo