Hi
I want to calculate p-values for my matrix of gene expression data, based
on t-tests and adjusted for the FDR (according to Benjamini and Hochberg
1995).

In the multtest package we have mt.teststat(), which calculates the t
statistic for each of the rows in my data frame, and we have
mt.rawp2adjp(), which converts raw p-values into adjusted p-values.

So there is a missing step: the first function gives me t, but I then
need the p-values for this t statistic before I can convert them into
adjusted p-values.

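A minimal base-R sketch of that missing step, assuming 0/1 class labels as multtest uses. `rowWelchP` is a hypothetical helper, not a multtest function, and the data are simulated. It shows that t alone is not enough for the Welch test: the Welch-Satterthwaite degrees of freedom also differ from row to row. According to the multtest documentation, `mt.teststat(X, cl, test = "t")` returns the Welch statistic (possibly with the opposite sign), but it does not return these degrees of freedom.

```r
# Hypothetical helper (not part of multtest): row-wise Welch t-test.
# X is a genes x arrays matrix; cl holds 0/1 class labels.
rowWelchP <- function(X, cl) {
  x0 <- X[, cl == 0, drop = FALSE]
  x1 <- X[, cl == 1, drop = FALSE]
  n0 <- ncol(x0)
  n1 <- ncol(x1)
  # Squared standard error of each group mean, one value per gene
  v0 <- apply(x0, 1, var) / n0
  v1 <- apply(x1, 1, var) / n1
  tstat <- (rowMeans(x0) - rowMeans(x1)) / sqrt(v0 + v1)
  # Welch-Satterthwaite degrees of freedom, one value per gene
  df <- (v0 + v1)^2 / (v0^2 / (n0 - 1) + v1^2 / (n1 - 1))
  # Two-sided p-value from the t distribution (does not depend on the sign of t)
  p <- 2 * pt(-abs(tstat), df)
  data.frame(t = tstat, df = df, p = p)
}

# Usage on simulated data, cross-checked against t.test(),
# which also defaults to the Welch test (var.equal = FALSE)
set.seed(1)
X  <- matrix(rnorm(5 * 8), nrow = 5)
cl <- rep(c(0, 1), each = 4)
res <- rowWelchP(X, cl)
p.ttest <- apply(X, 1, function(x) t.test(x[cl == 0], x[cl == 1])$p.value)
all.equal(res$p, p.ttest)
```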
Now, the documentation for mt.maxT() and mt.minP() *suggests* that raw
p-values *can* be obtained from these functions. However, when I run
them and compare their $rawp columns to the p-values from t.test(), I
find that these rawp values *do not* correspond to the equivalent
p-values output by t.test().

So, what I now plan to do is:

1) iterate through my matrix myself, running t.test() on each row and
storing the p-values
2) use these p-values as input to mt.rawp2adjp() to create a list of
adjusted p-values
3) map these adjusted p-values back onto my original data matrix

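The three steps above can be sketched as follows, assuming the multtest package is installed and using simulated data with hypothetical gene names. The key detail for step 3 is that the rows of `mt.rawp2adjp()`'s `$adjp` matrix are sorted by increasing raw p-value, and `$index` records the original row of each, so `order(res$index)` undoes the sorting.

```r
library(multtest)

# Simulated example: 100 genes x 10 arrays, two groups of 5
set.seed(2)
X <- matrix(rnorm(100 * 10), nrow = 100,
            dimnames = list(paste0("gene", 1:100), NULL))
cl <- rep(c(0, 1), each = 5)

# Step 1: one Welch t-test per row, keeping the raw p-values
rawp <- apply(X, 1, function(x) t.test(x[cl == 0], x[cl == 1])$p.value)

# Step 2: Benjamini-Hochberg (1995) FDR adjustment
res <- mt.rawp2adjp(rawp, proc = "BH")

# Step 3: restore the original row order so adjusted p-values line up with X
adjp <- res$adjp[order(res$index), ]
rownames(adjp) <- rownames(X)
head(adjp)
```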
So, I come to my questions:

1) Can anyone tell me how to get raw p-values for the t statistic using
multtest?
2) As the documentation for mt.rawp2adjp() says "This function computes
adjusted p-values for simple multiple testing procedures from a vector
of raw (unadjusted) p-values", I presume plugging in the p-values
directly from t.test() is perfectly valid?

Thanks
Mick

> In the multtest package we have mt.teststat(), which tells me how to
> calculate the t statistic for each of the rows in my data frame, and we
> have mt.rawp2adjp(), which converts raw p-values into adjusted p-values.
>
> So there is a missing step - the first function tells me how to create
> t. I then need to access the p-values for this t statistic, and then go
> on to convert them into adjusted p-values.
>
> Now, the documentation for mt.maxT() and mt.minP() *suggests* that raw
> p-values *can* be obtained from these functions. However, when running
> them and then comparing the $rawp slots to the p-values achieved by
> running t.test(), I find that these rawp values *do not* correspond to
> the equivalent p-values outputted by t.test.
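t.test assumes that the gene expression values come from a ***normal***
distribution. mt.maxT does not rely on such a normality assumption: its
raw p-values are obtained by resampling (permuting the class labels)
rather than from the t distribution. If the data are reasonably normally
distributed, you should expect the raw p-values from mt.maxT and the
p-values from t.test to be close, but not identical.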
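To illustrate the difference, here is a hand-rolled permutation p-value for a single simulated gene, compared with t.test(). The helpers `welch_t` and `perm_p` are hypothetical and this is not multtest's internal code; it only shows the general idea of a permutation p-value, under the assumption of random relabelings and a two-sided test. multtest's own counting conventions may differ in detail.

```r
# Hypothetical helpers, for illustration only
welch_t <- function(x, cl) {
  unname(t.test(x[cl == 0], x[cl == 1])$statistic)
}

perm_p <- function(x, cl, B = 2000) {
  t.obs <- abs(welch_t(x, cl))
  # Recompute the statistic under B random relabelings of the arrays
  t.perm <- replicate(B, abs(welch_t(x, sample(cl))))
  # Count the observed labeling as one of the permutations so that p > 0
  (1 + sum(t.perm >= t.obs)) / (B + 1)
}

set.seed(3)
cl <- rep(c(0, 1), each = 6)
x  <- c(rnorm(6, mean = 0), rnorm(6, mean = 1))  # normal data, shifted group
p.perm  <- perm_p(x, cl)
p.ttest <- t.test(x[cl == 0], x[cl == 1])$p.value
c(permutation = p.perm, t.test = p.ttest)
```

With normally distributed data like this the two values are usually similar; with skewed or heavy-tailed data they can diverge.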
> SO, what I now plan on doing is:
>
> 1) iterating through my matrix myself, running t.test() on each row, and
> storing the p-values
> 2) using these p-values as an input to mt.rawp2adjp() to create a list
> of adjusted p-values
> 3) mapping these adjusted p-values back onto my original data matrix
>
> So, I come to my questions:
>
> 1) can anyone tell me how to get raw p-values for the t-statistic using
> multtest?
The resampling-based raw p-values can be obtained from mt.maxT (the rawp
column of its output). The t.test-based p-values can be obtained, as you
mentioned, by running t.test on each row.
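A minimal sketch of extracting mt.maxT's raw p-values and comparing them with per-row t.test() p-values, assuming multtest is installed and using simulated, normally distributed data. Because mt.maxT returns its rows sorted by adjusted p-value, the `index` column is used to put the raw p-values back into the original gene order.

```r
library(multtest)

# Simulated data: 50 genes, two groups of 6 arrays
set.seed(4)
X  <- matrix(rnorm(50 * 12), nrow = 50)
cl <- rep(c(0, 1), each = 6)

# Permutation-based raw p-values from mt.maxT (Welch t, two-sided, B permutations)
mx <- mt.maxT(X, cl, test = "t", side = "abs", B = 10000)

# Restore the original row order using the index column
rawp.perm <- mx$rawp[order(mx$index)]

# Normal-theory p-values from a Welch t-test on each row
rawp.t <- apply(X, 1, function(x) t.test(x[cl == 0], x[cl == 1])$p.value)

# For roughly normal data the two sets should be similar, not identical
summary(rawp.perm - rawp.t)
```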
> 2) as the documentation for mt.rawp2adjp() says "This function computes
> adjusted p-values for simple multiple testing procedures from a vector
> of raw (unadjusted) p-values", I presume plugging in p-value directly
> from t.test() is perfectly valid?
If you are comfortable with the assumption that the gene expression
values are normally distributed, then you will be OK. However, I would
be hesitant to rely on the normality assumption unless some evidence
supports it. You will be somewhat safer using permutation-based raw
p-values. In a future release of multtest, various bootstrap-based raw
p-values will be provided.
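A short sketch of that safer route, assuming multtest is installed and using simulated data: the permutation-based raw p-values from mt.maxT are passed to mt.rawp2adjp() for the Benjamini-Hochberg adjustment, and both outputs are put back into the original row order.

```r
library(multtest)

# Simulated data: 50 genes, two groups of 6 arrays
set.seed(5)
X  <- matrix(rnorm(50 * 12), nrow = 50)
cl <- rep(c(0, 1), each = 6)

# Permutation raw p-values, restored to the original row order
mx   <- mt.maxT(X, cl, test = "t", B = 10000)
rawp <- mx$rawp[order(mx$index)]

# Benjamini-Hochberg adjustment of the permutation-based raw p-values
bh   <- mt.rawp2adjp(rawp, proc = "BH")
adjp <- bh$adjp[order(bh$index), "BH"]
```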
Yongchao