Entering edit mode
McGee, Monnie
▴
300
@mcgee-monnie-1108
Last seen 10.6 years ago
Dear BioC,
I am trying to analyze a microarray data set with 4 samples and 54K
genes. None of the samples are replicated. I would like to know
which genes are differentially expressed among the 4 samples.
Here are the details of my data:
> print(affy.object)
AffyBatch object
size of arrays=1164x1164 features (42345 kb)
cdf=HG-U133_Plus_2 (54675 affyids)
number of samples=4
number of genes=54675
annotation=hgu133plus2
I have several questions:
1. I used rma(affy.object) to obtain normalized, background corrected
expression levels. However, rma uses median polish to obtain
expression values. I read in the achieves of this list that median
polish was not a good idea when there are no replicates, and that rlm
would be better. In order to use rlm, one needs an expression set,
which means I have to specify some sort of summary method (using
expresso) or take median polish (rma). If not median polish, which
summary method? I have also read on the list that reviewers of some
journals consider that median polish is the best method, and, if it is
not used, the analysis is suspect (resulting in a reanalysis or a
rejection). So, if I don't use median polish (or rma) how do I
justify the method I use?
2. When I do my.eset <- rma(affy.object), I obtain an object of class
"exprSet", but I think the phenoData attribute is incorrect;
> print(my.eset)
Expression Set (exprSet) with
54675 genes
4 samples
phenoData object with 1 variables and 4 cases
varLabels
sample: arbitrary numbering
> pData(my.eset)
sample
Control 1
Treat1 2
Treat2 3
Treat3 4
It seems to me that the phenoData object should have 4 variables and 1
case. In other words, phenoData sees my treatments as cases (or
patients) and says that I have one variable.
3. Once I have expression levels (and a correct eset), what is the
best way to do non-specific filtering? Clearly, I don't want to test
all 54K genes. I tried the following, suggested in one of the R
working papers:
>library(genefilter)
> f1 <- pOverA(0.25, log2(100))
> f2 <- function(x) (IQR(x) > 0.5)
> ff <- filterfun(f1,f2)
> selected <- genefilter(my.eset,ff)
>sum(selected)
[1] 57
Maybe 57 genes is a great number, but it seems much smaller than the
examples I've seen in papers. Also, one of the AFFX genes is in this
filtered sample. Since they are housekeeping genes, I would think
that they aren't supposed to pass through the filter.
4. Once I have my filtered genes, how can I test for differential
expression among the 4 groups? I've played around with mt.maxT (with
the test="f" option), but I haven't been able to get the proper class
labels.
Thanks for your help and suggestions! Obviously, I am new to this. I
find the work fascinating, but I realize every day that I have so much
more to understand.
Thanks,
Monnie
Monnie McGee, Ph.D.
Assistant Professor
Department of Statistical Science
Southern Methodist University
Ph: 214-768-2462
Fax: 214-768-4035