The problem is considerable.
We found the same thing when we followed RMA exactly until the median
polish step, and substituted Huber's biweight for median polish. This
produces a tiny difference in the expression values, and the same
40-50% overlap in the list.
Such are the limitations of the methodology at this point.
--Naomi
At 04:09 AM 7/9/2005, Adaikalavan Ramasamy wrote:
>Yes we often see poor overlaps. A 40 - 50 % overlap is considered
>pretty good but rare unless you are considering the top 5 genes
>in both lists or something silly like that.
>
>To make a fair comparison, try comparing the lists when they are
>both filtered by the same p-value cutoff or statistic rather than
>arbitrarily choosing a number.
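
A minimal sketch of the comparison suggested above, in base R. The helper name `overlap_at_cutoff` and the toy p-values are hypothetical; in practice the two vectors would presumably come from the limma results for each preprocessing method, named by probe ID.

```r
# Hypothetical helper: overlap between two gene lists selected with the
# same p-value cutoff. p_a and p_b are numeric vectors of p-values named
# by probe ID (assumed input format).
overlap_at_cutoff <- function(p_a, p_b, cutoff = 0.05) {
  sel_a <- names(p_a)[p_a <= cutoff]
  sel_b <- names(p_b)[p_b <= cutoff]
  common <- intersect(sel_a, sel_b)
  list(n_a = length(sel_a),
       n_b = length(sel_b),
       n_common = length(common),
       common = common)
}

# Toy p-values for illustration only (not real data)
p_rma <- c(g1 = 0.001, g2 = 0.020, g3 = 0.300, g4 = 0.040)
p_mas <- c(g1 = 0.004, g2 = 0.600, g3 = 0.010, g4 = 0.030)

res <- overlap_at_cutoff(p_rma, p_mas, cutoff = 0.05)
res$n_common   # number of probes passing the cutoff in both lists
```

Reporting `n_a` and `n_b` alongside `n_common` shows how many genes each method selects at that cutoff, which a fixed top-N comparison hides.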
>
>
>Further, two minor cosmetic points about your code
>
>1) If you look at your design matrix from
>
> strain = c("WT","WT","WT","Drug","Drug","Drug")
> design = model.matrix(~factor(strain))
> colnames(design) = c("WT","Drug")
> design
>  WT Drug
>1  1    1
>2  1    1
>3  1    1
>4  1    0
>5  1    0
>6  1    0
>
>the first column represents an intercept not WT. To get the
>correct interpretation, you need to change the second line to
>
> design = model.matrix(~ -1 + factor(strain) )
>
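
One detail worth checking when applying this: R orders factor levels alphabetically by default, so "Drug" comes before "WT", which is also why the second column printed above is 1 for the WT arrays. A small base-R sketch (variable names are illustrative) that takes the column names from the factor levels instead of typing them by hand:

```r
strain <- factor(c("WT", "WT", "WT", "Drug", "Drug", "Drug"))
levels(strain)                      # "Drug" "WT" (alphabetical)

# One column per group, no intercept
design <- model.matrix(~ -1 + strain)
colnames(design) <- levels(strain)  # labels follow the actual column order
design

# Drug-minus-WT comparison expressed as a contrast vector over the columns
contrast <- c(Drug = 1, WT = -1)
```

With this parameterization each column is a group mean, so the treatment effect has to be requested as a contrast between the two columns.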
>
>2) You do not need to force the rownames to numeric using
>as.numeric() since intersect happily works with characters.
>
> x <- c("a", "b", "c")
> y <- c("b", "c", "d")
> intersect(x,y)
>[1] "b" "c"
>
>But I do not think either of these points changes your results.
>
>
>
>
>On Fri, 2005-07-08 at 18:18 +0100, Emmanuel Levy wrote:
> > Dear Bioconductor community,
> >
> > I've been looking for differentially expressed genes in C. elegans after a
> > drug treatment.
> > There are 3 replicates of each condition and 2 conditions in total (WT and
> > Drug)
> > I used limma combined with either rma or mas5. I find a very very poor
> > overlap in the results:
> >
> > - example (i) only 24 of the 100 most differentially expressed genes
> > obtained using rma are found in the 1000 most differentially expressed
> > genes obtained using mas5
> > - example (ii) only 183 genes are common to the lists of the 1000 most
> > differentially expressed genes found using both methods.
> > (see piece of code at the end)
> >
> > Either
> > 1/ I am missing something, which I wouldn't be surprised by, as my
> > expertise is very limited.
> >
> > In that case I am sorry for pointing out something irrelevant and thank
> > you in advance for telling me what I'm missing,
> >
> > 2/ The differences in the normalization methods are really at the
> > origin of the observed differences.
> > In that case, how can I know which method is the best for my case study?
> > Does a helpful paper exist which explains in simple words the
> > strengths/weaknesses of each method?
> >
> > Thank you very much in advance for your help,
> >
> > Emmanuel
> >
> > -------------------------------------- CODE --------------------------------------
> > library(affy)
> > library(limma)
> >
> > # Load data into Affybatch
> > data = ReadAffy(widget=T)
> >
> > # Background correction / normalization
> > eset.rma = rma(data)
> > eset.mas = mas5(data)
> >
> > # Get Expression values
> > exp.rma = exprs(eset.rma)
> > exp.mas = exprs(eset.mas)
> >
> > # --- Look for differentially expressed genes using Limma package
> > strain = c("WT","WT","WT","Drug","Drug","Drug")
> > design = model.matrix(~factor(strain))
> > colnames(design) = c("WT","Drug")
> >
> > fit.rma = lmFit(eset.rma,design)
> > fit.mas = lmFit(eset.mas,design)
> >
> > fit.rma.2 = eBayes(fit.rma)
> > fit.mas.2 = eBayes(fit.mas)
> >
> > top.rma = as.numeric(rownames(topTable(fit.rma.2,n=1000)))
> > top.mas = as.numeric(rownames(topTable(fit.mas.2,n=100)))
> > length(intersect(top.rma,top.mas))
> > > [1] 24
> >
> > top.rma = as.numeric(rownames(topTable(fit.rma.2,n=100)))
> > top.mas = as.numeric(rownames(topTable(fit.mas.2,n=1000)))
> > length(intersect(top.rma,top.mas))
> > > [1] 0
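
A hedged sketch of the same kind of comparison with the coefficient named explicitly and probe IDs kept as character strings. It assumes the limma package is installed and uses simulated matrices (`expr_a` and `expr_b`, hypothetical stand-ins for the rma and mas5 expression values), so the overlap it prints means nothing biologically. Which coefficient `topTable` ranks on when `coef` is omitted may depend on the limma version, so passing it explicitly removes that ambiguity.

```r
library(limma)

set.seed(1)
n_probes <- 500
ids <- paste0("probe", seq_len(n_probes))
arrays <- paste0("array", 1:6)

# Simulated log2-scale matrices standing in for two preprocessing methods
expr_a <- matrix(rnorm(n_probes * 6), nrow = n_probes,
                 dimnames = list(ids, arrays))
expr_b <- expr_a + matrix(rnorm(n_probes * 6, sd = 0.5), nrow = n_probes)

# WT as reference level, so the second coefficient is Drug minus WT
strain <- factor(c("WT", "WT", "WT", "Drug", "Drug", "Drug"),
                 levels = c("WT", "Drug"))
design <- model.matrix(~ strain)    # columns: (Intercept), strainDrug

fit_a <- eBayes(lmFit(expr_a, design))
fit_b <- eBayes(lmFit(expr_b, design))

# Rank on the treatment coefficient; row names of the table are probe IDs
top_a <- rownames(topTable(fit_a, coef = "strainDrug", number = 100))
top_b <- rownames(topTable(fit_b, coef = "strainDrug", number = 100))

length(intersect(top_a, top_b))
```

Setting the factor levels explicitly makes WT the baseline, so the intercept column is the WT mean and `strainDrug` is the treatment effect that the ranking should be based on.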
> >
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> >
>
Naomi S. Altman                 814-865-3791 (voice)
Associate Professor
Dept. of Statistics             814-863-7114 (fax)
Penn State University           814-865-1348 (Statistics)
University Park, PA 16802-2111