Almost no overlap of differentially expressed genes found when comparing mas5 / rma


Naomi Altman ★ 6.0k

@naomi-altman-380


United States

The problem is considerable. We found the same thing when we followed RMA exactly until the median polish step, and substituted Huber's biweight for median polish. This produces a tiny difference in the expression values, and the same 40-50% overlap in the gene lists. Such are the limitations of the methodology at this point.

--Naomi

At 04:09 AM 7/9/2005, Adaikalavan Ramasamy wrote:
>Yes, we often see poor overlaps. A 40-50% overlap is considered
>pretty good but rare, unless you are considering the top 5 genes
>in both lists or something silly like that.
>
>To make a fair comparison, try comparing the lists when they are
>both filtered by the same p-value cutoff or statistic rather than
>by arbitrarily choosing a number.
>
>
>Further, two minor cosmetic points about your code:
>
>1) If you look at your design matrix from
>
>   strain = c("WT","WT","WT","Drug","Drug","Drug")
>   design = model.matrix(~factor(strain))
>   colnames(design) = c("WT","Drug")
>   design
>   WT Drug
>1  1    1
>2  1    1
>3  1    1
>4  1    0
>5  1    0
>6  1    0
>
>the first column represents an intercept, not WT. To get the
>correct interpretation, you need to change the second line to
>
>   design = model.matrix(~ -1 + factor(strain) )
>
>
>2) You do not need to force the rownames to numeric using
>as.numeric(), since intersect happily works with characters.
>
>   x <- c("a", "b", "c")
>   y <- c("b", "c", "d")
>   intersect(x,y)
>[1] "b" "c"
>
>But I do not think either of these points changes your results.
>
>
>On Fri, 2005-07-08 at 18:18 +0100, Emmanuel Levy wrote:
> > Dear Bioconductor community,
> >
> > I've been looking for differentially expressed genes in C. elegans after a
> > drug treatment.
> > There are 3 replicates of each condition and 2 conditions in total (WT and
> > Drug).
> > I used limma combined with either rma or mas5.
> > I find a very poor overlap in the results:
> >
> > - example (i) only 24 of the 100 most differentially expressed genes
> > obtained using rma are found in
> > the 1000 most differentially expressed genes obtained using mas5
> > - example (ii) only 183 genes are common to the lists of the 1000 most
> > differentially expressed genes
> > found using both methods.
> > (see piece of code at the end)
> >
> > Either
> > 1/ I am missing something, which I wouldn't be surprised by, as my
> > expertise is very limited.
> >
> > In that case I am sorry for pointing out something irrelevant and thank
> > you in advance for telling me what I'm missing,
> >
> > 2/ The differences in the normalization methods are really at the
> > origin of the observed differences.
> > In that case, how can I know which method is the best for my case study?
> > Does a helpful paper exist
> > which explains in simple words the strengths/weaknesses of each method?
> >
> > Thank you very much in advance for your help,
> >
> > Emmanuel
> >
> > -------------------------------------- CODE
> > --------------------------------------
> > library(affy)
> > library(limma)
> >
> > # Load data into Affybatch
> > data = ReadAffy(widget=T)
> >
> > # Background correction / normalization
> > eset.rma = rma(data)
> > eset.mas = mas5(data)
> >
> > # Get Expression values
> > exp.rma = exprs(eset.rma)
> > exp.mas = exprs(eset.mas)
> >
> > # --- Look for differentially expressed genes using Limma package
> > strain = c("WT","WT","WT","Drug","Drug","Drug")
> > design = model.matrix(~factor(strain))
> > colnames(design) = c("WT","Drug")
> >
> > fit.rma = lmFit(eset.rma,design)
> > fit.mas = lmFit(eset.mas,design)
> >
> > fit.rma.2 = eBayes(fit.rma)
> > fit.mas.2 = eBayes(fit.mas)
> >
> > top.rma = as.numeric(rownames(topTable(fit.rma.2,n=1000)))
> > top.mas = as.numeric(rownames(topTable(fit.mas.2,n=100)))
> > length(intersect(top.rma,top.mas))
> > > [1] 24
> >
> > top.rma = as.numeric(rownames(topTable(fit.rma.2,n=100)))
> > top.mas = as.numeric(rownames(topTable(fit.mas.2,n=1000)))
> > length(intersect(top.rma,top.mas))
> > > [1] 0
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor

Naomi S. Altman                 814-865-3791 (voice)
Associate Professor
Dept. of Statistics             814-863-7114 (fax)
Penn State University           814-865-1348 (Statistics)
University Park, PA 16802-2111
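
For readers who want to try the two suggestions from the quoted reply, here is a minimal base R sketch (no affy or limma required). It first shows what the columns of the default design matrix actually are, then compares two gene lists at the same p-value cutoff rather than at arbitrary list sizes. Several things here are assumptions and not part of the original thread: the helper `overlap_at_cutoff`, the vectors `p_rma` and `p_mas`, the gene names `g1` to `g4`, the p-values themselves (made up purely to exercise the helper), and the 0.05 cutoff. Because R sorts factor levels alphabetically, the sketch sets the level order explicitly so that the column labels match the columns; that is one way to get the intended labelling, not necessarily what either poster had in mind.

```r
# Toy illustration in base R only; the group labels follow the thread.
strain <- c("WT", "WT", "WT", "Drug", "Drug", "Drug")

# Default treatment coding: levels are sorted alphabetically, so "Drug"
# is the reference level and the first column is an intercept.
design_default <- model.matrix(~ factor(strain))
print(colnames(design_default))

# Cell-means coding: no intercept, one indicator column per group.
# Fixing the level order makes the columns come out as WT, Drug.
group <- factor(strain, levels = c("WT", "Drug"))
design_means <- model.matrix(~ 0 + group)
colnames(design_means) <- levels(group)
print(design_means)

# Hypothetical helper: genes passing the same cutoff in both analyses.
# p_a and p_b are named numeric vectors of p-values (names = gene IDs).
overlap_at_cutoff <- function(p_a, p_b, cutoff = 0.05) {
  sig_a <- names(p_a)[p_a <= cutoff]
  sig_b <- names(p_b)[p_b <= cutoff]
  list(n_a = length(sig_a),
       n_b = length(sig_b),
       common = intersect(sig_a, sig_b))  # works directly on characters
}

# Made-up p-values, for illustration only.
p_rma <- c(g1 = 0.001, g2 = 0.20, g3 = 0.03, g4 = 0.60)
p_mas <- c(g1 = 0.010, g2 = 0.04, g3 = 0.02, g4 = 0.90)

res <- overlap_at_cutoff(p_rma, p_mas)
print(res)
```

In a real analysis the p-value vectors would presumably be taken from the limma results for each preprocessing method, with probe set IDs as names, so that the comparison is made at a shared significance threshold instead of between a top-100 and a top-1000 list.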

Normalization limma
