problem with expresso()


Oliver Hartmann ▴ 70

@oliver-hartmann-141


Dear list members,

I am trying to find a way of normalizing affy chips with vsn (I found a data set where rma() doesn't do well together with the t-statistic, and I was hoping that vsn() could fix that). I used the following script:

    data <- ReadAffy()
    normalize.AffyBatch.methods <- c(normalize.AffyBatch.methods, "vsn")
    es = expresso(data,
                  pmcorrect.method = "pmonly",
                  bgcorrect.method = "none",
                  normalize.method = "vsn",
                  summary.method = "medianpolish")

With this, identifying differentially expressed genes works fine (results are very similar to rma(); see my tech report for details if you like).

But there seems to be one problem: the intensities and the values \delta h for differential expression (equivalent to the difference between the log-ratios if using rma()) are both on the wrong scale. rma() and other methods use log-transformed data, whereas vsn() uses a different transformation, so I think that using expresso() to calculate vsn-normalized measures log- AND arsinh-transforms the data. Is there a way around that? From the description I could find neither a way around the log-transformation nor where exactly the log-transformation takes place.

If you are interested in a comparison of the performance of rma(), vsn() and MAS() tested on Affymetrix data with spike-in genes, you can find a tech report at http://staff-www.uni-marburg.de/~hartmann/ (but it is only very preliminary work, sorry).

Thanks a lot
-oliver hartmann-

--
Oliver Hartmann, Institute of Medical Biometry and Epidemiology
Philipps-University Marburg, Bunsenstr. 3, D-35037 Marburg
phone +49(0)6421 28 66514, fax +49(0)6421 28 68921
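For orientation, the two transformations in question can be compared numerically. The following is a minimal sketch in base R, assuming that the vsn transformation has the arsinh (inverse hyperbolic sine) form; the per-chip calibration parameters that vsn also estimates are left out, and the intensities are made up.

```r
# Made-up raw intensities covering a wide range.
x <- c(10, 100, 1000, 10000)

log_x    <- log(x)     # plain (natural) log transformation
arsinh_x <- asinh(x)   # arsinh(x) = log(x + sqrt(x^2 + 1))

# For large x, arsinh(x) is close to log(2 * x) = log(x) + log(2),
# so the two scales differ essentially by a constant offset.
diff_from_log2x <- arsinh_x - log(2 * x)

# Taking the log of values that are already arsinh-transformed
# compresses the scale a second time.
double_transformed <- log(arsinh_x)

print(cbind(x, log_x, arsinh_x, diff_from_log2x, double_transformed))
```

On the log scale each tenfold step in `x` adds the same amount, while in the `double_transformed` column the steps shrink, which is the kind of scale distortion described in the question.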


updated 21.3 years ago by Wolfgang Huber • written 21.3 years ago by Oliver Hartmann


Wolfgang Huber ★ 13k

@wolfgang-huber-3550


EMBL European Molecular Biology Laborat…

Hi,

Oliver and I discussed this offline last Friday. The reason for the confusion seems to be that the summary method "medianpolish" takes the logarithm of the data, while, for example, "avdiff" does not. However, the normalization and data transformation method "vsn" also implies a data transformation that is like the logarithm. Thus, a call like

    normalize.AffyBatch.methods <- c(normalize.AffyBatch.methods, "vsn")
    es = expresso(data,
                  pmcorrect.method = "pmonly",
                  bgcorrect.method = "none",
                  normalize.method = "vsn",
                  summary.method = "medianpolish")

will effectively take the logarithm of the intensities TWICE. The same call with summary.method = "avdiff" would, however, produce the right result.

I am not sure how best to resolve this. I could "re-exponentiate" the data returned by "vsn" in normalize.AffyBatch.vsn, such that the subsequent log-transformation done in the summary.method would produce consistent results.

However, here is a question regarding the general architecture of the affy package: where is the right place to take the log-transformation? In the "normalization"? In the "summary.method"? As an extra module? (Some people, including myself, may argue that log-transformation is not the only thing one can do with microarray data.)

Opinions?

Best regards
Wolfgang

Division of Molecular Genome Analysis (Poustka Lab)
German Cancer Research Center (DKFZ)
Im Neuenheimer Feld 580
69120 Heidelberg, Germany

w.huber@dkfz.de
http://www.dkfz.de/abt0840/whuber
Tel +49-6221-424709
Fax +49-6221-42524709
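The double transformation, and the idea of undoing the first transformation before the summary step, can be illustrated with made-up numbers. This sketch uses base R only and does not call affy or vsn: `log_like_normalize()` and `log_summary()` are hypothetical stand-ins for a normalization that already returns log-like values and for a summary step that takes the logarithm internally. Since the stand-in uses arsinh, its exact inverse `sinh()` takes the place of the "re-exponentiation" mentioned in the reply.

```r
# Hypothetical stand-in: a normalization that returns arsinh-scale values.
log_like_normalize <- function(x) asinh(x)

# Hypothetical stand-in: a summary step that log-transforms its input
# and then takes the median over the probes of one probe set.
log_summary <- function(probes) median(log(probes))

probes <- c(200, 400, 800, 1600)   # made-up probe intensities

# Problem: the summary takes the log of values that are already log-like.
twice <- log_summary(log_like_normalize(probes))

# Workaround: undo the transformation after normalizing, so that the
# log inside the summary step is the only one applied.
once <- log_summary(sinh(log_like_normalize(probes)))

print(c(twice = twice, once = once, plain_log = median(log(probes))))
```

In this toy setting `once` agrees with the plain log summary, whereas `twice` is much smaller because the scale has been compressed a second time.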

Wolfgang Huber • 21.3 years ago


On Tue, Jan 14, 2003 at 01:29:30PM +0100, Wolfgang Huber wrote:
> Oliver and I discussed this offline last Friday. The reason for the
> confusion seems to be that the summary method "medianpolish" takes the
> logarithm of the data, while, for example, "avdiff" does not. However, the
> normalization and data transformation method "vsn" also implies a data
> transformation that is like the logarithm. Thus, a call like
>
> normalize.AffyBatch.methods <- c(normalize.AffyBatch.methods, "vsn")
> es = expresso(data,
>               pmcorrect.method = "pmonly",
>               bgcorrect.method = "none",
>               normalize.method = "vsn",
>               summary.method = "medianpolish")
>
> will effectively take the logarithm of the intensities TWICE. The same call
> with summary.method = "avdiff" would, however, produce the right result.
> Not sure how to best resolve this? I could "re-exponentiate" the data
> returned by "vsn" in normalize.AffyBatch.vsn, such that the subsequent
> log-transformation done in the summary.method would produce consistent
> results.

From my side, that would appear to be the right way to proceed (see below).

> However, here is a question regarding the general architecture of the affy
> package: where is the right place to take the log-transformation? In the
> "normalization"? In the "summary.method"? As an extra module? (Since some
> people, including myself, may argue that log-transformation is not the only
> thing one can do with microarray data?)

This is an interesting question. Some people may even argue for a transformation to be done once the expression values are obtained (i.e. once the exprSet object is obtained).

Here is a suggestion:
- "intermediate" processing steps must return data on the same scale as they received it
- add two parameters to functions like "normalize" and "computeExpr": 'transfo' (and 'untransfo'), to specify a transformation to apply before proceeding (and the inverse of that transformation). This would let one toy with alternatives to log-transforming... (one might also think about a collection of 'transfo' and 'untransfo' functions included in the package)

Would this appear satisfactory/reasonable?

L.

--
currently at the National Yang-Ming University in Taipei, Taiwan
Laurent Gautier, PhD student
CBS, Building 208, DTU
DK-2800 Lyngby, Denmark
tel: +45 45 25 24 89
http://www.cbs.dtu.dk/laurent
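The suggestion above can be sketched in base R. This is only an illustration of the idea under stated assumptions, not code from the affy package: `center_columns()` is a hypothetical processing step, the median centering inside it is a placeholder computation, and only the parameter names 'transfo' and 'untransfo' are taken from the proposal.

```r
# Hypothetical intermediate step: it receives data on the raw scale, works
# on a transformed scale, and returns data on the raw scale again.
center_columns <- function(x, transfo = log2, untransfo = function(y) 2^y) {
  y <- transfo(x)                         # move to the working scale
  y <- sweep(y, 2, apply(y, 2, median))   # subtract each column's median
  y <- y + median(transfo(x))             # restore the overall level
  untransfo(y)                            # return on the input scale
}

# Made-up intensities: 4 probes (rows) on 3 chips (columns).
x <- matrix(c(100, 200, 400, 800,
              150, 300, 600, 1200,
               80, 160, 320, 640), nrow = 4)

by_log    <- center_columns(x)                                    # log2 and 2^y
by_arsinh <- center_columns(x, transfo = asinh, untransfo = sinh) # arsinh and sinh

print(round(by_log, 1))
print(round(by_arsinh, 1))
```

Because both calls hand back values on the input scale, a later step does not need to know which transformation was used internally, which is the point of the first item in the suggestion.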

Laurent Gautier • 21.3 years ago


Hi Laurent:

> Here is a suggestion:
> 1) "intermediate" processing steps must return data on the same
>    scale as they received it
> 2) add two parameters to functions like "normalize", "computeExpr":
>    'transfo' (and 'untransfo') to specify a transformation to apply before
> Would this appear satisfactory/reasonable ?

The combinatorics of all those different methods could become quite overwhelming. That also means: potentially prone to bugs or user mistakes, and inefficient (computation time, memory).

Being able to combine the different methods freely is extremely useful for people working on method comparisons, but is this really the main goal of the affy package? I still do not fully understand why there are both express() and expresso() methods, and in addition there is now also a standalone implementation of RMA in C. Could it be that this reflects the limitations of the combinatorial approach?

Another approach that I'd suggest is to expect people who want to plug together all sorts of different background adjustment, normalization, transformation and probeset-summary methods to do so on their own responsibility. And for everyone else, you (we) can offer a small number of functions like rma() and express(o) with limited options that we have found to make sense.

What do you think?

Best regards
Wolfgang
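One way to picture "limited options" is a small guard that rejects combinations known to be inconsistent. The sketch below is base R and purely hypothetical: `check_combination()` is not part of affy, and the two lookup vectors contain only what this thread states, namely that "vsn" returns log-like values and that "medianpolish" takes the logarithm itself.

```r
# Hypothetical guard against log-transforming the data twice.
check_combination <- function(normalize.method, summary.method) {
  log_like_output <- c("vsn")            # normalizations returning log-like data
  logs_internally <- c("medianpolish")   # summaries that take the log themselves
  if (normalize.method %in% log_like_output &&
      summary.method %in% logs_internally) {
    stop("'", normalize.method, "' followed by '", summary.method,
         "' would log-transform the data twice")
  }
  invisible(TRUE)
}

check_combination("vsn", "avdiff")   # passes silently

# The combination from the original question is rejected with a message.
result <- tryCatch(check_combination("vsn", "medianpolish"),
                   error = function(e) conditionMessage(e))
print(result)
```

A check of this kind keeps the free combination of methods available while catching the specific mistake discussed here before any computation is done.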

Wolfgang Huber • 21.3 years ago
