RMA vs VSN

Roger Vallejo ▴ 120

We have a small experiment with a high FDR (around 0.40): 8 Affymetrix mouse GeneChips with 22k genes, 2 replications, saline- and E. coli-treated mammary tissue, evaluated at 24 hr and 48 hr post injection.

I have run both data preprocessing functions via expresso and then run an lme-ANOVA on each result. As expected, I got a lower FDR and much smaller p-values when using VSN. The FDR was estimated using the QVALUE package. Obviously, I feel tempted to use VSN instead of RMA. However, before proceeding I would like to hear some comments from the Bioconductor group on this approach. The question is:

Is VSN better than RMA?

I have read the literature and both claim to be the function to be used!

Personally, I lean towards using VSN. I might be wrong, so I would appreciate any suggestions or comments on this.

These are the functions that I used:

*************************************************************

For RMA:

> library(affy)
> Data <- ReadAffy(widget=TRUE)
> eset <- expresso(Data, bgcorrect.method="rma",
      normalize.method="quantiles", pmcorrect.method="pmonly",
      summary.method="medianpolish")

*************************************************************

For VSN:

> library(affy)
> Data <- ReadAffy(widget=TRUE)
> library(vsn)
> normalize.AffyBatch.methods <- c(normalize.AffyBatch.methods, "vsn")
> eset = expresso(Data, bg.correct=FALSE, normalize.method="vsn",
      pmcorrect.method="pmonly", summary.method="medianpolish")

*************************************************************

Thank you very much.

Roger

Roger L. Vallejo, Ph.D.
Assist. Professor of Genomics & Bioinformatics
Genomics & Bioinformatics Laboratory
Department of Dairy & Animal Science
The Pennsylvania State University
305 Henning Building
University Park, PA 16802
Phone: (814) 865-1846
Email: rvallejo@psu.edu

ADD COMMENT • link 21.1 years ago Roger Vallejo ▴ 120
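Since the question turns on FDR estimates, a minimal sketch of how an FDR-adjusted value is derived from a vector of p-values may help. It uses base R's `p.adjust` with the Benjamini-Hochberg method, which is related to, but not the same as, the q-value estimate from the QVALUE package mentioned in the post (QVALUE additionally estimates the proportion of true nulls). The p-values are invented for illustration.

```r
# Hypothetical per-gene p-values (invented, not from the experiment above).
p <- c(0.001, 0.01, 0.02, 0.04, 0.5)

# Benjamini-Hochberg adjustment: each sorted p-value is scaled by n / rank,
# then made monotone from the largest p-value downwards.
q <- p.adjust(p, method = "BH")

# Genes that would be called at a 5% FDR threshold under this adjustment.
called <- which(q <= 0.05)

print(data.frame(p = p, bh = q))
print(called)
```

Smaller p-values from one preprocessing route will lower these adjusted values mechanically, which is why a lower FDR on its own does not show that the route is better.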

Rafael A. Irizarry ★ 2.3k

VSN and RMA are not competitors. The first is a normalization technique; the second is a way to obtain expression measures from Affy arrays, which includes background adjustment, normalization, and summarization. RMA uses quantile normalization by default. Changing this to VSN yields, in general, very similar results.

Note that some use RMA to obtain an expression measure and then use VSN to normalize that, although I worry this could result in over-normalization.

On Sat, 19 Jun 2004, Roger Vallejo wrote:
<snip>

_______________________________________________
Bioconductor mailing list
Bioconductor@stat.math.ethz.ch
https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor

ADD COMMENT • link 21.1 years ago Rafael A. Irizarry ★ 2.3k
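To illustrate the distinction drawn above between the normalization step and the summarization step, here is a rough base-R sketch of quantile normalization followed by median-polish summarization for a single probe set. It is not the affy implementation (which runs in compiled code and handles ties differently); the helper `quantile_normalize` and the intensities are made up for illustration, and the background-adjustment step is left out.

```r
# Toy PM intensities for one probe set: rows = probes, columns = arrays.
# All values are invented.
pm <- matrix(c(120, 150,  90,
               300, 340, 260,
                80,  70,  95,
               500, 620, 410),
             nrow = 4, byrow = TRUE,
             dimnames = list(paste0("probe", 1:4), paste0("array", 1:3)))

# Quantile normalization: replace each value by the mean, across arrays,
# of the values that share its within-array rank.
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")
  sorted <- apply(x, 2, sort)
  ref    <- rowMeans(sorted)
  out    <- apply(ranks, 2, function(r) ref[r])
  dimnames(out) <- dimnames(x)
  out
}

qn <- quantile_normalize(pm)

# Summarization: median polish on the log2 scale. The expression value for
# each array is the overall effect plus that array's column effect.
fit  <- medpolish(log2(qn), trace.iter = FALSE)
expr <- fit$overall + fit$col

print(qn)
print(expr)
```

Swapping the normalization step for another method leaves the summarization step unchanged, which is the sense in which VSN replaces one stage of RMA and not RMA as a whole.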

Roger Vallejo ▴ 120

This should be of interest to those preprocessing data from Affymetrix chips. We have compared RMA and VSN by performing an lme-ANOVA on data preprocessed with each. If you are wondering whether to use RMA or VSN, or what the potential pitfalls and benefits of either normalization or background-correction approach are, please read below and draw your own conclusions.

Thank you to the colleagues in the Bioconductor group for the enlightening discussion that followed.

Roger

-----Original Message-----
From: Rafael Irizarry [mailto:ririzarr@jhsph.edu]
Sent: Monday, June 21, 2004 2:12 PM
To: Roger Vallejo
Subject: Re: [BioC] RMA vs VSN

You should consider posting something on the bioc list. I think this may help many others.

On Jun 21, 2004, at 11:35 AM, Roger Vallejo wrote:

> Dear Rafael,
> I am glad that I asked this question on RMA and VSN. Your comments below
> are true. I have quickly checked outputs from LME-ANOVA using data
> preprocessed separately with RMA and VSN. Indeed, several interesting
> genes detected as significant with RMA are missed with VSN. Also, the
> p-values for those significant genes are generally more striking with
> RMA than with VSN. God knows what else I might be missing by using VSN,
> because I am only checking the genes that we know are related to immune
> and inflammatory events. So the rate of false undiscoveries is increased
> with VSN in exchange for a slightly lower FDR. I would rather maximize
> gene discovery at a slightly higher but acceptable FDR.
> Thanks for the excellent point!
> Roger
>
> -----Original Message-----
> From: Rafael A. Irizarry [mailto:ririzarr@jhsph.edu]
> Sent: Saturday, June 19, 2004 3:58 PM
> To: Roger Vallejo
> Cc: rafa@jhu.edu
> Subject: RE: [BioC] RMA vs VSN
>
> I believe the difference does not come from VSN but from
> bg.correct=FALSE. Try
>
> eset <- expresso(Data, bg.correct=FALSE,
>     normalize.method="quantiles", pmcorrect.method="pmonly",
>     summary.method="medianpolish")
>
> I suspect you will get similar results.
>
> When you do not background correct, the variance for low-expressed
> genes is much smaller, but the estimates of fold change also get
> attenuated. False discoveries are lower, but false "undiscoveries"
> increase.
>
> On Sat, 19 Jun 2004, Roger Vallejo wrote:
>
>> Dear Rafael,
>> Thank you very much for your comments.
>> Our results are somewhat different for VSN vs. RMA. If they were similar,
>> I would likely have kept using RMA because it is part of our standard
>> array data preprocessing functions. The p-values are smaller, and thereby
>> the PER and FDR are slightly more acceptable (although not by much), when
>> using p-values from VSN normalization and lme-ANOVA. If I decide to use
>> VSN in the way that I indicated (please see the functions below), I would
>> like to make sure that I am not over-normalizing my data, as you
>> indicated, and, most important, that I am using a data normalization
>> function that is as good as RMA. Thanks for your comments.
>> Roger
>>
>> -----Original Message-----
>> From: Rafael A. Irizarry [mailto:ririzarr@jhsph.edu]
>> Sent: Saturday, June 19, 2004 2:05 PM
>> To: Roger Vallejo
>> Cc: bioconductor@stat.math.ethz.ch
>> Subject: Re: [BioC] RMA vs VSN
>>
>> <snip>

ADD COMMENT • link 21.1 years ago Roger Vallejo ▴ 120
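The point about background correction in the exchange above can be checked with a little arithmetic. The sketch below assumes a purely additive background of the same size on both arrays and a fixed measurement noise; the signal, background, and noise values are invented, and the standard deviations are a first-order (delta-method) approximation, not a simulation of real probe data.

```r
# True signal in condition A and a true two-fold change in condition B.
signal_a   <- c(50, 200, 1000, 5000)
signal_b   <- 2 * signal_a
background <- 100   # assumed additive background, identical on both arrays
noise_sd   <- 10    # assumed measurement noise on the raw intensity scale

# Log2 fold change with the background left in vs. removed exactly.
lfc_no_bg <- log2((signal_b + background) / (signal_a + background))
lfc_bg    <- log2(signal_b / signal_a)

# Approximate SD of a log2 intensity: noise_sd / (mean * ln 2).
sd_no_bg <- noise_sd / ((signal_a + background) * log(2))
sd_bg    <- noise_sd / (signal_a * log(2))

print(round(cbind(signal_a, lfc_no_bg, lfc_bg, sd_no_bg, sd_bg), 3))
```

Under these assumptions the uncorrected fold changes fall short of the true value of 1 on the log2 scale, most strongly at low signal, while the log-scale variability is also smaller there. That is the trade-off described above: fewer false discoveries, more false "undiscoveries".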

On Monday 21 June 2004 21:08, Roger Vallejo wrote:
<snip>

I'd like to add another observation, with a request for comments, to this round. Our group has been using Affymetrix murine chips with pooled samples but no technical replicates as a way of identifying candidate genes for further characterization by RT-PCR, in situ hybridization, and other techniques. We have used a simple fold-change threshold between samples taken from different developmental stages, or between wt and ko models, to identify candidates.

With this approach we are able to confirm about 80% of the genes predicted by mas5 analysis (the original and the Bioconductor implementations give very similar results). However, rma analysis has produced lists of genes that are much shorter and in general do not correspond well to the confirmed lists of genes produced by mas5 (or to our biological prejudices as to what genes should be observed).

Have others noticed similar discrepancies between mas5 and rma? Are there perhaps other issues I have overlooked?

Thanks

Peter

ADD REPLY • link 21.1 years ago peter robinson ▴ 300
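For readers unfamiliar with the approach described above, a fold-change filter on unreplicated data amounts to something like the following sketch. The gene names, expression values, and the two-fold cut-off are all invented for illustration; expression values are assumed to be on the log2 scale, as RMA output is.

```r
# One log2 expression value per gene and condition (no replicates).
log2_expr <- matrix(c( 7.0,  8.2,
                       9.5,  9.4,
                       6.1,  4.9,
                      10.0, 10.8),
                    ncol = 2, byrow = TRUE,
                    dimnames = list(paste0("gene", 1:4), c("wt", "ko")))

# On the log2 scale a fold change is a difference.
log2_fc <- log2_expr[, "ko"] - log2_expr[, "wt"]

# Keep genes that change at least two-fold in either direction.
cutoff     <- log2(2)
candidates <- names(log2_fc)[abs(log2_fc) >= cutoff]

print(log2_fc)
print(candidates)
```

Because the list depends only on the size of the difference, any preprocessing method that shrinks or inflates differences will change the list length at a fixed cut-off.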

I believe this is because you have used fold change to filter your gene list. Try filtering your genes by t-test or SAM and see how the two lists compare.

In the last paragraph of the Results section of Irizarry et al., 2003 (PubMed ID: 12582260), the authors mention that RMA compressed the fold-change estimates by 10-20%. This could be due to the quantile normalisation.

In practice, however, I often find 50-60% compression and wonder why myself. If anyone could shed light on this, it would be much appreciated.

On Mon, 2004-06-21 at 11:39, peter robinson wrote:
<snip>

ADD REPLY • link 21.1 years ago Adaikalavan Ramasamy ★ 1.8k
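The suggestion to filter by t-test can be sketched in base R as below, assuming replicated samples are available. The two genes and their values are invented so that both show the same two-fold mean difference; a Welch t-test per gene is used here as one simple choice and is not SAM.

```r
# log2 expression: rows = genes, columns = 3 control + 3 treated samples.
expr <- rbind(
  gene1 = c(7.0, 7.1, 6.9,  8.0,  8.1, 7.9),   # consistent replicates
  gene2 = c(5.0, 9.0, 7.0,  6.0, 10.0, 8.0)    # noisy replicates
)
group <- rep(c("control", "treated"), each = 3)

# Mean log2 fold change per gene (treated minus control).
log2_fc <- rowMeans(expr[, group == "treated"]) -
           rowMeans(expr[, group == "control"])

# Welch two-sample t-test per gene.
p_value <- apply(expr, 1, function(x) {
  t.test(x[group == "treated"], x[group == "control"])$p.value
})

print(cbind(log2_fc, p_value))
```

Both genes would pass a two-fold cut-off, but only the one with consistent replicates has a small p-value, so the two filters can rank genes quite differently.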

RMA sacrifices accuracy for its gains in precision. This has two consequences:

1- For a fixed false positive rate, RMA tends to have many more true positives.
2- To attain the same true positive rate as with MAS 5.0, you need to lower the fold-change cut-off (which is arbitrary anyway). E.g., if you are using fold change > 2 as your cut-off and RMA gives lists that you think are too small, then use fold change > 1.5.

This has more to do with background correction than with normalization. It is discussed in some detail in the gcrma paper appearing in JASA soon:
http://www.bepress.com/jhubiostat/paper1

On Jun 21, 2004, at 5:35 PM, Adaikalavan Ramasamy wrote:
<snip>

ADD REPLY • link 21.1 years ago Rafael A. Irizarry ★ 2.3k
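The second consequence above can be made concrete with a toy calculation. The sketch assumes that estimated log2 fold changes are a fixed fraction of the true ones; the factor of 0.75 and the list of true fold changes are invented for illustration and are not measured properties of RMA.

```r
# Hypothetical true fold changes for six genes.
true_fc <- c(1.2, 1.8, 2.1, 2.5, 3.0, 4.0)

# Assumed compression: estimated log2 fold change = 0.75 * true log2 value.
compression  <- 0.75
estimated_fc <- 2^(compression * log2(true_fc))

# Number of genes passing each cut-off on the estimated fold changes.
n_pass_2   <- sum(estimated_fc > 2)
n_pass_1_5 <- sum(estimated_fc > 1.5)

print(round(estimated_fc, 3))
print(c(cutoff_2 = n_pass_2, cutoff_1.5 = n_pass_1_5))
```

With these made-up numbers only two genes clear a cut-off of 2, while five clear 1.5, which shows how a compressed scale shortens a list without the ranking of genes having changed.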

If there are no replicates, t-tests and SAM cannot be used.

--Naomi

At 11:35 PM 6/21/2004 +0100, Adaikalavan Ramasamy wrote:
<snip>

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics                            814-863-7114 (fax)
Penn State University                          814-865-1348 (Statistics)
University Park, PA 16802-2111

ADD REPLY • link 21.1 years ago Naomi Altman ★ 6.0k
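A short base-R check shows why this is so: with a single value per condition there is no within-group variance to estimate, so the t statistic has no denominator. The two values below are invented.

```r
# One log2 expression value per condition for a single gene (no replicates).
wt <- 7.0
ko <- 8.2

# The sample variance of a single observation is undefined.
var_wt <- var(wt)

# t.test() refuses to run; capture the error message instead of stopping.
result <- tryCatch(
  t.test(ko, wt),
  error = function(e) conditionMessage(e)
)

print(var_wt)
print(result)
```

The fold change itself (here a difference of 1.2 on the log2 scale) is still computable, which is why fold-change filtering remains the fallback for unreplicated designs.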
