RMA + loess normalisation and filtering

Katleen De Preter ▴ 130

@katleen-de-preter-1070

Belgium

Dear BioC-users,

Question 1: I have performed RMA normalisation of my Affymetrix data. For further analysis I think it is necessary to filter the data, removing genes that are not expressed or are below background. However, I don't know the best way to filter out genes that are not expressed, or are expressed at very low levels (below background), based on the RMA-normalised data.

Question 2: In a paper by Choe et al. (2005, Genome Biology) I read that loess normalisation after the first normalisation step is important in order to detect most of the true positive differentially expressed genes. However, when I run

> normdatabis <- normalize.exprSet.loess(RMAdata, transfn="antilog")

the following warning appears:

k-d tree limited by memory ncmax=5002

I guess this means that the loess normalisation was based only on the first 5002 probe set IDs, or does it mean something else? Is this OK, or do I need to follow another strategy for the second loess normalisation step?

Best regards,
katleen de preter

--
dr. ir. Katleen De Preter
Center for Medical Genetics Ghent (CMGG)
Ghent University Hospital
Medical Research Building (MRB), 2nd floor, room 120.038
De Pintelaan 185, B-9000 Ghent, Belgium
+32 9 240 5533 (phone) | +32 9 240 6549 (fax)
http://medgen.ugent.be
Katleen.DePreter@UGent.be

Genetics Normalization • 1.7k views

Written 19.0 years ago by Katleen De Preter ▴ 130 • updated 19.0 years ago by Wolfgang Huber ★ 13k

Wolfgang Huber ★ 13k

@wolfgang-huber-3550

EMBL European Molecular Biology Laborat…

Hi Katleen,

> question 1: I have performed RMA normalisation of my Affymetrix data.
> However, for further analysis I think it is necessary to filter the
> data (non-expressed genes or below background). However I don't know the
> best way to filter the genes that are not expressed or very low
> expressed (below the background), based on the RMA normalisation data.

My preference is to select genes based on their overall variability, using a criterion such as

z = apply(exprs(x), 1, IQR)

(see also rowQ from Biobase-devel, or rowSds from the vsn package). The rationale is that it is difficult to decide on an absolute number that corresponds to "present" or "absent" (e.g. because of differing AT content), but if the values vary across the experiment there is some hope that the probe set is really detecting a transcript. I have no good suggestion for choosing a threshold, though. I'd usually take the top 50% or so, depending on the chip type and on how the histogram of "z" looks.

> question 2: In a paper of Choe et al (2005, Genome Biology) I have read
> that loess normalisation after the first normalisation step is
> important in order to detect most true positive differentially expressed
> genes. However when I perform
> normdatabis<-normalize.exprSet.loess(RMAdata,transfn="antilog")
> following warnings appear: k-d tree limited by memory ncmax=5002
> I guess that the loess normalization was only based on the 5002 first
> probe set id's or what does this mean?
> Is it ok or do I need to follow another strategy for the second loess
> normalisation step?

I don't think combining multiple normalization steps in this way is appropriate. RMA is a model-based normalization method, and its results should be fine as they are. If they aren't, then the model does not fit, which means that either you have a data quality problem or you shouldn't use RMA in the first place.

Also, with so much normalization you are likely to remove not just technical variation but also biological signal, and hence to find *fewer* differentially expressed genes.

Best regards
Wolfgang

-------------------------------------
Wolfgang Huber
European Bioinformatics Institute
European Molecular Biology Laboratory
Cambridge CB10 1SD
England
Phone: +44 1223 494642
Fax: +44 1223 494486
Http: www.ebi.ac.uk/huber
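As a rough illustration of the variability filter described in this reply, here is a minimal base-R sketch. It assumes a simulated matrix `e` standing in for `exprs(x)`; the object names (`e`, `keep`, `e_filtered`), the array count, and the simulated values are hypothetical and not from the thread, and the 50% cutoff is only the starting point suggested above.

```r
# Simulated stand-in for exprs(x): 1000 probe sets (rows) x 8 arrays (columns).
# All values here are made up for illustration.
set.seed(1)
e <- matrix(rnorm(1000 * 8, mean = 7, sd = 0.2), nrow = 1000,
            dimnames = list(paste0("ps", 1:1000), paste0("array", 1:8)))

# Give the first 100 probe sets extra variability across arrays.
e[1:100, ] <- e[1:100, ] + rnorm(100 * 8, sd = 1.5)

# Per-probe-set interquartile range, as in z = apply(exprs(x), 1, IQR).
z <- apply(e, 1, IQR)

# Keep the more variable half of the probe sets.
keep <- z > quantile(z, 0.5)
e_filtered <- e[keep, , drop = FALSE]

dim(e_filtered)
```

Looking at `hist(z)` before fixing the cutoff, as suggested above, may show whether a different quantile than 0.5 suits a given chip type.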

Posted 19.0 years ago by Wolfgang Huber ★ 13k

Hi again,

>> question 1: I have performed RMA normalisation of my Affymetrix
>> data. However, for further analysis I think it is necessary to
>> filter the data (non-expressed genes or below background). However I
>> don't know the best way to filter the genes that are not expressed or
>> very low expressed (below the background), based on the RMA
>> normalisation data.
>
> My preference is to select genes based on their overall variability,
> using a criterion such as
>
> z = apply(exprs(x), 1, IQR)

Just remembered: a discussion (with data) on whether to select on mean level or on variability can be found here:

http://www.bepress.com/bioconductor/paper7/

It also contains additional ideas on pre-filtering as a way to alleviate the loss of power from multiple testing.

--
Best regards
Wolfgang
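To make the link between pre-filtering and multiple testing concrete, here is a small base-R sketch on simulated data. It does not reproduce anything from the linked paper; the group design, effect size, Welch t-test, and Benjamini-Hochberg adjustment are assumptions chosen for illustration, and all object names are hypothetical. The filter ignores the group labels, so it is applied before, and separately from, the test.

```r
# Simulated two-group experiment: 2000 genes, 4 arrays per group (made-up data).
set.seed(2)
n_genes <- 2000
group <- factor(rep(c("A", "B"), each = 4))
e <- matrix(rnorm(n_genes * 8, mean = 7, sd = 0.3), nrow = n_genes)

# Hypothetical truth: the first 100 genes are shifted upwards in group B.
e[1:100, group == "B"] <- e[1:100, group == "B"] + 1

# Non-specific filter on overall variability; group labels are not used.
z <- apply(e, 1, IQR)
keep <- z > quantile(z, 0.5)

# Per-gene Welch t-test p-values.
p_all <- apply(e, 1, function(v) t.test(v ~ group)$p.value)

# Adjust for multiple testing over all genes, or only over the retained genes.
adj_all      <- p.adjust(p_all, method = "BH")
adj_filtered <- p.adjust(p_all[keep], method = "BH")

# Number of hypotheses in each case, and number of calls at a 5% cutoff.
c(tested_all = length(adj_all), tested_filtered = length(adj_filtered))
c(hits_all = sum(adj_all < 0.05), hits_filtered = sum(adj_filtered < 0.05))
```

Halving the number of tests reduces the size of the multiplicity correction, but whether that translates into more detections depends on how well the filter retains the truly changing genes.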

Posted 19.0 years ago by Wolfgang Huber ★ 13k

This topic keeps coming up. Are there any references, supported by data, showing that one method is preferable?

Wolfgang, how many replicates do you think are needed for your method to be reasonable? (I've gotten burned by reviewers on this one.)

--Naomi

At 12:01 PM 4/19/2005, Wolfgang Huber wrote:
>>My preference is to select genes based on their overall variability,
>>using a criterion such as
>>   z = apply(exprs(x), 1, IQR)
>
>Just remembered - a discussion (with data) on whether to select on mean
>level or variability can be found here:
>
>http://www.bepress.com/bioconductor/paper7/
>
>There are also additional ideas on pre-filtering, in order to alleviate
>the loss of power from multiple testing.

Naomi S. Altman
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics
Penn State University
University Park, PA 16802-2111
814-865-3791 (voice) | 814-863-7114 (fax) | 814-865-1348 (Statistics)

Posted 19.0 years ago by Naomi Altman ★ 6.0k
