frma normalization and batch effects


Guest User ★ 13k

@guest-user-4897

Last seen 9.6 years ago

Dear all,

I am working on expression classifiers for leukemic subtypes using Affymetrix Plus2 arrays. The training data consists of several batches. The developed classifier will be used to predict the subtype of new sets of samples as well as single samples. So far, I have co-normalized new arrays with the training set, but this is not ideal.

I have read the frma paper by McCall et al., and it seems the perfect solution. Before I start, I have a few conceptual questions:

1. The training data consists of several batches of different sizes, some of them biased towards a single subtype. Does normalization per batch using summarize="random_effect" remove biology in this case? ComBat clearly did, and I ended up not correcting for batch effect, which worked fine for the classifiers I am using. Any suggestion as to which summarization would be best to use in this case?

2. Is there a minimum number of arrays to use with summarize="random_effect"?

Any suggestions on how to best implement frma in this project are very welcome!

Cheers, Judith

-- output of sessionInfo():

R version 2.15.2 (2012-10-26)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

-- Sent via the guest posting facility at bioconductor.org.

Normalization frma • 1.6k views

ADD COMMENT • link updated 10.7 years ago by Wolfgang Huber ★ 13k • written 10.7 years ago by Guest User ★ 13k


Wolfgang Huber ★ 13k

@wolfgang-huber-3550

Last seen 17 days ago

EMBL European Molecular Biology Laborat…

Hi Judith,

I am sure the frma people will have more specific recommendations, but in addition, both of your questions could be interpreted as questions of parameter choice in a classifier (a somewhat complex one, since it includes the preprocessing and batch adjustment). An often useful way of making such choices is cross-validation on a dataset that mimics the kind of data you expect to see in the future.

I guess you might also enjoy Jeff Leek's recent talk on frozen sva and top scoring pairs: http://www.birs.ca/events/2013/5-day-workshops/13w5083/videos/watch/201308151110-Leek.mp4

Best wishes,
Wolfgang

In reply to Judith Boer [guest], 23 Aug 2013, 10:55.
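One way to read the cross-validation advice above is to hold out one whole batch at a time, so that each test set looks like a batch the classifier has never seen. The following is only a minimal sketch of that idea in base R. The simulated data, the batch sizes, the subtype imbalance, and the nearest-centroid classifier are all assumptions made for illustration; none of them comes from the thread. In practice the preprocessing choice (for example the frma summarization) would be applied inside the loop and the resulting accuracies compared.

```r
# Leave-one-batch-out cross-validation on simulated data (base R only).
# Every name and number below is an illustrative assumption.
set.seed(1)
n_genes <- 50
batch <- rep(c("A", "B", "C"), times = c(12, 8, 10))
# Batches deliberately biased towards one subtype (A mostly s1, C mostly s2).
subtype <- factor(c(rep(c("s1", "s2"), times = c(10, 2)),
                    rep(c("s1", "s2"), times = c(4, 4)),
                    rep(c("s1", "s2"), times = c(2, 8))))

# Simulated log2 expression: genes in rows, arrays in columns.
expr <- matrix(rnorm(n_genes * length(batch)), nrow = n_genes)
is_s2 <- subtype == "s2"
expr[1:5, is_s2] <- expr[1:5, is_s2] + 2          # subtype signal in 5 genes
batch_shift <- c(A = 0, B = 0.5, C = -0.5)
expr <- sweep(expr, 2, batch_shift[batch], "+")   # additive batch shift

# Toy classifier: assign each test array to the closest class centroid.
nearest_centroid <- function(train, labels, test) {
  centroids <- sapply(levels(labels), function(l) {
    rowMeans(train[, labels == l, drop = FALSE])
  })
  # Squared Euclidean distance from each test array to each centroid.
  d <- apply(test, 2, function(x) colSums((centroids - x)^2))
  factor(levels(labels)[apply(d, 2, which.min)], levels = levels(labels))
}

# Hold out one batch at a time, train on the rest, predict the held-out batch.
pred <- factor(rep(NA, length(batch)), levels = levels(subtype))
for (b in unique(batch)) {
  held_out <- batch == b
  pred[held_out] <- nearest_centroid(expr[, !held_out, drop = FALSE],
                                     subtype[!held_out],
                                     expr[, held_out, drop = FALSE])
}

accuracy_by_batch <- tapply(pred == subtype, batch, mean)
print(accuracy_by_batch)
```

Reporting accuracy per held-out batch, rather than one pooled number, makes it easier to see whether a batch that is biased towards one subtype is the one that suffers under a given preprocessing choice.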

ADD COMMENT • link 10.7 years ago Wolfgang Huber ★ 13k


Judith,

You're probably fine using the default frma summarization unless your data are in some way atypical. The random effect summarization just allows the probe effects in your data to differ slightly from the global (frozen) probe effects. Also, if you're going to have singletons for which you want to predict in the future, then the default summarization is definitely the way to go.

Depending on how big your dataset is going to be, you might consider creating your own custom frma implementation using the frmaTools package.

Finally, frma addresses one type of batch effect, namely one that operates at the probe level. It does nothing when the batch effect exists at the probeset level. So something like fSVA, after preprocessing with frma, would certainly be a good idea as well.

Best,
Matt

In reply to Wolfgang Huber, Aug 23, 2013, 7:56 AM.
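The workflow suggested above (frma with the default summarization, followed by fSVA) might look roughly like the sketch below. It assumes the Bioconductor packages affy, frma, and sva are installed, along with the frozen-vector package for the Plus2 array (hgu133plus2frmavecs). The helper names preprocess_frma and frozen_sva, the file paths, and the choice of method = "exact" are illustrative assumptions, not recommendations from the thread, and nothing here has been run on real CEL files.

```r
# Sketch only. Helper names, paths and argument choices are hypothetical.

# Preprocess CEL files with frma. The default summarization
# ("robust_weighted_average") works one array at a time, so the same call
# applies to a training batch and to a single new sample.
preprocess_frma <- function(cel_files, summarize = "robust_weighted_average") {
  if (!requireNamespace("affy", quietly = TRUE) ||
      !requireNamespace("frma", quietly = TRUE)) {
    stop("This sketch needs the Bioconductor packages 'affy' and 'frma'.")
  }
  abatch <- affy::ReadAffy(filenames = cel_files)
  # summarize = "random_effect" would instead let probe effects differ from
  # the frozen ones, which requires a batch of arrays rather than a singleton.
  frma::frma(abatch, summarize = summarize)
}

# Frozen SVA: estimate surrogate variables on the training matrix, then
# adjust both the training data and new samples with the frozen estimates.
# train_expr and new_expr are genes-by-arrays matrices with matching rows.
frozen_sva <- function(train_expr, subtype, new_expr) {
  if (!requireNamespace("sva", quietly = TRUE)) {
    stop("This sketch needs the Bioconductor package 'sva'.")
  }
  pheno <- data.frame(subtype = factor(subtype))
  mod  <- model.matrix(~ subtype, data = pheno)   # full model: subtype
  mod0 <- model.matrix(~ 1, data = pheno)         # null model: intercept only
  sv <- sva::sva(train_expr, mod, mod0)
  sva::fsva(dbdat = train_expr, mod = mod, sv = sv,
            newdat = new_expr, method = "exact")
}

# Hypothetical usage (not run):
# train_eset <- preprocess_frma(list.files("training_cels", full.names = TRUE))
# new_eset   <- preprocess_frma("new_sample.CEL")
# adj <- frozen_sva(Biobase::exprs(train_eset), train_subtype,
#                   Biobase::exprs(new_eset))
# adj$db holds the adjusted training matrix, adj$new the adjusted new sample(s).
```

The classifier would then be trained on adj$db and applied to adj$new. Whether the fSVA step helps for a given classifier is the kind of choice that can be checked by cross-validation.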

ADD REPLY • link 10.7 years ago Matthew McCall ▴ 830
