Question: frma normalization and batch effects
Guest User wrote (6.3 years ago):
Dear all,

I am working on expression classifiers for leukemic subtypes using Affymetrix Plus2 arrays. The training data consist of several batches. The resulting classifier will be used to predict the subtype of new sets of samples as well as of single samples. So far I have co-normalized new arrays with the training set, but this is not ideal.

I have read the frma paper by McCall et al., and frma seems the perfect solution. Before I start, I have a few conceptual questions:

1. The training data consist of several batches of different sizes, some of them biased towards a single subtype. Does normalization per batch using summarize="random_effect" remove biology in this case? ComBat clearly did, and I ended up not correcting for batch effects, which worked fine for the classifiers I am using. Any suggestion as to which summarization would be best to use in this case?

2. Is there a minimum number of arrays needed to use summarize="random_effect"?

Any suggestions on how best to implement frma in this project are very welcome!

Cheers,
Judith

-- output of sessionInfo():

R version 2.15.2 (2012-10-26)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

--
Sent via the guest posting facility at bioconductor.org.
Modified 6.3 years ago by Wolfgang Huber; written 6.3 years ago by Guest User
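Judith's observation that ComBat removed biology has a simple mechanism when batches are confounded with subtype. The sketch below is written in plain Python rather than R and uses made-up values for a single gene. Plain per-batch mean-centering serves as a deliberately crude stand-in for per-batch location adjustment; it is not ComBat, which also adjusts scale and uses empirical Bayes shrinkage, and it is not frma.

```python
from statistics import mean

# Made-up log2 expression of ONE gene on eight arrays (hypothetical data).
# Subtype A arrays sit around 8, subtype B arrays around 10.
# Batch 1 is biased towards subtype A, batch 2 towards subtype B.
SAMPLES = [
    # (batch, subtype, log2 expression)
    ("batch1", "A", 8.0), ("batch1", "A", 8.25), ("batch1", "A", 7.75), ("batch1", "B", 10.0),
    ("batch2", "B", 10.25), ("batch2", "B", 9.75), ("batch2", "B", 10.0), ("batch2", "A", 8.0),
]


def subtype_difference(rows):
    """Mean(B) - mean(A) over (batch, subtype, value) rows."""
    a = [v for _, s, v in rows if s == "A"]
    b = [v for _, s, v in rows if s == "B"]
    return mean(b) - mean(a)


def center_per_batch(rows):
    """Subtract each batch's mean from its arrays (crude per-batch location adjustment)."""
    batch_means = {
        batch: mean(v for b, _, v in rows if b == batch)
        for batch in {b for b, _, _ in rows}
    }
    return [(b, s, v - batch_means[b]) for b, s, v in rows]


before = subtype_difference(SAMPLES)
after = subtype_difference(center_per_batch(SAMPLES))
print(f"subtype difference before: {before:.2f}, after per-batch centering: {after:.2f}")
```

Because batch 1 is mostly subtype A and batch 2 mostly subtype B, part of the subtype difference is absorbed into the batch means and removed along with them: in this toy example the difference between subtype means drops from 2.00 to 1.50.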
Answer: frma normalization and batch effects
Wolfgang Huber (EMBL European Molecular Biology Laboratory) wrote (6.3 years ago):
Hi Judith,

I am sure the frma people will have more specific recommendations, but in addition: both of your questions could be interpreted as questions of parameter choice in a classifier (a somewhat complex one, since it includes the preprocessing and batch adjustment). An often useful way of making such choices is cross-validation on a dataset that mimics the kind of data you expect to see in the future.

I guess you might also enjoy Jeff Leek's recent talk on frozen SVA and top scoring pairs:
http://www.birs.ca/events/2013/5-day-workshops/13w5083/videos/watch/201308151110-Leek.mp4

Best wishes
Wolfgang

(In reply to Judith Boer [guest] <guest at bioconductor.org>, 23 Aug 2013, 10:55)
_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
Written 6.3 years ago by Wolfgang Huber
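Wolfgang's suggestion can be sketched as follows: treat the preprocessing choice as one more parameter of the classifier and score each option by cross-validation. Holding out whole batches (leave-one-batch-out) is one way, assumed here, to make the test folds resemble new batches or single arrays arriving later. Everything in this sketch is hypothetical and written in plain Python rather than R: the toy data, the two candidate pipelines (`raw` and a per-array centering step `center_sample`), and the nearest-centroid classifier. In a real analysis the candidates would be actual preprocessing choices, such as frma summarization options, and the classifier would be the one you actually use.

```python
from statistics import mean

# Toy data (hypothetical): 3 genes per array; subtype "A" has gene 1 raised.
# Each batch adds its own constant offset to every gene.
BASE = {"A": [9.0, 6.0, 6.0], "B": [7.0, 6.0, 6.0]}
OFFSET = {"batch1": 0.0, "batch2": 2.0, "batch3": -2.0}
DATA = [(b, s, [x + off for x in BASE[s]]) for b, off in OFFSET.items() for s in ("A", "B")]


def raw(profile):
    """Candidate pipeline 1: no preprocessing."""
    return list(profile)


def center_sample(profile):
    """Candidate pipeline 2: subtract the array's own mean (a single-array step)."""
    m = mean(profile)
    return [x - m for x in profile]


PIPELINES = {"raw": raw, "center_sample": center_sample}


def centroid(profiles):
    return [mean(col) for col in zip(*profiles)]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def leave_one_batch_out_accuracy(data, preprocess):
    """Hold out each batch in turn; train a nearest-centroid classifier on the rest."""
    correct = total = 0
    for held_out in sorted({b for b, _, _ in data}):
        train = [(s, preprocess(p)) for b, s, p in data if b != held_out]
        test = [(s, preprocess(p)) for b, s, p in data if b == held_out]
        centroids = {lab: centroid([p for s, p in train if s == lab])
                     for lab in sorted({s for s, _ in train})}
        for true_label, p in test:
            predicted = min(centroids, key=lambda lab: sq_dist(p, centroids[lab]))
            correct += predicted == true_label
            total += 1
    return correct / total


for name, fn in PIPELINES.items():
    print(f"{name}: {leave_one_batch_out_accuracy(DATA, fn):.2f}")
```

In this toy setting the batch offset is the same for every gene, so the single-array centering step removes it and the leave-one-batch-out accuracy rises from 0.67 to 1.00. That outcome is built into the simulated data and says nothing about frma itself; the point is the structure of the comparison.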
Judith,

You're probably fine using the default frma summarization unless your data are in some way atypical. The random-effect summarization just allows the probe effects in your data to differ slightly from the global (frozen) probe effects. Also, if you're going to have singletons (single arrays) for which you want to make predictions in the future, then the default summarization is definitely the way to go.

Depending on how big your dataset is going to be, you might consider creating your own custom frma implementation using the frmaTools package.

Finally, frma addresses only one type of batch effect, namely one that acts at the probe level; it does nothing when the batch effect exists at the probeset level. So something like fSVA, applied after preprocessing with frma, would certainly be a good idea as well.

Best,
Matt

(In reply to Wolfgang Huber <whuber@embl.de>, Aug 23, 2013, 7:56 AM)
Written 6.3 years ago by Matthew McCall
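Matt's points about singletons and about probe-level versus probeset-level batch effects can be illustrated with a toy summarization step. This is a minimal sketch in Python under stated assumptions: the frozen probe effects are made-up numbers, and the median is a crude stand-in for frma's default summarization (a robust weighted average). It is not frma's actual estimator. Because the probe effects are fixed in advance, each array is summarized on its own, which is why a single new array can be processed.

```python
from statistics import median

# Hypothetical frozen probe effects for a 3-probe probeset, standing in for the
# probe effects frma estimates once from a large reference collection of arrays.
FROZEN_PROBE_EFFECTS = [0.5, -0.25, -0.25]


def summarize_frozen(probe_values, probe_effects=FROZEN_PROBE_EFFECTS):
    """Summarize ONE array: remove frozen probe effects, then take the median.

    The median is a crude robust stand-in for frma's robust weighted average;
    no other arrays are needed, so a singleton can be summarized on its own.
    """
    return median(y - phi for y, phi in zip(probe_values, probe_effects))


true_expression = 8.0
clean = [true_expression + phi for phi in FROZEN_PROBE_EFFECTS]

# Probe-level artefact: a single probe reads 1.0 too high in this batch.
probe_level = [clean[0] + 1.0] + clean[1:]
# Probeset-level artefact: every probe in the probeset is shifted by 1.0.
probeset_level = [y + 1.0 for y in clean]

for label, values in [("clean", clean), ("probe-level", probe_level),
                      ("probeset-level", probeset_level)]:
    print(f"{label}: {summarize_frozen(values)}")
```

The misbehaving probe is ignored by the robust summary (8.0), but the shift shared by all probes passes straight into the summarized value (9.0). A residual effect of that kind is what a probeset-level method such as fSVA would then have to handle.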