How do I set the median when performing MAS5.0 normalization
1
0
Entering edit mode
@james-w-macdonald-5106
Last seen 9 hours ago
United States
Hi Matthew, No, the sc parameter scales the data using a trimmed mean rather than the median. You can see that by inspecting first the code for mas5(): mas5 <- function (object, normalize = TRUE, sc = 500, analysis = "absolute", ...) { res <- expresso(object, bgcorrect.method = "mas", pmcorrect.method = "mas", normalize = FALSE, summary.method = "mas", ...) if (normalize) res <- affy.scalevalue.exprSet(res, sc = sc, analysis = analysis) return(res) } and then by looking at the code for affy.scalevalue.exprSet(): affy.scalevalue.exprSet <- function (eset, sc = 500, analysis = "absolute") { analysis <- match(analysis, c("absolute", "comparison")) if (analysis == 1) nf <- 1 else stop("sorry! comparison not implemented.") for (i in 1:ncol(exprs(eset))) { slg <- exprs(eset)[, i] sf <- sc/mean(slg, trim = 0.02) reported.value <- nf * sf * slg exprs(eset)[, i] <- reported.value } return(eset) } So this just centers things on a 2% trimmed mean. You _could_ just hijack this function and set the trim to 0.5 to do the median scaling, but it is probably easier to 'roll yer own'. Best, Jim James W. MacDonald, M.S. Biostatistician Douglas Lab 5912 Buhl 1241 E. Catherine St. Ann Arbor MI 48109-5618 734-615-7826 >>> Matthew Willmann 12/06/10 5:53 PM >>> Hi Jim, Thank you. Does this mean that I can obtain the desired normalization by the following? mas5(data.MRW, normalize=FALSE, sc=50) Matthew ----------------------------------------------------- Matthew R. Willmann, Ph.D. Research Associate, Poethig Lab University of Pennsylvania Department of Biology 433 S. University Avenue Philadelphia, PA 19104 Lab phone: 215-898-8916 Cell: 508-243-2495 Fax: 215-898-8780 On Dec 6, 2010, at 3:55 PM, James W. MacDonald wrote: > Hi Matthew, > > On 12/6/2010 11:16 AM, Matthew Willmann wrote: >> Hello, >> >> I am trying to replicate results in a published paper, where they performed a MAS5.0 normalization with the median normalized to 50 and then defined genes as present if the normalized expression value is higher than 30. Does anyone know how I can alter the standard code to establish these parameters? >> >>> Cel.files=list.files(pattern=".CEL") >>> data.MRW=ReadAffy(filenames=Cel.files) >>> mas5(data.MRW, normalize=TRUE, sc=500) > > From the help page for mas5(): > > Usage: > > mas5(object, normalize = TRUE, sc = 500, analysis = "absolute", ...) > > Arguments: > > object: an instance of 'AffyBatch' > > normalize: logical. If 'TRUE' scale normalization is used after we > obtain an instance of 'ExpressionSet' > > So if you use mas5() with normalize = FALSE, then you don't get any scale normalization. You can then simply scale each chip to have a median of 50. > > Best, > > Jim > > >> >> Thank you for your time and advice. >> >> >> Matthew >> >> >> >> ----------------------------------------------------- >> Matthew R. Willmann, Ph.D. >> Research Associate, Poethig Lab >> University of Pennsylvania >> Department of Biology >> 433 S. University Avenue >> Philadelphia, PA 19104 >> Lab phone: 215-898-8916 >> Cell: 508-243-2495 >> Fax: 215-898-8780 >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > -- > James W. MacDonald, M.S. > Biostatistician > Douglas Lab > University of Michigan > Department of Human Genetics > 5912 Buhl > 1241 E. Catherine St. > Ann Arbor MI 48109-5618 > 734-615-7826 > ********************************************************** > Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues > ********************************************************** Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues
Normalization affy Normalization affy • 1.3k views
ADD COMMENT
0
Entering edit mode
@matthew-willmann-3882
Last seen 9.7 years ago
Hi Jim, I tried doing the following, and I still could not replicate the exact values in the published paper. I tried altering the code, as you suggested, as I do not know how to "roll yer own", as I am still too green. > mas50 <- function (object, normalize = TRUE, sc = 500, analysis = "absolute", + ...) + { + res <- expresso(object, bgcorrect.method = "mas", pmcorrect.method = "mas", + normalize = FALSE, summary.method = "mas", ...) + if (normalize) + res <- affy.scalevalues.exprSet(res, sc = sc, analysis = analysis) + return(res) + } > affy.scalevalues.exprSet <- function (eset, sc = 500, analysis = "absolute") + { + analysis <- match(analysis, c("absolute", "comparison")) + if (analysis == 1) + nf <- 1 + else stop("sorry! comparison not implemented.") + for (i in 1:ncol(exprs(eset))) { + slg <- exprs(eset)[, i] + sf <- sc/mean(slg, trim = 0.5) + reported.value <- nf * sf * slg + exprs(eset)[, i] <- reported.value + } + return(eset) + } > library(affy) Loading required package: Biobase Welcome to Bioconductor Vignettes contain introductory material. To view, type 'openVignette()'. To cite Bioconductor, see 'citation("Biobase")' and for packages 'citation(pkgname)'. Attaching package: 'affy' The following object(s) are masked _by_ '.GlobalEnv': affy.scalevalue.exprSet, mas5 > data.MRW=ReadAffy(filenames=Cel.files) > data.mas50=mas50(data.MRW, normalize=FALSE, sc=500) background correction: mas PM/MM correction : mas expression values: mas background correcting...done. 22810 ids to be processed | | |####################| > data.mas50.exprs=exprs(data.mas50) > data.wk=data.mas50.exprs > write.table(data.wk,"test.txt",sep="\t") ----------------------------------------------------- Matthew R. Willmann, Ph.D. Research Associate, Poethig Lab University of Pennsylvania Department of Biology 433 S. University Avenue Philadelphia, PA 19104 Lab phone: 215-898-8916 Cell: 508-243-2495 Fax: 215-898-8780 On Dec 6, 2010, at 6:54 PM, James MacDonald wrote: > Hi Matthew, > > No, the sc parameter scales the data using a trimmed mean rather than the median. You can see that by inspecting first the code for mas5(): > > mas5 <- function (object, normalize = TRUE, sc = 500, analysis = "absolute", > ...) > { > res <- expresso(object, bgcorrect.method = "mas", pmcorrect.method = "mas", > normalize = FALSE, summary.method = "mas", ...) > if (normalize) > res <- affy.scalevalue.exprSet(res, sc = sc, analysis = analysis) > return(res) > } > > and then by looking at the code for affy.scalevalue.exprSet(): > > affy.scalevalue.exprSet <- function (eset, sc = 500, analysis = "absolute") > { > analysis <- match(analysis, c("absolute", "comparison")) > if (analysis == 1) > nf <- 1 > else stop("sorry! comparison not implemented.") > for (i in 1:ncol(exprs(eset))) { > slg <- exprs(eset)[, i] > sf <- sc/mean(slg, trim = 0.02) > reported.value <- nf * sf * slg > exprs(eset)[, i] <- reported.value > } > return(eset) > } > > So this just centers things on a 2% trimmed mean. You _could_ just hijack this function and set the trim to 0.5 to do the median scaling, but it is probably easier to 'roll yer own'. > > Best, > > Jim > > > > James W. MacDonald, M.S. > Biostatistician > Douglas Lab > 5912 Buhl > 1241 E. Catherine St. > Ann Arbor MI 48109-5618 > 734-615-7826 >>>> Matthew Willmann 12/06/10 5:53 PM >>> > Hi Jim, > > Thank you. Does this mean that I can obtain the desired normalization by the following? > > mas5(data.MRW, normalize=FALSE, sc=50) > > > Matthew > ----------------------------------------------------- > Matthew R. Willmann, Ph.D. > Research Associate, Poethig Lab > University of Pennsylvania > Department of Biology > 433 S. University Avenue > Philadelphia, PA 19104 > Lab phone: 215-898-8916 > Cell: 508-243-2495 > Fax: 215-898-8780 > > On Dec 6, 2010, at 3:55 PM, James W. MacDonald wrote: > >> Hi Matthew, >> >> On 12/6/2010 11:16 AM, Matthew Willmann wrote: >>> Hello, >>> >>> I am trying to replicate results in a published paper, where they performed a MAS5.0 normalization with the median normalized to 50 and then defined genes as present if the normalized expression value is higher than 30. Does anyone know how I can alter the standard code to establish these parameters? >>> >>>> Cel.files=list.files(pattern=".CEL") >>>> data.MRW=ReadAffy(filenames=Cel.files) >>>> mas5(data.MRW, normalize=TRUE, sc=500) >> >> From the help page for mas5(): >> >> Usage: >> >> mas5(object, normalize = TRUE, sc = 500, analysis = "absolute", ...) >> >> Arguments: >> >> object: an instance of 'AffyBatch' >> >> normalize: logical. If 'TRUE' scale normalization is used after we >> obtain an instance of 'ExpressionSet' >> >> So if you use mas5() with normalize = FALSE, then you don't get any scale normalization. You can then simply scale each chip to have a median of 50. >> >> Best, >> >> Jim >> >> >>> >>> Thank you for your time and advice. >>> >>> >>> Matthew >>> >>> >>> >>> ----------------------------------------------------- >>> Matthew R. Willmann, Ph.D. >>> Research Associate, Poethig Lab >>> University of Pennsylvania >>> Department of Biology >>> 433 S. University Avenue >>> Philadelphia, PA 19104 >>> Lab phone: 215-898-8916 >>> Cell: 508-243-2495 >>> Fax: 215-898-8780 >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> -- >> James W. MacDonald, M.S. >> Biostatistician >> Douglas Lab >> University of Michigan >> Department of Human Genetics >> 5912 Buhl >> 1241 E. Catherine St. >> Ann Arbor MI 48109-5618 >> 734-615-7826 >> ********************************************************** >> Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues >> > > > ********************************************************** > Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues > >
ADD COMMENT
0
Entering edit mode
Hi Matthew, On 12/7/2010 7:13 PM, Matthew Willmann wrote: > Hi Jim, > > I tried doing the following, and I still could not replicate the exact values in the published paper. I tried altering the code, as you suggested, as I do not know how to "roll yer own", as I am still too green. > >> mas50<- function (object, normalize = TRUE, sc = 500, analysis = "absolute", > + ...) > + { > + res<- expresso(object, bgcorrect.method = "mas", pmcorrect.method = "mas", > + normalize = FALSE, summary.method = "mas", ...) > + if (normalize) > + res<- affy.scalevalues.exprSet(res, sc = sc, analysis = analysis) > + return(res) > + } >> affy.scalevalues.exprSet<- function (eset, sc = 500, analysis = "absolute") > + { > + analysis<- match(analysis, c("absolute", "comparison")) > + if (analysis == 1) > + nf<- 1 > + else stop("sorry! comparison not implemented.") > + for (i in 1:ncol(exprs(eset))) { > + slg<- exprs(eset)[, i] > + sf<- sc/mean(slg, trim = 0.5) > + reported.value<- nf * sf * slg > + exprs(eset)[, i]<- reported.value > + } > + return(eset) > + } I think you are a bit confused here. Setting the mean() function to use 50% trimming will give you the median, but keeping sc = 500 will median center at 500, not the 50 you originally stated. To further complicate matters, I don't know what the authors of the paper you cite really did. They may in fact have not normalized the data in this way at all, instead just median centering at 50. It's easy enough to do that, if you want to check. Just do something like eset <- mas5(data.MRW, FALSE) meds <- apply(exprs(eset), 2, median) dat <- sweep(exprs(eset), 2, meds - 50) exprs(eset) <- dat Best, Jim >> library(affy) > Loading required package: Biobase > > Welcome to Bioconductor > > Vignettes contain introductory material. To view, type > 'openVignette()'. To cite Bioconductor, see > 'citation("Biobase")' and for packages 'citation(pkgname)'. > > > Attaching package: 'affy' > > The following object(s) are masked _by_ '.GlobalEnv': > > affy.scalevalue.exprSet, mas5 > > >> data.MRW=ReadAffy(filenames=Cel.files) >> data.mas50=mas50(data.MRW, normalize=FALSE, sc=500) > background correction: mas > PM/MM correction : mas > expression values: mas > background correcting...done. > 22810 ids to be processed > | | > |####################| >> data.mas50.exprs=exprs(data.mas50) >> data.wk=data.mas50.exprs >> write.table(data.wk,"test.txt",sep="\t") > > ----------------------------------------------------- > Matthew R. Willmann, Ph.D. > Research Associate, Poethig Lab > University of Pennsylvania > Department of Biology > 433 S. University Avenue > Philadelphia, PA 19104 > Lab phone: 215-898-8916 > Cell: 508-243-2495 > Fax: 215-898-8780 > > On Dec 6, 2010, at 6:54 PM, James MacDonald wrote: > >> Hi Matthew, >> >> No, the sc parameter scales the data using a trimmed mean rather than the median. You can see that by inspecting first the code for mas5(): >> >> mas5<- function (object, normalize = TRUE, sc = 500, analysis = "absolute", >> ...) >> { >> res<- expresso(object, bgcorrect.method = "mas", pmcorrect.method = "mas", >> normalize = FALSE, summary.method = "mas", ...) >> if (normalize) >> res<- affy.scalevalue.exprSet(res, sc = sc, analysis = analysis) >> return(res) >> } >> >> and then by looking at the code for affy.scalevalue.exprSet(): >> >> affy.scalevalue.exprSet<- function (eset, sc = 500, analysis = "absolute") >> { >> analysis<- match(analysis, c("absolute", "comparison")) >> if (analysis == 1) >> nf<- 1 >> else stop("sorry! comparison not implemented.") >> for (i in 1:ncol(exprs(eset))) { >> slg<- exprs(eset)[, i] >> sf<- sc/mean(slg, trim = 0.02) >> reported.value<- nf * sf * slg >> exprs(eset)[, i]<- reported.value >> } >> return(eset) >> } >> >> So this just centers things on a 2% trimmed mean. You _could_ just hijack this function and set the trim to 0.5 to do the median scaling, but it is probably easier to 'roll yer own'. >> >> Best, >> >> Jim >> >> >> >> James W. MacDonald, M.S. >> Biostatistician >> Douglas Lab >> 5912 Buhl >> 1241 E. Catherine St. >> Ann Arbor MI 48109-5618 >> 734-615-7826 >>>>> Matthew Willmann 12/06/10 5:53 PM>>> >> Hi Jim, >> >> Thank you. Does this mean that I can obtain the desired normalization by the following? >> >> mas5(data.MRW, normalize=FALSE, sc=50) >> >> >> Matthew >> ----------------------------------------------------- >> Matthew R. Willmann, Ph.D. >> Research Associate, Poethig Lab >> University of Pennsylvania >> Department of Biology >> 433 S. University Avenue >> Philadelphia, PA 19104 >> Lab phone: 215-898-8916 >> Cell: 508-243-2495 >> Fax: 215-898-8780 >> >> On Dec 6, 2010, at 3:55 PM, James W. MacDonald wrote: >> >>> Hi Matthew, >>> >>> On 12/6/2010 11:16 AM, Matthew Willmann wrote: >>>> Hello, >>>> >>>> I am trying to replicate results in a published paper, where they performed a MAS5.0 normalization with the median normalized to 50 and then defined genes as present if the normalized expression value is higher than 30. Does anyone know how I can alter the standard code to establish these parameters? >>>> >>>>> Cel.files=list.files(pattern=".CEL") >>>>> data.MRW=ReadAffy(filenames=Cel.files) >>>>> mas5(data.MRW, normalize=TRUE, sc=500) >>> >>> From the help page for mas5(): >>> >>> Usage: >>> >>> mas5(object, normalize = TRUE, sc = 500, analysis = "absolute", ...) >>> >>> Arguments: >>> >>> object: an instance of 'AffyBatch' >>> >>> normalize: logical. If 'TRUE' scale normalization is used after we >>> obtain an instance of 'ExpressionSet' >>> >>> So if you use mas5() with normalize = FALSE, then you don't get any scale normalization. You can then simply scale each chip to have a median of 50. >>> >>> Best, >>> >>> Jim >>> >>> >>>> >>>> Thank you for your time and advice. >>>> >>>> >>>> Matthew >>>> >>>> >>>> >>>> ----------------------------------------------------- >>>> Matthew R. Willmann, Ph.D. >>>> Research Associate, Poethig Lab >>>> University of Pennsylvania >>>> Department of Biology >>>> 433 S. University Avenue >>>> Philadelphia, PA 19104 >>>> Lab phone: 215-898-8916 >>>> Cell: 508-243-2495 >>>> Fax: 215-898-8780 >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >>> -- >>> James W. MacDonald, M.S. >>> Biostatistician >>> Douglas Lab >>> University of Michigan >>> Department of Human Genetics >>> 5912 Buhl >>> 1241 E. Catherine St. >>> Ann Arbor MI 48109-5618 >>> 734-615-7826 >>> ********************************************************** >>> Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues >>> >> >> >> ********************************************************** >> Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues >> >> > -- James W. MacDonald, M.S. Biostatistician Douglas Lab University of Michigan Department of Human Genetics 5912 Buhl 1241 E. Catherine St. Ann Arbor MI 48109-5618 734-615-7826 ********************************************************** Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues
ADD REPLY

Login before adding your answer.

Traffic: 382 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6