median normalization


viritha kaza ▴ 580

@viritha-kaza-4318

Last seen 9.6 years ago

Hi group,

I am trying to replicate the dataset GSE4824 from a paper. It actually contains 3 platforms, but right now I am concentrating on only one, GPL570, which contains 6 arrays. I have written code to run Microarray Suite version 5.0 (MAS 5.0) using the Affymetrix default analysis settings, with global scaling as the normalization method. The trimmed mean target intensity of each array was arbitrarily set to 250. After that comes median normalization.

    source("http://bioconductor.org/biocLite.R")
    biocLite("affy")
    library(affy)
    mydata <- ReadAffy()
    eset.mas5 = mas5(mydata,sc=250,normalize=TRUE)
    write.exprs(eset.mas5,"GSE4824_GPL570.txt",sep='\t')
    eset=exprs(eset.mas5)
    median = apply(eset, 2, median)
    median1=median(median)
    exprs<-eset/median*median1
    write.table(exprs,"GSE4824_GPL570_Median.txt",sep='\t')

Please let me know whether my code performs the above task correctly, and especially whether the last few steps perform median normalization correctly. Also let me know if this is the right way to do median normalization.

Thanks,
Viritha

Microarray Normalization • 5.8k views

Updated 12.7 years ago by James W. MacDonald 65k • written 12.7 years ago by viritha kaza ▴ 580


James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 4 hours ago

United States

Hi Viritha,

On 8/29/2011 12:10 PM, viritha kaza wrote:

> exprs<-eset/median*median1

The output from mas5() isn't log transformed, so you should be subtracting and adding, not dividing and multiplying.

This assumes that by 'median normalization' the original authors simply meant median centering.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues
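A minimal sketch of per-array median centering, assuming 'median normalization' means shifting every array to a common median. It uses a made-up toy matrix rather than real mas5() output, and the names `log_eset`, `array_medians` and `target_median` are illustrative only. `sweep()` is used so that each column is paired with its own median; a plain `eset - med` recycles the vector down each column, which pairs the medians with rows rather than arrays. The additive form is applied to log2 values here, and the last lines check that it matches a multiplicative rescaling on the original scale.

```r
# Toy expression matrix (hypothetical): 5 probe sets (rows) x 3 arrays (columns), unlogged.
set.seed(1)
eset <- matrix(rexp(15, rate = 1 / 250), nrow = 5,
               dimnames = list(paste0("probe", 1:5), paste0("array", 1:3)))

# Work on the log2 scale, where shifting by a constant is a rescaling of the raw values.
log_eset <- log2(eset)

# One median per array (column), and a common target: the median of those medians.
array_medians <- apply(log_eset, 2, median)
target_median <- median(array_medians)

# sweep() subtracts array_medians[j] from column j; then shift every array to the target.
centered <- sweep(log_eset, 2, array_medians, "-") + target_median

# After centering, every array has the same median.
apply(centered, 2, median)

# Equivalent multiplicative form on the original (unlogged) scale.
scaled <- sweep(eset, 2, 2^array_medians, "/") * 2^target_median
all.equal(log2(scaled), centered)
```

Whether to center on the log2 scale or on the raw intensities is a choice the paper does not spell out, so this is only one reading of it.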

12.7 years ago • James W. MacDonald 65k


Hi James,

Thanks for your quick reply.

So my last step should be exprs<-eset-median+median1. Is that right?

In the paper they say:

"RNA fluorescent labeling reaction and hybridization were performed using the Affymetrix Gene Chips HG-U133A and HG-U133B according to the manufacturer's instructions (http://www.affymetrix.com/). The arrays consist of 22,283 (HG-U133A) and 22,645 (HG-U133B) probe sets, which together amount to 23,583 unique genes based on Unigene build 173. Microarray analysis was performed using Affymetrix Microarray Suite 5.0 and in-house Visual Basic software MATRIX 1.26. Array data were median normalized, and replicate genes were combined by averaging. Samples (or averages of samples) were then compared against each other by calculating log ratios for each gene, and statistical significance was presented as a p value calculated by Student's t test. The microarray data have been uploaded to GEO (Gene Expression Omnibus), and the accession number is GSE-4824."

In GSE4824 in GEO:

"Data processing: Data were analyzed with Microarray Suite version 5.0 (MAS 5.0) using Affymetrix default analysis settings and global scaling as normalization method. The trimmed mean target intensity of each array was arbitrarily set to 250".

Could you please suggest whether my steps are correct? There are 3 platforms: GPL570, which contains 6 arrays that are reference profiles, and the other 2, HG-U133A and HG-U133B, which contain 79 samples.

Steps as suggested by the paper:

1) Combine both platforms, using the MAS5 intensities of both platforms for the 79 samples from the series matrix file (unlogged).
   * What about the probes common to HG-U133A and HG-U133B (168)?
2) Add annotation with Unigene and then combine with the reference profiles, which are HG-U133 Plus 2.
   * Do I ignore the probes that are unique to HG-U133 Plus 2 (9921) and the probes of HG-U133A and HG-U133B (6)?
3) Then perform median normalization or median centering.
4) Then average the replicate genes.
5) Compute log ratios for each gene (fold change).
6) Then perform Student's t-test.
   * During which step do I convert the expression values to log2?

Waiting for your suggestions.

Thanks,
Viritha
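The "replicate genes were combined by averaging" step could be sketched as follows, assuming each probe set has already been mapped to a single gene identifier. The matrix, the sample names and the `Hs.*` identifiers are made up for illustration; real UniGene annotation would come from the platform's annotation files.

```r
# Hypothetical toy data: 6 probe sets mapping to 3 genes, measured in 2 samples.
expr <- matrix(c(10, 20, 30, 40, 50, 60,
                 12, 18, 33, 41, 55, 65),
               ncol = 2,
               dimnames = list(paste0("ps", 1:6), c("s1", "s2")))
gene_id <- c("Hs.1", "Hs.1", "Hs.2", "Hs.3", "Hs.3", "Hs.3")

# Sum the rows that share a gene ID (one output row per gene).
sums <- rowsum(expr, group = gene_id)

# Number of probe sets per gene, in the same order as the rows of 'sums'.
counts <- as.vector(table(gene_id)[rownames(sums)])

# Divide row i by counts[i] to get the per-gene average in each sample.
gene_means <- sums / counts
gene_means
```

Whether the averaging is done on raw or log2 values changes the result, and the paper does not say which was used.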

12.7 years ago • viritha kaza ▴ 580


Hi Viritha,

On 8/31/2011 9:36 AM, viritha kaza wrote:

> So my last step should be
> exprs<-eset-median+median1.
> Is it?

If your data are not log transformed, yes. This of course assumes that this is what the original authors meant by 'median normalization'.

> 5)Log ratios for each genes(fold change)
> 6)then perform statistical student t-test.
> * During which step do I convert the expression to log2 ?

I would assume somewhere before step 5. But this is your project, so you have to make that decision for yourself.

Best,

Jim
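A rough sketch of the log-ratio and t-test steps discussed in this thread, with the log2 transform applied before the ratios are computed. The data, the two-group layout and all object names are hypothetical; the paper's actual sample groupings are not given here.

```r
# Hypothetical toy data: 4 genes x 6 samples, unlogged; the first 3 samples are group A.
set.seed(42)
expr <- matrix(rlnorm(24, meanlog = 6, sdlog = 0.5), nrow = 4,
               dimnames = list(paste0("gene", 1:4), paste0("s", 1:6)))
group <- factor(c("A", "A", "A", "B", "B", "B"))

# Log-transform before computing ratios.
log_expr <- log2(expr)

# Log ratio per gene: difference of the group means on the log2 scale (B minus A).
log_ratio <- rowMeans(log_expr[, group == "B"]) - rowMeans(log_expr[, group == "A"])

# Student's t-test per gene; var.equal = TRUE requests the classic equal-variance test.
p_value <- apply(log_expr, 1, function(x) {
  t.test(x[group == "B"], x[group == "A"], var.equal = TRUE)$p.value
})

data.frame(log_ratio, p_value)
```

With thousands of genes, the p values would normally also need a multiple-testing adjustment, which the quoted methods text does not mention.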

12.7 years ago • James W. MacDonald 65k
