data normalization


Barbara Uszczynska ▴ 60

@barbara-uszczynska-3582


Dear R-Users,

I use home-made spotted arrays for research connected with allergies. The array consists of genes of which 60% are up-regulated and 40% are down-regulated, plus spikes. I did not include any genes with constant expression. How should I analyse this experiment? As I understand the statistics, I should focus on the external spike controls and compare all genes with the spikes. It is a two-colour experiment, so I would have to build quite a complicated statistical model. I'm not sure whether this is the right approach. What do you think?

Regards,
Barbara


updated 14.8 years ago by James W. MacDonald 65k • written 14.8 years ago by Barbara Uszczynska ▴ 60


James W. MacDonald 65k

@james-w-macdonald-5106


United States

Hi Barbara,

Barbara Uszczynska wrote:
> [...]
> It is two coulour experiment. So I have to build quite complicated
> statistical model. I'm not sure if it is a right pathway. What do you
> think?

I think two things:

First, asking the same question over and over will not endear you to the listserv community, and will increase the likelihood that your posts will simply be deleted by those who might help you.

Second, what you are asking for is statistical help in analyzing your experiment rather than help using software. Since many of the people on this list are practicing statisticians, you are asking them to do for free what they get paid to do.

I would suggest that a more reasonable approach is to find a local statistician to help you with your analysis, as you are unlikely to get any (reasonable) help on a listserv.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826

14.8 years ago • James W. MacDonald 65k


No one pays me for my opinions on this subject, so you may have mine for free.

First, normalization is a slightly nasty business when it comes to microarrays. The basic idea, of course, is to use some mechanism to remove obvious systematic effects. For example, in a two-color system, the two dyes may have slightly different intensity profiles when measuring the same sample. My first piece of advice is to use some mechanism to SEE what that effect looks like in your system. I think the limma package (a big favorite in these parts) has a function called plotDensities(); you can also make density plots in base R using the density() function. You can also plot the log fold change against the average log intensity (an MA plot). For this type of array you will generally observe a pattern that looks like a banana. In other words, the residuals of a regression line through this plot show obvious local trends. If you ignore this, then you are accepting that your experimental conditions have somehow conjured this up, which is not at all likely.

The most intuitively obvious solution is to straighten the banana out, and you can achieve this by loess, also available in limma. Loess fits a local regression curve through the middle of the banana, then uses the fitted values from that curve to adjust one channel or the other, straightening it. Interestingly, you can get a rather similar result by quantile normalization, which forces two data sets to share a common distribution. It took me a minute to envision why this is true, but it is. Another possibility, one that I have not tried, is based on variance stabilization. This makes a rather different set of assumptions, and I am going to play with it in the near future.

Whatever approach you choose, you can be assured that your normalization will create new artifacts in your data. There is no perfect world here. This fact alone makes people edgy sometimes.
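To make the banana-straightening idea concrete, here is a minimal sketch in plain Python (standard library only). It is not limma's implementation: `local_linear_fit` is a bare tricube-weighted local regression without the robustness iterations a real loess uses, the intensities are simulated, and the dye bias built into `red` is a made-up assumption for illustration. All function names are hypothetical.

```python
import math
import random


def ma_values(red, green):
    """Return (M, A): log2 ratio and mean log2 intensity for each spot."""
    m = [math.log2(r) - math.log2(g) for r, g in zip(red, green)]
    a = [0.5 * (math.log2(r) + math.log2(g)) for r, g in zip(red, green)]
    return m, a


def local_linear_fit(x, y, span=0.4):
    """Tricube-weighted local line at each x[i] (loess without robustness steps)."""
    n = len(x)
    k = max(3, int(math.ceil(span * n)))  # neighbours used in each local fit
    fitted = []
    for x0 in x:
        # Bandwidth: distance from x0 to its k-th nearest neighbour.
        h = sorted(abs(xi - x0) for xi in x)[min(k, n) - 1] or 1e-12
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def loess_normalize(red, green, span=0.4):
    """Subtract the fitted intensity-dependent trend from M."""
    m, a = ma_values(red, green)
    trend = local_linear_fit(a, m, span)
    return [mi - ti for mi, ti in zip(m, trend)], a


def quantile_normalize(x, y):
    """Force two equal-length vectors to share one distribution."""
    # Reference distribution: mean of the two sorted vectors, rank by rank.
    ref = [(u + v) / 2 for u, v in zip(sorted(x), sorted(y))]

    def remap(v):
        order = sorted(range(len(v)), key=v.__getitem__)
        out = [0.0] * len(v)
        for rank, idx in enumerate(order):
            out[idx] = ref[rank]  # ties are broken by position, not averaged
        return out

    return remap(x), remap(y)


def mean_abs(values):
    return sum(abs(v) for v in values) / len(values)


# Simulated two-color data (assumption: no true differential expression).
random.seed(1)
green = [2 ** random.uniform(6, 14) for _ in range(300)]
# Hypothetical dye bias: red reads high at low intensity and low at high intensity.
red = [g * 2 ** (0.8 - 0.02 * (math.log2(g) - 6) ** 2 + random.gauss(0, 0.1))
       for g in green]

m_raw, a_raw = ma_values(red, green)
m_norm, _ = loess_normalize(red, green)
q_red, q_green = quantile_normalize([math.log2(r) for r in red],
                                    [math.log2(g) for g in green])

print("mean |M| before:", round(mean_abs(m_raw), 3))
print("mean |M| after loess:", round(mean_abs(m_norm), 3))
```

Both functions rest on the assumption that most spots are not differentially expressed, so the trend in M is treated as technical. On an array where most genes are expected to change, that assumption deserves a hard look before applying either method.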
Second, there are many other systematic effects that are much more complicated than intensity-dependent dye effects. The good news is that if you understand the magnitude of your unwanted systematic effects pretty well, you can hopefully do enough normalization of the right sort to partly compensate for them without introducing enormous artifacts.

In summary, this is not a turnkey system where you just drop all the numbers into a magical grinder and out pops the correct answer without any thought or understanding on anyone's part. It takes time and consideration to do these things, a fact that most (but not all) of the people who pay the rent around here understand.

Best,
Tom

On Jul 21, 2009, at 8:56 AM, James W. MacDonald wrote:
> [...]
> Second, what you are asking for is statistical help in analyzing
> your experiment rather than help using software. Since many of the
> people on this list are practicing statisticians, what you are
> asking is for them to do what they get paid to do for you for free.
> [...]

14.8 years ago • Thomas Hampton ▴ 750
