Hierarchical clustering of RMA data


Ryan Kirkbride ▴ 10

@ryan-kirkbride-3006

Last seen 9.6 years ago

Hello all! I have a basic conceptual question. I have a set of RMA-normalized data on which I would like to carry out hierarchical clustering. In the past we have usually worked with MAS5 data, which we import into dChip to carry out the clustering. I'm now looking to do the same with RMA data, and I'm wondering whether I should transform it to a linear scale or leave it on the usual log2 scale. dChip does a per-gene normalization (it subtracts the mean and then divides by the standard deviation), and it appears that the choice of linear or log2 scale affects the results. I'm assuming most people just leave the data on the log2 scale. Am I overthinking the whole issue?

Thanks,
Ryan Kirkbride, Plant Biology Graduate Student, Harada Lab, UC Davis
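To make the effect described in the question concrete, here is a minimal Python sketch of per-gene standardization applied to the same gene on both scales. The expression values are invented for illustration, and the use of the sample standard deviation is an assumption; the thread does not say which formula dChip uses.

```python
import statistics


def standardize(values):
    """Per-gene standardization: subtract the mean, divide by the sample SD.

    Assumption: sample SD (n - 1 denominator); dChip's choice is not stated.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical log2 expression values for one gene across four arrays.
log2_values = [6.0, 7.0, 8.0, 11.0]
# The same measurements back-transformed to the linear scale.
linear_values = [2.0 ** v for v in log2_values]

z_log2 = standardize(log2_values)
z_linear = standardize(linear_values)

for name, z in (("log2", z_log2), ("linear", z_linear)):
    print(name, [round(x, 3) for x in z])
```

The two standardized profiles differ because the back-transformation is nonlinear: on the linear scale the single high array dominates the mean and standard deviation, so the three lower arrays are squeezed together. Any distance computed from these profiles, and therefore any clustering, can differ accordingly.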

Normalization Clustering • 1.1k views

ADD COMMENT • link updated 15.7 years ago by Deanne Taylor ▴ 50 • written 15.7 years ago by Ryan Kirkbride ▴ 10


Deanne Taylor ▴ 50

@deanne-taylor-2380

Last seen 9.6 years ago

Ryan: This might be a naive question, as I'm not sure how dChip does the normalization, but is there a setting in dChip to tell it that the data are on a log2 scale? Otherwise the arithmetic on the log and linear scales would be quite different, and that might be the source of the difference: subtracting on the log2 scale is akin to dividing on the linear scale.

Deanne Taylor PhD, Executive Director, Bioinformatics Core, Department of Biostatistics, Harvard School of Public Health, 655 Huntington Avenue, Boston, MA 02115, dtaylor at hsph.harvard.edu

(In reply to Ryan Kirkbride, 08/28/08 8:27 PM.)

Bioconductor mailing list: Bioconductor at stat.math.ethz.ch, https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
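The point that subtraction on the log2 scale corresponds to division on the linear scale can be checked with a short Python sketch. The intensities below are invented for illustration. The second half shows the consequence for mean-centering: subtracting a gene's mean log2 value is the same as dividing its linear values by their geometric mean, not their arithmetic mean.

```python
import math

# Hypothetical linear-scale intensities for one gene on two arrays.
a, b = 800.0, 200.0

diff_log2 = math.log2(a) - math.log2(b)  # difference on the log2 scale
ratio_log2 = math.log2(a / b)            # log2 of the linear-scale ratio

# Mean-centering on the log2 scale, for hypothetical intensities on three arrays.
values = [100.0, 200.0, 800.0]
log_values = [math.log2(v) for v in values]
log_mean = sum(log_values) / len(log_values)
centered = [lv - log_mean for lv in log_values]

# The same quantity obtained by dividing by the geometric mean on the linear scale.
geo_mean = 2.0 ** log_mean
ratio_to_geo_mean = [math.log2(v / geo_mean) for v in values]

print(diff_log2, ratio_log2)
print(centered)
print(ratio_to_geo_mean)
```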

ADD COMMENT • link 15.7 years ago Deanne Taylor ▴ 50


RESENDING WITHOUT ATTACHMENT:

Scaling is well known to change the results of hierarchical (and non-hierarchical) clustering. The decision to transform the data has to be considered in terms of how the transformation will affect the distance calculations. We are very comfortable with transforming data to induce properties such as normality or homoscedasticity; however, that is not necessarily why we would do it in a clustering problem. I attached a review article (in a previous post) on clustering microarray data that shows a simple example of how scaling results in different clusters, and why one scaling would be used over the other (Pharmacogenomics, 2003, Vol. 4(1), pp. 41-52).

Bill Shannon, Associate Professor of Biostatistics in Medicine, Washington University School of Medicine, St Louis; President-Elect, Classification Society

(In reply to Deanne Taylor, Friday, August 29, 2008, 6:35 AM.)
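As a rough illustration of how a transformation feeds into the distance calculations, the Python sketch below computes Euclidean distances between three made-up genes on the log2 scale and on the linear scale, then reports the closest pair, which is the first merge an agglomerative hierarchical clustering would make. The gene names and values are hypothetical, no per-gene standardization is applied, and this is not the example from the cited review.

```python
import itertools
import math

# Hypothetical log2 expression profiles for three genes across three arrays.
log2_profiles = {
    "A": [1.0, 2.0, 1.0],
    "B": [5.0, 5.0, 5.0],
    "C": [6.0, 7.0, 7.0],
}
# The same profiles back-transformed to the linear scale.
linear_profiles = {
    gene: [2.0 ** v for v in values] for gene, values in log2_profiles.items()
}


def closest_pair(profiles):
    """Return the pair of genes with the smallest Euclidean distance."""
    pairs = itertools.combinations(sorted(profiles), 2)
    return min(pairs, key=lambda p: math.dist(profiles[p[0]], profiles[p[1]]))


first_merge_log2 = closest_pair(log2_profiles)
first_merge_linear = closest_pair(linear_profiles)

print("log2 scale:", first_merge_log2)
print("linear scale:", first_merge_linear)
```

On the log2 scale B and C are the closest pair, while on the linear scale A and B are, because linear-scale distances are dominated by the absolute size of the highly expressed gene C. The same data therefore start two different trees depending only on the scale.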

ADD REPLY • link 15.7 years ago William Shannon ▴ 280
