Question

Pearson Correlation & log transformed data.

Paul Geeleher ★ 1.3k

Hi Folks,

Hopefully this will be easy for somebody to answer. I'm interested in clustering the expression profiles of genes from 8 timepoints using Pearson correlation. I'm using code like this:

    # cor() correlates columns, so transpose to get gene-by-gene correlations
    dist_samples_pea <- as.dist(1 - cor(t(filtMat), method = "pearson"))
    # average-linkage hierarchical clustering on the 1 - r distances
    hc_samples_pea <- hclust(dist_samples_pea, method = "average")
    plot(hc_samples_pea, hang = -1, ann = TRUE, cex = 0.75, main = "Pearson")

where filtMat is a matrix of my data (basically exprs(eset) with some genesets removed).

The code is from this document:
http://www.google.com/url?sa=U&start=1&q=http://www.giu.fi/portals/0/science/Courses/Microarrays/Practical%2520Bioinformatics%25202007/Exercises/Class%2520discovery%2520using%2520R%2520Bioconductor,%252012-4-2007_2.doc&ei=ee32Sdq-IILz-Ab4oOTBDw&usg=AFQjCNEqcWn-oQs5ggMAdZ_QZofs5P3W0g

My question is whether it makes a difference that I'm using the log transformed data. I know that the log transform is not linear, so Pearson correlations computed on logged data and on raw data will differ and can yield different clusters. I'd very much appreciate it if somebody could justify one course of action or the other.

Thanks a bunch,

Paul.

--
Paul Geeleher
School of Mathematics, Statistics and Applied Mathematics
National University of Ireland
Galway
Ireland
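To make the non-linearity point concrete, here is a minimal sketch in R. The matrix and the gene names (geneA, geneB, geneC) are made up for illustration and are not from the data in the question. It shows that Pearson correlation is unchanged by a linear rescaling of a profile but does change under a log transform, which is why the 1 - r distances fed to hclust() can differ between raw and logged data.

```r
# Hypothetical toy data: 3 genes (rows) x 8 timepoints (columns), raw scale.
# geneC is an exact linear function of geneA on the raw scale: geneC = geneA / 10 + 4.
raw <- rbind(
  geneA = c(10, 20, 40, 80, 160, 320, 640, 1280),
  geneB = c(100, 200, 300, 400, 500, 600, 700, 800),
  geneC = c(5, 6, 8, 12, 20, 36, 68, 132)
)
logged <- log2(raw)

# Gene-by-gene Pearson correlations, computed the same way as in the question
cor_raw <- cor(t(raw), method = "pearson")
cor_log <- cor(t(logged), method = "pearson")

# A linear rescaling of one gene leaves every correlation unchanged
rescaled <- raw
rescaled["geneA", ] <- 3 * raw["geneA", ] + 7
same_after_linear <- isTRUE(all.equal(cor(t(rescaled)), cor_raw))

# The log transform is not linear, so the correlations move
print(round(cor_raw, 3))
print(round(cor_log, 3))
print(same_after_linear)
```

On the raw scale geneA and geneC are perfectly correlated by construction; after log2 the relationship is no longer exactly linear, so their correlation drops below 1 and the corresponding distance is no longer zero. Which scale is appropriate is a modelling decision that this sketch does not settle.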