Paul Geeleher
Hi Folks,
Hopefully this will be easy for somebody to answer. I'm interested
in clustering the expression profiles of genes across 8 timepoints using
Pearson correlation. I'm using code like this:
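# distance = 1 - Pearson r between the rows (genes) of filtMat
dist_samples_pea <- as.dist(1 - cor(t(filtMat), method = "pearson"))
# average-linkage hierarchical clustering on that distance
hc_samples_pea <- hclust(dist_samples_pea, method = "average")
plot(hc_samples_pea, hang = -1, ann = TRUE, cex = 0.75, main = "Pearson")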
where filtMat is a matrix of my data (basically exprs(eset) with some
gene sets removed). The code is from this document:
http://www.google.com/url?sa=U&start=1&q=http://www.giu.fi/portals/0/science/Courses/Microarrays/Practical%2520Bioinformatics%25202007/Exercises/Class%2520discovery%2520using%2520R%2520Bioconductor,%252012-4-2007_2.doc&ei=ee32Sdq-IILz-Ab4oOTBDw&usg=AFQjCNEqcWn-oQs5ggMAdZ_QZofs5P3W0g
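A toy sketch of what the code above computes, assuming filtMat has genes as rows and the 8 timepoints as columns (the orientation of exprs(eset)). The matrix and gene names here are made up for illustration. Because cor() correlates the columns of its argument, cor(t(filtMat)) gives a gene-by-gene matrix, so the tree groups genes (despite the dist_samples_pea name), and cutree() turns it into gene clusters.

```r
# Hypothetical stand-in for filtMat: 6 made-up genes (rows) x 8 timepoints.
set.seed(1)
tp <- 1:8
filtMat <- rbind(
  geneA = tp + rnorm(8, sd = 0.3),            # rising profile
  geneB = 2 * tp + 5 + rnorm(8, sd = 0.3),    # rising, other scale/offset
  geneC = -tp + rnorm(8, sd = 0.3),           # falling profile
  geneD = -3 * tp + 40 + rnorm(8, sd = 0.3),  # falling, other scale/offset
  geneE = sin(tp) + rnorm(8, sd = 0.1),       # oscillating profile
  geneF = 4 * sin(tp) + 10 + rnorm(8, sd = 0.1)
)
colnames(filtMat) <- paste0("t", tp)

# cor() correlates COLUMNS, so t() makes genes the columns: a 6 x 6 matrix.
r <- cor(t(filtMat), method = "pearson")
print(dim(r))

# 1 - r: identical shapes are at distance 0, perfectly opposite shapes at 2.
d <- as.dist(1 - r)
hc <- hclust(d, method = "average")

# Cut the tree into 3 groups of genes with similar profile shapes.
groups <- cutree(hc, k = 3)
print(groups)
```

Genes that differ only in scale or offset (geneA and geneB, for example) end up together, because Pearson r ignores those differences.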
My question is whether it makes a difference that I'm using the
log-transformed data. I know that the log transform is nonlinear, so
Pearson correlations computed on logged data and on raw data differ,
and the two can yield different clusters. I'd very much appreciate it
if somebody could justify one course of action or the other.
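A small R sketch of why the scale matters, using made-up profiles (g1, g2 and g3 are hypothetical). Pearson r is unchanged by any positive affine transform a * x + b of a profile, and logging changes which profiles are affine transforms of one another.

```r
# Hypothetical raw-scale profile over 8 timepoints (made-up values).
g1 <- 2^c(0, 1, 0, 2, 1, 3, 2, 4)  # 1, 2, 1, 4, 2, 8, 4, 16
g2 <- g1^2                         # same timing, every fold change squared
g3 <- 5 * g1 + 7                   # same shape on the raw scale, rescaled and shifted

# g3 is an affine transform of g1 on the raw scale, but not after log2,
# so r is 1 (up to rounding) on the raw scale and below 1 on the log scale.
print(c(raw = cor(g1, g3), log = cor(log2(g1), log2(g3))))

# log2(g2) = 2 * log2(g1) is affine on the log scale, but g2 is not an
# affine transform of g1 on the raw scale: the pattern is reversed.
print(c(raw = cor(g1, g2), log = cor(log2(g1), log2(g2))))
```

So each scale treats a different family of profiles as having the same shape: on the raw scale, profiles related by a * x + b; on the log scale, profiles whose log fold changes are proportional. The choice of scale amounts to choosing which of those notions of co-expression to cluster on.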
Thanks a bunch,
Paul.
--
Paul Geeleher
School of Mathematics, Statistics and Applied Mathematics
National University of Ireland
Galway
Ireland