Using an RNA-seq experiment with 16 samples, I would like to cluster genes rather than samples. My aim is to identify subsets of genes that behave similarly across the samples, which should be possible with a suitable distance metric (perhaps a correlation-based one?).
After normalization with DESeq2, I computed TPM values; I guess the normalized counts could be used as well.
Applying kmeans to ~25000 genes gives the following cluster sizes:
Kmeans5 <- kmeans(TPMData, centers = 5, iter.max = 1000, nstart = 10000)
1 4 24859 1 90
Kmeans10 <- kmeans(TPMData, centers = 10, iter.max = 1000, nstart = 1000)
1 1 22848 13 1 75 1742 270 3 1
These very unbalanced cluster sizes (almost all genes end up in a single cluster) may be due to the sensitivity of k-means to outliers.
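A minimal sketch of one way to reduce the influence of extreme values before k-means, assuming TPMData is a numeric genes x samples matrix: log-transform, then z-score each gene so that the clustering compares expression profiles rather than absolute levels. The matrix below is simulated toy data standing in for the real TPMData, and the pseudocount of 1 and the choice of 5 centers are arbitrary assumptions.

```r
# Simulated stand-in for TPMData (genes x samples); NOT real expression data
set.seed(1)
n_genes <- 200
n_samples <- 16
TPMData <- matrix(rlnorm(n_genes * n_samples, meanlog = 2, sdlog = 2),
                  nrow = n_genes,
                  dimnames = list(paste0("gene", seq_len(n_genes)),
                                  paste0("sample", seq_len(n_samples))))

# log2 transform with a pseudocount (assumed value: 1) to compress the dynamic range
logTPM <- log2(TPMData + 1)

# Drop genes that are constant across samples, since they cannot be z-scored
keep <- apply(logTPM, 1, sd) > 0
logTPM <- logTPM[keep, , drop = FALSE]

# z-score each gene (row); scale() works column-wise, hence the double transpose
zTPM <- t(scale(t(logTPM)))

# k-means on the scaled profiles
KmeansZ <- kmeans(zTPM, centers = 5, iter.max = 100, nstart = 25)
table(KmeansZ$cluster)
```

With z-scored rows, the squared Euclidean distance between two genes equals 2(n - 1)(1 - r), where r is their Pearson correlation and n the number of samples, so k-means on zTPM effectively groups genes by correlation.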
Would you have any suggestions, such as other methods, that could help here?
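For example, would hierarchical clustering on a correlation-based distance be more appropriate? A rough sketch of the idea on simulated data; the matrix expr is a hypothetical stand-in for a filtered, log-scale genes x samples matrix, and the average linkage and k = 5 are arbitrary choices.

```r
# Simulated stand-in for a filtered, log-scale genes x samples matrix; NOT real data
set.seed(42)
expr <- matrix(rnorm(300 * 16), nrow = 300,
               dimnames = list(paste0("gene", 1:300), paste0("sample", 1:16)))

# Pearson correlation between genes; cor() correlates columns, so transpose first
gene_cor <- cor(t(expr), method = "pearson")

# Convert similarity to distance: 0 for perfectly correlated genes, 2 for anti-correlated
gene_dist <- as.dist(1 - gene_cor)

# Average-linkage hierarchical clustering, then cut the tree into 5 groups
hc <- hclust(gene_dist, method = "average")
clusters <- cutree(hc, k = 5)
table(clusters)
```

For ~25000 genes the full correlation matrix would hold 25000 x 25000 doubles (roughly 5 GB), so this would probably need a prior filter, for instance keeping only the most variable genes.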
Thank you in advance