Question: Gene clusters based on gene expression across samples
11 months ago
Jane Merlevede wrote:

Dear all,

Using an RNA-seq experiment with 16 samples, I would like to cluster genes rather than samples. My aim is to identify subsets of genes that behave similarly across the samples, which should be possible with a suitable distance metric (correlation, perhaps?).
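To illustrate why correlation may suit this goal better than Euclidean distance, here is a minimal sketch on made-up values (not data from this thread). Two genes with the same profile across samples but very different expression levels are far apart in Euclidean terms, yet have a correlation distance (1 - Pearson r) of zero.

```r
# Toy expression profiles (hypothetical values): 3 genes x 4 samples
expr <- rbind(
  geneA = c(1, 2, 3, 4),
  geneB = c(100, 200, 300, 400),  # same pattern as geneA, 100x higher level
  geneC = c(4, 3, 2, 1)           # opposite pattern, similar level to geneA
)

# Euclidean distance between genes (rows): dominated by expression level
d_euc <- as.matrix(dist(expr))

# Correlation distance between genes: cor() works on columns, hence t()
d_cor <- 1 - cor(t(expr))

round(d_euc, 1)
round(d_cor, 2)
```

Here geneA is closer to geneC than to geneB in Euclidean distance, while the correlation distance groups geneA with geneB.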

After normalizing with DESeq2, I computed TPM values (normalized counts could probably be used as well).
Applying kmeans() to ~25,000 genes gives:

Kmeans5 <- kmeans(TPMData, 5, iter.max = 1000, nstart = 10000)
Kmeans5$size
    1     4 24859     1    90

or

Kmeans10 <- kmeans(TPMData, 10, iter.max = 1000, nstart = 1000)
Kmeans10$size
    1     1 22848    13     1    75  1742   270     3     1
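One plausible reason for a single giant cluster plus a few singletons is that raw TPM values are highly skewed, so Euclidean k-means mostly separates a handful of very highly expressed genes from everything else. A common workaround is to log-transform, drop nearly flat genes, and standardize each gene before clustering, so that clusters reflect expression patterns rather than expression level. The sketch below runs on simulated data; the `TPMData` stand-in, the pseudocount of 1, and the 0.1 variance threshold are all assumptions for illustration.

```r
set.seed(1)
# Simulated stand-in for TPMData: 500 genes (rows) x 16 samples (columns),
# with heavily skewed per-gene expression levels
n_genes <- 500
n_samples <- 16
level <- rlnorm(n_genes, meanlog = 2, sdlog = 2)
TPMData <- level * matrix(rlnorm(n_genes * n_samples, sdlog = 0.5),
                          n_genes, n_samples)

# 1. Log-transform to tame the skew (pseudocount of 1 avoids log(0))
logData <- log2(TPMData + 1)

# 2. Drop genes that barely vary across samples (arbitrary threshold)
gene_sd <- apply(logData, 1, sd)
logData <- logData[gene_sd > 0.1, ]

# 3. Standardize each gene (row) to mean 0, sd 1; scale() works on columns,
#    hence the transposes
zData <- t(scale(t(logData)))

km <- kmeans(zData, centers = 5, iter.max = 100, nstart = 25)
km$size
```

With real data, the number of clusters and the variance filter would need tuning; the cluster sizes from this random example carry no meaning.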

The very unbalanced cluster sizes may be due to the sensitivity of k-means to outliers.
Do you have suggestions, for example other methods, that could help with this?

11 months ago
James W. MacDonald wrote:

Have you looked at WGCNA?

No, I did not know about it.

I will take a look, thank you.

10 months ago
Jane Merlevede wrote:

Well, I tried WGCNA and indeed, it seems to fit my needs. Thank you.

I would also like to test some simpler clustering methods, such as partitioning around medoids.
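A minimal sketch of medoid partitioning with a correlation-based dissimilarity, using `pam()` from the recommended `cluster` package with `diss = TRUE`. The data are simulated (three groups of genes sharing a base profile, each gene shifted to its own level), and k = 3 is chosen to match. Note that a full gene-by-gene dissimilarity for ~25,000 genes has over 300 million entries, so in practice you would likely need to filter genes first; the related `clara()` avoids the full matrix but only offers a few built-in metrics, not an arbitrary dissimilarity such as 1 - r.

```r
library(cluster)

set.seed(2)
# Simulated data: 3 groups of 20 genes, each group sharing a base profile
# across 16 samples, plus noise and a gene-specific expression level
n_samples <- 16
profiles <- matrix(rnorm(3 * n_samples), nrow = 3)
group <- rep(1:3, each = 20)
X <- profiles[group, ] + matrix(rnorm(60 * n_samples, sd = 0.3), 60, n_samples)
X <- X + runif(60, 0, 10)  # shift each gene (row) to its own overall level

# Correlation dissimilarity between genes (rows): 1 - Pearson r
d <- as.dist(1 - cor(t(X)))

fit <- pam(d, k = 3, diss = TRUE)
table(fit$clustering, group)
```

Because the dissimilarity ignores each gene's overall level, the level shifts do not affect which genes are grouped together.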

Using kmeans() on two distinct datasets with 5 or 10 clusters, I got several clusters (between 2 and 4) containing a single gene. Have you encountered this problem?

kmeans() allows neither requiring a minimum number of genes per cluster nor replacing the Euclidean distance between objects with another measure, such as correlation.
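kmeans() does only use Euclidean distance, but there is a standard workaround. If each gene is standardized to mean 0 and standard deviation 1 across its n samples, the squared Euclidean distance between two genes equals 2(n - 1)(1 - r), where r is their Pearson correlation. k-means on row-standardized data is therefore closely tied to correlation-based clustering, with the caveat that the centroids (averages of standardized profiles) are not themselves standardized, so it is not strictly identical. A quick numerical check on random data:

```r
set.seed(3)
n <- 16                        # number of samples
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)        # a gene partly correlated with x

# Standardize each gene profile (scale() uses the n - 1 standard deviation)
zx <- as.vector(scale(x))
zy <- as.vector(scale(y))

sq_euclid <- sum((zx - zy)^2)
r <- cor(x, y)

c(sq_euclid = sq_euclid, from_cor = 2 * (n - 1) * (1 - r))
```

The identity follows because each standardized vector has squared length n - 1 and their dot product is (n - 1) r.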

Then I tried skmeans(), which uses cosine dissimilarity between objects. On both datasets, I did not get any single-gene clusters, but rather this kind of distribution:

Class sizes: 842, 1152, 1072, 1206, 4990, 12207, 1064, 555, 1102, 765

Here again, I cannot use correlation between genes to find genes that vary similarly across the samples.
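Cosine similarity can in fact be turned into correlation: the cosine similarity between two mean-centered vectors is exactly their Pearson correlation. Centering each gene across samples before calling skmeans() should therefore give a correlation-based spherical k-means. The check below uses base R only on hypothetical data; the guarded call assumes the `skmeans` package's `skmeans(x, k)` interface, where rows are the objects being clustered, and k = 3 is arbitrary.

```r
set.seed(4)
X <- matrix(rexp(20 * 16), nrow = 20)   # 20 hypothetical genes x 16 samples

# Center each gene (row) across samples
Xc <- X - rowMeans(X)

cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

cos_centered <- cosine(Xc[1, ], Xc[2, ])
pearson <- cor(X[1, ], X[2, ])
c(cos_centered = cos_centered, pearson = pearson)

# Spherical k-means on the centered rows, if the skmeans package is installed
if (requireNamespace("skmeans", quietly = TRUE)) {
  sk <- skmeans::skmeans(Xc, k = 3)
  table(sk$cluster)
}
```

Genes with no variation across samples become all-zero rows after centering and have no defined direction, so they would need to be removed first.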

With kcca() from the flexclust package, it should be possible to use correlation, but I have not succeeded so far.
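A minimal sketch of a correlation distance for kcca(), assuming flexclust's convention that a family's distance function takes `(x, centers)` and returns an nrow(x) by nrow(centers) matrix, and that `kccaFamily(dist = , cent = )` accepts a custom distance and centroid function. The function name `dist_cor` and the use of `colMeans` on row-standardized data as the centroid are my own choices for illustration, not something taken from the flexclust documentation; the data are simulated.

```r
# Hypothetical helper: 1 - Pearson r between each row of x and each row of
# centers, returned as an nrow(x) by nrow(centers) matrix
dist_cor <- function(x, centers) {
  1 - cor(t(x), t(centers))
}

set.seed(5)
profiles <- matrix(rnorm(3 * 16), nrow = 3)   # 3 base profiles, 16 samples
group <- rep(1:3, each = 15)
X <- profiles[group, ] + matrix(rnorm(45 * 16, sd = 0.3), 45, 16)

# Sanity check of the distance function on its own
D <- dist_cor(X, profiles)
dim(D)  # 45 genes x 3 centers

if (requireNamespace("flexclust", quietly = TRUE)) {
  # Standardize rows so that centroids average comparable profiles
  Z <- t(scale(t(X)))
  fam <- flexclust::kccaFamily(dist = dist_cor, cent = colMeans,
                               name = "correlation")
  fit <- flexclust::kcca(Z, k = 3, family = fam)
  table(flexclust::clusters(fit), group)
}
```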

Have any of you used "simple" clustering methods successfully on genes to describe similarity in gene expression?

10 months ago
Jane Merlevede wrote:

Any feedback on gene clustering using classical clustering methods?