Question: Gene clusters based on gene expression across samples
Asked 4 weeks ago by Jane Merlevede:

Dear all,

I have an RNA-seq experiment with 16 samples and would like to cluster genes rather than samples. My aim is to identify subsets of genes that behave similarly across the samples, which should be possible with a suitable metric (maybe correlation?).

After normalizing with DESeq2, I computed TPM; I guess normalized counts could be used as well.
Applying kmeans() to ~25000 genes gives the following cluster sizes:

Kmeans5 = kmeans(TPMData, 5, iter.max = 1000, nstart = 10000)
Cluster sizes: 1, 4, 24859, 1, 90

Kmeans10 = kmeans(TPMData, 10, iter.max = 1000, nstart = 1000)
Cluster sizes: 1, 1, 22848, 13, 1, 75, 1742, 270, 3, 1

The very unbalanced cluster sizes may be due to the sensitivity of k-means to outliers.
Would you have suggestions, such as other methods, that could help here?
Thank you in advance.
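
One way to probe the outlier hypothesis is to compare k-means on raw values with k-means on log-transformed, variance-filtered values. The following is only a minimal sketch on simulated data: the matrix `tpm` is a hypothetical stand-in for `TPMData`, and the log2(x + 1) transform and the median variance cutoff are illustrative assumptions, not part of the original analysis.

```r
# Simulated stand-in for TPMData: 1000 genes x 16 samples, strongly right-skewed
set.seed(1)
tpm <- matrix(rlnorm(1000 * 16, meanlog = 2, sdlog = 2), nrow = 1000,
              dimnames = list(paste0("gene", 1:1000), paste0("s", 1:16)))

# log2(x + 1) compresses the long right tail, so a handful of extreme genes
# has less influence on the Euclidean distances used by k-means
log_tpm <- log2(tpm + 1)

# Keep the more variable half of the genes (the cutoff is arbitrary)
gene_var <- apply(log_tpm, 1, var)
keep <- gene_var >= median(gene_var)
filtered <- log_tpm[keep, , drop = FALSE]

# Same k on raw and on transformed data, with a moderate number of restarts
km_raw <- kmeans(tpm, centers = 5, iter.max = 100, nstart = 25)
km_log <- kmeans(filtered, centers = 5, iter.max = 100, nstart = 25)

table(km_raw$cluster)  # cluster sizes on raw values
table(km_log$cluster)  # cluster sizes after log transform and filtering
```

Comparing the two tables on the real data would show whether the singleton clusters are driven by a few genes with extreme values.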

Answer by James W. MacDonald (United States), 4 weeks ago:

Have you looked at WGCNA?


Reply by Jane Merlevede, 4 weeks ago:

No, I did not know about it. I will take a look, thank you.
Answer by Jane Merlevede, 21 days ago:

Well, I tried WGCNA and indeed, it seems to fit my needs. Thank you.

I would also like to test some simpler clustering methods, such as those based on partitioning around medoids.

Using kmeans() on 2 distinct datasets, with 5 or 10 clusters, I got several clusters (between 2 and 4) containing a single gene. Have you encountered this problem?

kmeans() allows neither setting a minimum number of genes per cluster nor replacing the Euclidean distance between objects with another measure, such as correlation.
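
A common workaround for the fixed Euclidean distance is to center and scale each gene before calling kmeans(): for z-scored rows of length n, the squared Euclidean distance between two genes equals 2(n - 1)(1 - r), where r is their Pearson correlation. The sketch below checks this identity on simulated data; `expr` is a hypothetical matrix standing in for the real expression values, and the result only approximates a correlation-based k-means because the cluster centers themselves are not rescaled.

```r
# Simulated expression matrix: 200 genes x 16 samples
set.seed(42)
expr <- matrix(rnorm(200 * 16), nrow = 200,
               dimnames = list(paste0("gene", 1:200), NULL))

# Center and scale each gene (row); scale() works on columns, hence the t().
# Genes with zero variance would become NaN here and should be removed first.
z <- t(scale(t(expr)))

# Check the identity on the first two genes
n  <- ncol(z)
d2 <- sum((z[1, ] - z[2, ])^2)      # squared Euclidean distance after scaling
r  <- cor(expr[1, ], expr[2, ])     # Pearson correlation on the original values
all.equal(d2, 2 * (n - 1) * (1 - r))

# k-means on the scaled rows then groups genes by the shape of their profile
km <- kmeans(z, centers = 5, iter.max = 100, nstart = 25)
table(km$cluster)
```

With this scaling, genes that rise and fall together across samples end up close to each other regardless of their absolute expression level.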

Then I tried skmeans(), which uses cosine dissimilarity between objects. On both datasets, I did not get single-gene clusters, but rather this kind of distribution:

Class sizes: 842, 1152, 1072, 1206, 4990, 12207, 1064, 555, 1102, 765

Here again, I cannot use correlation between genes to find genes that vary similarly across the samples.

With kcca() from the flexclust package it should be possible to use correlation, but I have not been successful so far.
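
For reference, a correlation-based dissimilarity can also be used directly with base R hierarchical clustering, without flexclust. This is only a sketch on toy data: the matrix `expr`, the three simulated patterns, the average linkage, and k = 3 are all illustrative assumptions. Note that for ~25000 genes the full gene-by-gene correlation matrix takes several gigabytes of memory, so filtering genes first is probably necessary.

```r
# Toy data: 3 groups of 20 genes, each group following a shared pattern plus noise
set.seed(7)
patterns <- rbind(sin(1:16 / 2), cos(1:16 / 2), seq(-1, 1, length.out = 16))
expr <- patterns[rep(1:3, each = 20), ] + matrix(rnorm(60 * 16, sd = 0.2), nrow = 60)
rownames(expr) <- paste0("gene", 1:60)

# cor() correlates columns, so transpose to get gene-by-gene correlations
gene_cor <- cor(t(expr))

# 1 - r is 0 for perfectly correlated genes and 2 for perfectly anti-correlated ones
gene_dist <- as.dist(1 - gene_cor)

# Average-linkage hierarchical clustering, then cut the tree into k groups
hc <- hclust(gene_dist, method = "average")
clusters <- cutree(hc, k = 3)
table(clusters)
```

The same `gene_dist` object could also be passed to pam() from the cluster package for a medoid-based partitioning.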

Do any of you successfully use "simple" clustering methods on genes to describe gene expression similarity?

Answer by Jane Merlevede, 13 days ago:

Any feedback on gene clustering using classical clustering methods?
