Question: Gene clusters based on gene expression across samples
Jane Merlevede wrote (3 months ago):

Dear all,

Using an RNA-seq experiment with 16 samples, I would like to cluster genes rather than samples. My aim is to identify subsets of genes that behave similarly across the samples, which should be possible with an appropriate similarity metric (perhaps correlation?).
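One way to use correlation as the metric is to turn it into a distance, for example 1 - r, and pass it to hierarchical clustering. The sketch below uses base R's cor(), as.dist(), hclust() and cutree() on a small hypothetical matrix `expr` standing in for the real data (genes in rows, samples in columns); average linkage and k = 2 are illustrative choices. Note that 1 - r places anti-correlated genes as far apart as possible; 1 - |r| would group them together instead.

```r
# Hypothetical toy data standing in for TPMData: 6 genes x 4 samples
expr <- rbind(
  g1 = c(1, 2, 3, 4),
  g2 = c(10, 20, 30, 40),   # same shape as g1 at a 10x higher level
  g3 = c(2, 4, 6, 8),
  g4 = c(4, 3, 2, 1),       # opposite shape
  g5 = c(40, 30, 20, 10),
  g6 = c(8, 6, 4, 2)
)
colnames(expr) <- paste0("s", 1:4)

# cor() correlates columns, so transpose to correlate genes
gene_cor <- cor(t(expr))

# Correlation distance: 0 for identical shapes, 2 for opposite shapes
gene_dist <- as.dist(1 - gene_cor)

# Average-linkage hierarchical clustering, cut into k groups
tree <- hclust(gene_dist, method = "average")
clusters <- cutree(tree, k = 2)
print(clusters)
```

Genes g1 to g3 share a cluster despite their very different levels, and g4 to g6 form the other one.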

After normalizing the data with DESeq2, I computed TPM values (DESeq2-normalized counts could probably be used as well).
Applying kmeans() to the ~25,000 genes gives very unbalanced cluster sizes:
Kmeans5 = kmeans(TPMData, 5, iter.max = 1000, nstart = 10000)
# cluster sizes:  1     4 24859     1    90
Kmeans10 = kmeans(TPMData, 10, iter.max = 1000, nstart = 1000)
# cluster sizes:  1     1 22848    13     1    75  1742   270     3     1
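Sizes like these are typical of k-means on untransformed TPM: Euclidean distance is then driven mainly by how highly a gene is expressed, so a handful of very abundant genes end up in clusters of their own. A common remedy, sketched below assuming a log2(TPM + 1) transform, is to z-score each gene across samples before calling kmeans(); after standardization, genes are grouped by the shape of their profile rather than by their level. The simulated matrix `tpm` (300 genes x 16 samples, three shapes at levels from 1 to 10,000) is hypothetical.

```r
set.seed(1)
n_samples <- 16
# Hypothetical stand-in for TPMData: three expression shapes ...
shapes <- rbind(
  up   = seq(1, 4, length.out = n_samples),
  down = seq(4, 1, length.out = n_samples),
  peak = c(rep(1, 4), rep(4, 8), rep(1, 4))
)
pattern_id <- rep(1:3, each = 100)
gene_level <- 10^runif(300, 0, 4)   # ... at overall levels between 1 and 10,000
tpm <- shapes[pattern_id, ] * gene_level *
  exp(matrix(rnorm(300 * n_samples, sd = 0.1), nrow = 300))  # multiplicative noise

# Raw TPM: expect clusters that track expression level rather than shape
raw_km <- kmeans(tpm, centers = 3, nstart = 25)
table(raw_km$cluster, pattern_id)

# 1. log-transform; 2. z-score each gene (scale() works on columns, hence t())
z <- t(scale(t(log2(tpm + 1))))
# 3. k-means on standardized profiles groups genes by shape
shape_km <- kmeans(z, centers = 3, nstart = 25)
table(shape_km$cluster, pattern_id)
```

On the standardized data, each of the three simulated shapes should fall into exactly one cluster.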

The very unbalanced cluster sizes may be due to the sensitivity of k-means to outliers.
Could you suggest other methods that would be better suited to this purpose?
Thank you in advance.
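Before trying other algorithms, it may help to filter genes. Genes that are not detected, or that barely vary, carry no pattern to cluster, and cor() returns NA (with a warning) for zero-variance genes. A minimal sketch, assuming a log2(TPM + 1) scale (DESeq2's vst() or rlog() are alternatives) and hypothetical thresholds: drop constant or lowly expressed genes, then keep the 500 most variable ones.

```r
set.seed(5)
# Hypothetical stand-in for TPMData: 1,000 genes x 16 samples
tpm <- matrix(rexp(1000 * 16, rate = 1 / 20), nrow = 1000)
tpm[1:50, ] <- 0                         # 50 genes never detected in any sample
log_expr <- log2(tpm + 1)

gene_sd <- apply(log_expr, 1, sd)        # variability of each gene across samples
# Hypothetical thresholds: some variation and a mean above 1 on the log2 scale
keep <- gene_sd > 0 & rowMeans(log_expr) > 1
filtered <- log_expr[keep, , drop = FALSE]

# Keep the (up to) 500 most variable remaining genes for clustering
top <- order(gene_sd[keep], decreasing = TRUE)[seq_len(min(500, nrow(filtered)))]
expr_for_clustering <- filtered[top, , drop = FALSE]
dim(expr_for_clustering)                 # 500 16
```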

James W. MacDonald (United States) wrote (3 months ago):

Have you looked at WGCNA?
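For readers unfamiliar with WGCNA: it builds a gene network from pairwise correlations raised to a soft-thresholding power, converts it into a topological overlap measure, and cuts the resulting gene tree into modules (the package wraps these steps in functions such as pickSoftThreshold() and blockwiseModules()). The base R sketch below only illustrates the first idea, an unsigned adjacency |r|^beta used as a similarity for hierarchical clustering; it is not WGCNA itself, and beta = 6 and the two simulated modules are illustrative assumptions.

```r
set.seed(2)
n_samples <- 16
phase <- 2 * pi * (1:n_samples) / n_samples
module_a <- sin(phase)        # two uncorrelated hypothetical "module" profiles
module_b <- cos(phase)
expr <- rbind(                # 20 noisy genes per module, 40 genes in total
  t(replicate(20, module_a + rnorm(n_samples, sd = 0.2))),
  t(replicate(20, module_b + rnorm(n_samples, sd = 0.2)))
)
truth <- rep(c("A", "B"), each = 20)

beta <- 6                                   # illustrative soft-thresholding power
adjacency <- abs(cor(t(expr)))^beta         # unsigned, WGCNA-style adjacency
tree <- hclust(as.dist(1 - adjacency), method = "average")
modules <- cutree(tree, k = 2)
table(modules, truth)
```

Raising |r| to a power shrinks weak correlations toward zero much faster than strong ones, which is what makes the modules stand out.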


Jane Merlevede replied (3 months ago):

No, I did not know about it. I will take a look, thank you.
Jane Merlevede wrote (11 weeks ago):

Well, I tried WGCNA and indeed, it seems to fit my needs. Thank you.

I would also like to test some simpler clustering methods, such as partitioning around medoids.
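Partitioning around medoids is available as pam() in the cluster package, which ships with R. Because pam() accepts a dissimilarity object, it can use correlation directly, and its medoids are actual genes, which makes it less sensitive to outliers than k-means. A sketch assuming 1 - r as the dissimilarity and a hypothetical 150-gene matrix built from three expression shapes:

```r
library(cluster)   # recommended package shipped with R; provides pam()

set.seed(3)
n_samples <- 16
t_axis <- seq(0, 1, length.out = n_samples)
shapes <- rbind(
  rise = t_axis,
  fall = 1 - t_axis,
  peak = dnorm(t_axis, mean = 0.5, sd = 0.15)
)
truth <- rep(rownames(shapes), each = 50)
# Hypothetical data: each gene is a shape scaled by 1 to 1,000, plus a little noise
expr <- shapes[truth, ] * 10^runif(150, 0, 3) +
  matrix(rnorm(150 * n_samples, sd = 0.05), nrow = 150)
rownames(expr) <- paste0("gene", 1:150)

diss <- as.dist(1 - cor(t(expr)))   # correlation dissimilarity between genes
fit <- pam(diss, k = 3)             # partitioning around medoids
table(fit$clustering, truth)
rownames(expr)[fit$id.med]          # the medoids are real genes
```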

Using kmeans() on two distinct datasets, with 5 or 10 clusters, I got several clusters (between 2 and 4) containing a single gene. Have you encountered this problem?

kmeans() allows neither requiring a minimum number of genes per cluster nor replacing the Euclidean distance between objects with another metric, such as correlation.
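Although kmeans() only works with Euclidean distance, correlation can be brought in through preprocessing. Let x_i be the profile of gene i over n samples, with mean \bar{x}_i and standard deviation s_i (denominator n - 1, as in R's sd() and scale()), and let r_ij be the Pearson correlation of genes i and j. Standardizing each gene gives:

```latex
z_{ik} = \frac{x_{ik} - \bar{x}_i}{s_i}, \qquad
\sum_{k=1}^{n} z_{ik}^2 = n - 1, \qquad
\sum_{k=1}^{n} z_{ik} z_{jk} = (n - 1)\, r_{ij}

\lVert z_i - z_j \rVert^2
  = \sum_{k=1}^{n} z_{ik}^2 + \sum_{k=1}^{n} z_{jk}^2 - 2 \sum_{k=1}^{n} z_{ik} z_{jk}
  = 2 (n - 1) (1 - r_{ij})
```

With n = 16 samples this is 30(1 - r_ij), so kmeans() on row-standardized data ranks gene pairs exactly as correlation does. It remains an approximation of correlation-based k-means, because the centroids are averages of standardized profiles and are not re-standardized, and it still does not enforce a minimum cluster size.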

Then I tried skmeans(), which uses the cosine dissimilarity between objects. On both datasets I did not get any single-gene clusters, but rather this kind of distribution:

Class sizes: 842, 1152, 1072, 1206, 4990, 12207, 1064, 555, 1102, 765

Here again, I cannot use correlation between genes to find genes that vary similarly across the samples.
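Cosine dissimilarity can in fact be turned into correlation: the Pearson correlation of two genes equals the cosine similarity of their mean-centered profiles. The base R check below uses a hypothetical helper `cosine_sim()` and random data to show this.

```r
# Cosine similarity between all pairs of rows of a matrix (hypothetical helper)
cosine_sim <- function(m) {
  unit <- m / sqrt(rowSums(m^2))   # scale each row (gene) to unit length
  unit %*% t(unit)
}

set.seed(4)
expr <- matrix(rexp(5 * 16), nrow = 5)   # hypothetical: 5 genes x 16 samples
centered <- expr - rowMeans(expr)        # subtract each gene's mean expression

# On row-centered data, cosine similarity equals Pearson correlation
isTRUE(all.equal(cosine_sim(centered), cor(t(expr)), check.attributes = FALSE))  # TRUE
# On the raw, all-positive profiles it does not
isTRUE(all.equal(cosine_sim(expr), cor(t(expr)), check.attributes = FALSE))      # FALSE
```

Passing such a row-centered matrix to skmeans() should therefore make its cosine dissimilarity equivalent to 1 minus the Pearson correlation. Zero-variance genes become all-zero rows after centering and have to be removed first.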

With kcca() from the flexclust package it should be possible to use correlation, but I have not succeeded so far.
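The idea can also be written out directly. The hypothetical helper cor_kmeans() below is a hand-written stand-in, not the flexclust API: it standardizes each gene, assigns every gene to the center it correlates with most, recomputes each center as the mean standardized profile of its members, and keeps the best of `nstart` random starts (scored by total gene-to-center correlation). Genes with zero variance must be removed beforehand, since they cannot be standardized.

```r
# Hypothetical helper: k-means-style clustering with Pearson correlation
cor_kmeans <- function(x, k, nstart = 25, iter_max = 100) {
  z <- t(scale(t(x)))                                  # standardize each gene (row)
  best <- NULL
  for (start in seq_len(nstart)) {
    centers <- z[sample(nrow(z), k), , drop = FALSE]   # k random genes as seeds
    assignment <- integer(nrow(z))
    for (iter in seq_len(iter_max)) {
      sim <- cor(t(z), t(centers))                     # genes x centers correlations
      new_assignment <- max.col(sim, ties.method = "first")
      if (all(new_assignment == assignment)) break
      assignment <- new_assignment
      for (j in seq_len(k)) {                          # center = mean standardized profile
        members <- z[assignment == j, , drop = FALSE]
        if (nrow(members) > 0) centers[j, ] <- colMeans(members)
      }
    }
    score <- sum(sim[cbind(seq_len(nrow(z)), assignment)])  # total gene-center correlation
    if (is.null(best) || score > best$score) best <- list(cluster = assignment, score = score)
  }
  best$cluster
}

# Usage on hypothetical data: 90 genes following three shapes at varied levels
set.seed(6)
n_samples <- 16
t_axis <- seq(0, 1, length.out = n_samples)
shapes <- rbind(rise = t_axis, fall = 1 - t_axis, peak = dnorm(t_axis, 0.5, 0.15))
truth <- rep(rownames(shapes), each = 30)
expr <- shapes[truth, ] * 10^runif(90, 0, 3) +
  matrix(rnorm(90 * n_samples, sd = 0.05), nrow = 90)
clusters <- cor_kmeans(expr, k = 3)
table(clusters, truth)
```

Like kmeans(), this sketch does not enforce a minimum cluster size.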

Have any of you successfully used "simple" clustering methods on genes to describe similarity in gene expression?

Jane Merlevede wrote (10 weeks ago):

Any feedback on gene clustering using classical clustering methods?

Powered by Biostar version 2.2.0