Question

Gene clusters based on gene expression across samples


Jane Merlevede ▴ 90

@jane-merlevede-5019

Last seen 7.3 years ago

Dear all,

Using an RNA-seq experiment with 16 samples, I would like to cluster genes rather than samples. My aim is to identify subsets of genes that behave similarly across the samples, which should be possible with a suitable similarity measure (maybe correlation?).

After normalizing with DESeq2, I computed TPM (I guess normalized counts could be used as well).
Applying kmeans to ~25000 genes gives the following cluster sizes:
Kmeans5 <- kmeans(TPMData, 5, iter.max = 1000, nstart = 10000)
Kmeans5$size
1 4 24859 1 90
or
Kmeans10 <- kmeans(TPMData, 10, iter.max = 1000, nstart = 1000)
Kmeans10$size
1 1 22848 13 1 75 1742 270 3 1

The very unbalanced cluster sizes may be due to the sensitivity of k-means to outliers.
Would you have suggestions, such as other methods, that could help with this?
Thank you in advance
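
One possible reading of the cluster sizes above, sketched under assumptions: on untransformed TPM, a few extremely highly expressed genes can dominate Euclidean distance, so k-means isolates them in tiny clusters. A log transform followed by a variance filter is a common first thing to try. In the sketch below, `tpm` is simulated stand-in data (not the poster's `TPMData`), and the pseudocount of 1 and the cutoff of 2000 genes are arbitrary illustrative choices.

```r
set.seed(1)

# Simulated stand-in for a genes x samples TPM matrix (hypothetical data)
n_genes <- 5000
n_samples <- 16
tpm <- matrix(rlnorm(n_genes * n_samples, meanlog = 2, sdlog = 2),
              nrow = n_genes,
              dimnames = list(paste0("gene", seq_len(n_genes)),
                              paste0("S", seq_len(n_samples))))

# Compress the dynamic range; the pseudocount of 1 is an arbitrary choice
log_tpm <- log2(tpm + 1)

# Keep the most variable genes; the cutoff (2000) is an assumption
gene_var <- apply(log_tpm, 1, var)
top <- order(gene_var, decreasing = TRUE)[seq_len(2000)]
log_tpm_top <- log_tpm[top, ]

# Same k-means call on raw and on transformed data
km_raw <- kmeans(tpm, centers = 5, iter.max = 100, nstart = 10)
km_log <- kmeans(log_tpm_top, centers = 5, iter.max = 100, nstart = 10)

# Compare how evenly genes are spread over the clusters
sort(km_raw$size)
sort(km_log$size)
```

Whether this helps on real data depends on the data; the transform only addresses the scale problem, not the choice of distance.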

kmeans gene expression clustering • 2.0k views

ADD COMMENT • link 8.3 years ago • updated 8.2 years ago Jane Merlevede ▴ 90

score 0 · Answer 1 · 2017-10-16


James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 18 hours ago

United States

Have you looked at WGCNA?

ADD COMMENT • link 8.3 years ago James W. MacDonald 68k


No, I did not know about it.

I will take a look, thank you

ADD REPLY • link 8.3 years ago Jane Merlevede ▴ 90

score 0 · Answer 2 · 2017-10-30

Well, I tried WGCNA and indeed, it seems to fit my needs. Thank you.

I would like to test some simpler clustering methods, like methods based on medoid partitioning.
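
As an illustration of medoid partitioning on genes, here is a minimal sketch using `pam()` from the `cluster` package with a dissimilarity of 1 minus the Pearson correlation. The matrix `expr` is random placeholder data and `k = 5` is an arbitrary choice. Note that a full gene-by-gene dissimilarity matrix for ~25000 genes takes several gigabytes of memory, so filtering to a subset of genes first is probably necessary in practice.

```r
library(cluster)  # recommended package, normally installed alongside R

set.seed(7)

# Hypothetical genes x samples matrix (random placeholder values)
expr <- matrix(rnorm(300 * 16), nrow = 300,
               dimnames = list(paste0("gene", 1:300), paste0("S", 1:16)))

# cor() correlates columns, so transpose to correlate genes (rows);
# 1 - r is 0 for perfectly correlated genes and 2 for anti-correlated ones
d <- as.dist(1 - cor(t(expr)))

# Partitioning around medoids on the precomputed dissimilarities
fit <- pam(d, k = 5, diss = TRUE)

table(fit$clustering)       # cluster sizes
rownames(expr)[fit$id.med]  # the gene chosen as medoid of each cluster
```

Because each medoid is an actual gene, the cluster representatives are directly interpretable, which is one reason to prefer this over centroid-based methods.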

Using kmeans() on 2 distinct datasets, with 5 or 10 clusters, I got several clusters (between 2 and 4) containing a single gene. Have you run into this problem?

kmeans() allows neither requiring a minimum number of genes per cluster nor replacing the Euclidean distance between objects with another measure, such as correlation.
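
One workaround, offered as a sketch rather than a definitive answer: if each gene is standardized across samples (mean 0, standard deviation 1), the squared Euclidean distance between two genes equals 2(n - 1)(1 - r), where r is their Pearson correlation and n the number of samples. So kmeans() on row-standardized data uses distances that are a monotone function of correlation. The matrix `expr` below is random placeholder data, and `centers = 5` is arbitrary.

```r
set.seed(42)

# Hypothetical genes x samples matrix standing in for log-scale expression
expr <- matrix(rnorm(200 * 16), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("S", 1:16)))

# Drop genes with zero variance: they cannot be standardized
expr <- expr[apply(expr, 1, sd) > 0, , drop = FALSE]

# scale() works on columns, so transpose to standardize each gene (row)
z <- t(scale(t(expr)))

# Check the identity on one pair of genes
n  <- ncol(z)
d2 <- sum((z[1, ] - z[2, ])^2)
r  <- cor(expr[1, ], expr[2, ])
all.equal(d2, 2 * (n - 1) * (1 - r))

# k-means on the standardized rows groups genes by profile shape,
# not by overall expression level
km <- kmeans(z, centers = 5, iter.max = 100, nstart = 25)
table(km$cluster)
```

The cluster centroids are averages of standardized profiles and are not themselves standardized, so this approximates correlation-based clustering rather than reproducing it exactly.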

Then I tried skmeans(), which uses cosine dissimilarity between objects. On both datasets, I did not get single-gene clusters, but rather this kind of size distribution:

Class sizes: 842, 1152, 1072, 1206, 4990, 12207, 1064, 555, 1102, 765

Here again, I cannot use correlation between genes to find genes that vary similarly across the samples.

With kcca() from the flexclust package, it should be possible to use correlation, but I have not succeeded so far.

Do some of you use "simple" clustering methods on genes with success to describe gene expression similarity?
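
For reference, one classical approach that accepts a correlation-based dissimilarity directly is hierarchical clustering with `hclust()` and `cutree()` from base R. The sketch below uses simulated genes that follow three made-up patterns; the patterns, the noise level, average linkage, and `k = 3` are all illustrative assumptions.

```r
set.seed(123)
n_samples <- 16

# Three hypothetical expression patterns across the 16 samples
patterns <- rbind(
  up    = seq(-1, 1, length.out = n_samples),
  down  = seq(1, -1, length.out = n_samples),
  pulse = rep(c(-1, 1, 1, -1), each = 4)
)

# 50 simulated genes per pattern, plus noise (all values are made up)
expr <- patterns[rep(1:3, each = 50), ] +
  matrix(rnorm(150 * n_samples, sd = 0.3), nrow = 150)
rownames(expr) <- paste0("gene", 1:150)

# Dissimilarity between genes: 1 - Pearson correlation
d <- as.dist(1 - cor(t(expr)))

# Average-linkage hierarchical clustering, cut into 3 groups
hc <- hclust(d, method = "average")
cl <- cutree(hc, k = 3)

# Cross-tabulate the clusters against the planted patterns
table(cl, planted = rep(rownames(patterns), each = 50))
```

On real data the number of clusters is not known in advance, so the cut height or `k` would have to be chosen by inspecting the dendrogram or a criterion such as silhouette width.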

score 0 · Answer 3 · 2017-11-07


Jane Merlevede ▴ 90


Any feedback on gene clustering using classical clustering methods?

ADD COMMENT • link 8.2 years ago Jane Merlevede ▴ 90