Question: Choosing genes for clustering
12 days ago by
lirongrossmann (0) wrote:

Dear all, 

I am running a differential gene expression analysis between two groups and found 124 differentially expressed genes using the limma package.

When I run hierarchical clustering on the dataset using the top 30 genes, I get a fairly clear separation between the two groups. When I increase the number of genes to 50, the separation is less clear, and with all 124 genes I no longer see any separation between the two groups on the heatmap.

Has anyone come across a similar situation, where choosing different numbers of differentially expressed genes (all with adjusted p-value < 0.05) gives very different clustering of the samples?

Is there a way to choose the set of DE genes (among those I get from limma) that best separates the two sample groups?

Thank you very much!


12 days ago by
UCL, United Kingdom
chris86 (320) wrote:

This is common. There are ways of calculating cluster strength and stability that you could apply to different numbers of DE genes; however, there is a danger of being too selective here. I would just test a few different numbers (30, 50, 100, etc.) and then choose what you need. It depends on what you want to do with the data.
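As a rough illustration of comparing cluster strength across gene counts, here is a minimal Python sketch on simulated data (not the poster's dataset). Everything in it is an assumption made for the example: the simulated effect sizes, the row-centred expression values, the use of 1 - Pearson correlation as the distance, and the mean silhouette width (computed against the known group labels) as the strength measure. The helper names are hypothetical.

```python
import math
import random

def correlation_distance(a, b):
    """1 - Pearson correlation between two equal-length sample profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(va * vb)

def mean_silhouette(samples, labels):
    """Average silhouette width of the samples, using the known group labels."""
    widths = []
    for i, s in enumerate(samples):
        same = [correlation_distance(s, t) for j, t in enumerate(samples)
                if j != i and labels[j] == labels[i]]
        other = [correlation_distance(s, t) for j, t in enumerate(samples)
                 if labels[j] != labels[i]]
        a = sum(same) / len(same)    # mean distance to own group
        b = sum(other) / len(other)  # mean distance to the other group
        widths.append((b - a) / max(a, b))
    return sum(widths) / len(widths)

def simulate(n_genes=124, n_per_group=6, seed=1):
    """Simulated, row-centred genes already ranked by (shrinking) effect size."""
    rng = random.Random(seed)
    labels = [0] * n_per_group + [1] * n_per_group
    matrix = []
    for g in range(n_genes):
        effect = 3.0 * math.exp(-g / 20.0)  # assumed: effect decays with rank
        sign = rng.choice([-1, 1])          # up- or down-regulated
        matrix.append([rng.gauss(0, 1) + sign * effect * (lab - 0.5)
                       for lab in labels])
    return matrix, labels

def silhouette_for_top(matrix, labels, n_top):
    """Silhouette of the samples when only the top n_top genes are used."""
    sub = matrix[:n_top]
    samples = [[row[j] for row in sub] for j in range(len(labels))]
    return mean_silhouette(samples, labels)

matrix, labels = simulate()
scores = {n: silhouette_for_top(matrix, labels, n) for n in (30, 50, 124)}
for n, s in scores.items():
    print(n, "genes: mean silhouette", round(s, 3))
```

In this simulation the lower-ranked genes carry weak effects, so adding them mostly adds noise and the silhouette tends to drop. Picking the gene count that maximises such a score is exactly the kind of over-selection the caution above refers to.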

7 days ago by
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth (31k) wrote:

Your question assumes that there is some trick to choosing DE genes, but really there isn't. By definition, the DE analysis chooses the genes that best separate the two groups. So the best set of DE genes to separate the two sample groups is simply the top DE genes. The top gene separates the groups best of all, the second gene second best, the third gene third best, and so on.

I find it very surprising that you could cluster on significantly DE genes but not separate the groups. Did you do a simple DE analysis between two groups or did you include any batch effects or factors other than the groups in the design matrix?

There are dozens of ways to run hierarchical clustering, and I wonder whether you are choosing a good way. Have you tried coolmap() in the limma package?
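To illustrate how much the clustering method can matter, here is a small Python sketch with made-up numbers (not real data). It compares Euclidean distances between samples on raw values and on row-standardized values (z-scores per gene). To my understanding of its documentation, coolmap()'s default cluster.by="de pattern" option standardizes rows in this way before clustering, but treat that as an assumption and check the help page. The function names below are hypothetical.

```python
import math
import statistics

# Hypothetical log-expression values; columns are samples A1, A2, B1, B2.
sample_names = ["A1", "A2", "B1", "B2"]
expr = [
    [5.0, 5.2, 6.0, 6.2],      # modest, consistent group difference
    [7.0, 7.1, 8.0, 8.1],      # modest, consistent group difference
    [3.0, 3.2, 2.0, 2.1],      # modest, consistent group difference
    [10.0, 14.0, 10.5, 14.5],  # large spread unrelated to group
]

def zscore_rows(matrix):
    """Standardize each gene (row) to mean 0 and standard deviation 1."""
    out = []
    for row in matrix:
        m = statistics.mean(row)
        s = statistics.stdev(row)
        out.append([(v - m) / s for v in row])
    return out

def euclidean(matrix, i, j):
    """Euclidean distance between sample columns i and j."""
    return math.sqrt(sum((row[i] - row[j]) ** 2 for row in matrix))

def nearest_neighbour(matrix, i):
    """Index of the sample closest to sample i."""
    others = [j for j in range(len(matrix[0])) if j != i]
    return min(others, key=lambda j: euclidean(matrix, i, j))

raw_nn = sample_names[nearest_neighbour(expr, 0)]
scaled_nn = sample_names[nearest_neighbour(zscore_rows(expr), 0)]
print("A1 nearest sample (raw values):", raw_nn)
print("A1 nearest sample (row z-scores):", scaled_nn)
```

On raw values the single high-spread gene dominates the distances and A1 pairs with a sample from the other group. After row standardization every gene contributes on the same scale and A1 pairs with A2.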
