Clustering with differentially expressed genes in RNAseq
1
0
Entering edit mode
User34591 • 0
@user34591-8699
Last seen 6.0 years ago
France

I have performed differential analysis with DESeq2, edgeR and voom-limma. My goal was to evaluate each methods on my data. Now I would like to perform clustering with the genes found differentially expressed (DEG) by each method to see how well these genes discriminat my two conditions. To do so, I have to shoot my counts through some type of variance stabilizing transformation

Should I do all my clusterings with rlog data and the 3 sets of DEG ?

I wonder if it is relevant to do a clustering with DEG from voom on data that have been transform with rlog or with DEG from DESeq2 on data that have been transform by voom ?

Any clues ?
Many thanks

deseq2 limma voom edger • 2.8k views
ADD COMMENT
3
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 1 hour ago
The city by the bay

This doesn't seem like a particularly useful evaluation strategy. By definition, all of the methods will identify DEGs that will have different expression between conditions. Thus, I suspect that all methods would form "correct" clusters (i.e., samples would be clustered together corresponding to their conditions) based on the reported DEGs, unless something goes catastrophically wrong. The main differences between methods are more subtle, and lie in how they model the variability and in error rate control, neither of which can be easily assessed with clustering - or at all, in fact, unless you have a data set with known truths, e.g., simulations.

Anyway, when I want to cluster RNA-seq data, I usually apply a log-transformation:

expr.vals <- cpm(y, log=TRUE, prior.count=2) # where 'y' is a DGEList.

... and then do my clustering on the matrix of expression values. The log-transformation provides some level of variance stabilization, especially at large counts (see the voom paper for some discussion of this). The large prior count just protects against poorly defined log-fold changes at low counts. It's better to use this for clustering instead of the E-values reported by voom, as voom only adds a continuity correction of 0.5. (Note that this small addition is fine within limma, as observations are precision-weighted anyway, but if you're doing unweighted analyses outside of limma with those E-values, then you could run into problems.)

I suppose if you wanted to make comparisons between methods easier, you should use the same variance-stabilized expression values (either with cpm above, or with rlog in DESeq) to make your clusters for all methods. I don't think this would pose any problems, but then again, I don't think this is a worthwhile analysis in the first place, so make of that what you will.

ADD COMMENT

Login before adding your answer.

Traffic: 522 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6