Pearson correlation between gene expression and phenotype
weichengz • 0
Melbourne, Australia

In addition to finding DEGs, I was hoping to use RNA-seq count data for a correlation analysis (Pearson correlation) between gene expression level and a specific phenotype across samples. To do that, I have to extract the counts (as an indicator of gene expression level) for my genes of interest. I used edgeR and, after creating the raw count matrix, followed these steps:

keep <- filterByExpr(y)
y <- y[keep, , keep.lib.sizes = FALSE]

# Apply TMM (trimmed mean of M-values) normalization to normalise gene expression distributions and eliminate the composition biases between libraries
y <- calcNormFactors(y, method = "TMM")

Is the count table in y$counts the right one to use for the correlation analysis?

edgeR RNASeq • 1.8k views
WEHI, Melbourne, Australia

The whole purpose of edgeR is to correlate gene expression with phenotype, so to do so you just follow the usual edgeR pipeline. If the phenotype is numeric, then you simply create a design matrix:

design <- model.matrix(~phenotype)

and find DE genes in the usual edgeR way. The genes that are DE are correlated with the phenotype.
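As a small illustration of what that design matrix contains, here is a sketch in base R. The phenotype values are made up for the example and are not from the original question.

```r
# Hypothetical numeric phenotype for six samples (made-up values,
# e.g. a hormone concentration measured in each sample)
phenotype <- c(1.2, 3.4, 2.8, 5.1, 4.0, 0.9)

# One row per sample: an intercept column plus one column holding the phenotype
design <- model.matrix(~phenotype)

colnames(design)   # "(Intercept)" "phenotype"
dim(design)        # 6 rows (samples), 2 columns (coefficients)
```

The second coefficient is the slope of log-expression against the phenotype, so that is the coefficient one would test for DE (coefficient 2 in this design).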

Your proposal to compute Pearson correlations is an ad hoc way of doing the same thing. If you wanted to do that, you certainly couldn't use the count matrix, you'd use cpms. See the User's Guide for how to compute cpms.
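For anyone who does want the ad hoc route, a minimal sketch might look like the following. It assumes simulated counts and a made-up phenotype vector. It also uses log2-CPM (cpm() with log = TRUE), which is a choice made for this sketch; plain cpm(y) would follow the wording above more literally.

```r
library(edgeR)

# Simulated data for illustration only: 100 genes x 6 samples
set.seed(1)
counts <- matrix(rnbinom(600, mu = 50, size = 5), nrow = 100,
                 dimnames = list(paste0("gene", 1:100), paste0("s", 1:6)))
phenotype <- c(1.2, 3.4, 2.8, 5.1, 4.0, 0.9)  # hypothetical values

y <- DGEList(counts = counts)
y <- calcNormFactors(y, method = "TMM")

# cpm() uses the TMM-adjusted library sizes stored in y
# (log2 scale is an assumption of this sketch)
logcpm <- cpm(y, log = TRUE, prior.count = 2)

# Pearson correlation of each gene's log-CPM with the phenotype across samples
r <- cor(t(logcpm), phenotype, method = "pearson")[, 1]
head(sort(r, decreasing = TRUE))
```

Note that y$counts itself is unchanged by calcNormFactors(); the normalization factors only take effect when cpm() or the model-fitting functions use them.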


Thanks Gordon, I followed the edgeR pipeline and have already got my DEG list. The DEGs were generated from design <- model.matrix(~conditions), where the condition is a categorical variable (Control vs. Treatment), not numeric. Just to clarify: I am still interested in the correlation between gene expression and a numeric phenotype (say, a particular hormone concentration) in the samples, and this numeric phenotype was not included in model.matrix before. Is getting cpms still the best approach for this correlation analysis? Are cpms the normalised count table?


My answer is the same today as it was yesterday. Just put phenotype in the design matrix, for example by

design <- model.matrix(~conditions+phenotype)
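A rough sketch of how the rest of the pipeline could look with that design is given below. The counts and phenotype values are simulated for illustration, and the quasi-likelihood functions are one of the standard edgeR testing routes, chosen here as an assumption rather than something specified in this thread.

```r
library(edgeR)

# Simulated data for illustration only: 200 genes x 8 samples
set.seed(42)
n_genes <- 200
conditions <- factor(rep(c("Control", "Treatment"), each = 4))
phenotype <- c(2.1, 3.5, 1.8, 4.2, 5.0, 3.9, 6.1, 4.7)  # hypothetical values
counts <- matrix(rnbinom(n_genes * 8, mu = 100, size = 10), nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("sample", 1:8)))

# Additive model: condition effect plus a slope for the numeric phenotype
design <- model.matrix(~conditions + phenotype)

y <- DGEList(counts = counts)
keep <- filterByExpr(y, design = design)
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y, method = "TMM")

y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)

# Test the phenotype slope, adjusting for condition
pheno_coef <- which(colnames(design) == "phenotype")
res <- glmQLFTest(fit, coef = pheno_coef)
topTags(res)
```

Genes with a small FDR here are those whose expression changes with the phenotype after accounting for Control vs. Treatment.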
