Pearson correlation between gene expression and phenotype
weichengz (@weichengz-23557), Melbourne, Australia

In addition to finding DEGs, I was hoping to use RNA-seq count data for a correlation analysis (Pearson correlation) between gene expression level and a specific phenotype across samples. To do that, I need to extract the counts (as an indicator of expression level) for my genes of interest. I used edgeR and, after creating the raw count matrix, followed these steps:


library(edgeR)

# Filter out lowly expressed genes
keep <- filterByExpr(y)

table(keep)

y <- y[keep, , keep.lib.sizes=FALSE]

dim(y)

# Apply TMM (trimmed mean of M-values) normalization: estimate scaling factors that remove composition biases between libraries
y <- calcNormFactors(y, method = "TMM")
y$counts

Is the count table from y$counts the right one to use for further correlation analysis?

@gordon-smyth, WEHI, Melbourne, Australia

The whole purpose of edgeR is to correlate gene expression with phenotype, so you just follow the usual edgeR pipeline. If the phenotype is numeric, you simply create a design matrix:

design <- model.matrix(~phenotype)


and find DE genes in the usual edgeR way. The genes that are DE are correlated with the phenotype.
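A minimal sketch of that pipeline with a numeric phenotype follows. It assumes the quasi-likelihood functions glmQLFit()/glmQLFTest() (one of the standard edgeR pipelines) and uses simulated counts with a hypothetical phenotype vector in place of real data; substitute your own DGEList and phenotype values.

```r
library(edgeR)

set.seed(1)
# Hypothetical numeric phenotype for 8 samples (replace with your own values)
phenotype <- c(1.2, 2.5, 3.1, 4.0, 5.3, 6.1, 7.4, 8.0)

# Simulated placeholder counts: 1000 genes x 8 samples, negative binomial
counts <- matrix(rnbinom(1000 * 8, mu = 50, size = 10), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:8)))
# Make the first 20 genes rise with the phenotype
counts[1:20, ] <- matrix(rnbinom(20 * 8, mu = rep(10 * phenotype, each = 20),
                                 size = 10), nrow = 20)

y <- DGEList(counts = counts)
design <- model.matrix(~phenotype)  # columns: (Intercept), phenotype

keep <- filterByExpr(y, design)
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y, method = "TMM")

y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)
# Coefficient 2 is the slope of log-expression on the phenotype;
# genes with a small FDR here are those associated with the phenotype
res <- glmQLFTest(fit, coef = 2)
topTags(res)
```

The sign of logFC gives the direction of the association (positive means expression increases with the phenotype), and it is expressed per unit of the phenotype.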

Your proposal to compute Pearson correlations is an ad hoc way of doing the same thing. If you wanted to do that, you certainly couldn't use the count matrix; you'd use CPMs (counts per million). See the edgeR User's Guide for how to compute CPMs.
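For the ad hoc route, a minimal sketch is below, assuming simulated counts and a hypothetical phenotype vector. It also shows why y$counts is not the normalized table: calcNormFactors() leaves the counts unchanged and only stores scaling factors in y$samples$norm.factors, which cpm() then applies. Using log-CPM rather than raw CPM is a choice made here to damp the strong mean-variance relationship of counts before computing Pearson correlations.

```r
library(edgeR)

set.seed(2)
# Hypothetical numeric phenotype (e.g. hormone concentration) for 8 samples
phenotype <- c(1.2, 2.5, 3.1, 4.0, 5.3, 6.1, 7.4, 8.0)
# Simulated placeholder counts: 1000 genes x 8 samples
counts <- matrix(rnbinom(1000 * 8, mu = 50, size = 10), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:8)))

y <- DGEList(counts = counts)
y <- calcNormFactors(y, method = "TMM")
# calcNormFactors() does not modify y$counts; it only fills
# y$samples$norm.factors, which cpm() uses for the effective library sizes
print(y$samples$norm.factors)

# Normalized log2-CPM (prior.count = 2 is cpm()'s default offset)
logcpm <- cpm(y, log = TRUE, prior.count = 2)

# Ad hoc Pearson correlation of each gene's log-CPM with the phenotype
r <- apply(logcpm, 1, function(x) cor(x, phenotype, method = "pearson"))
head(sort(r, decreasing = TRUE))
```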


Thanks Gordon. I followed the edgeR pipeline and have already got my DEG list. The DEGs were generated from design <- model.matrix(~conditions), and conditions is a categorical variable (Control vs. Treatment), not numeric. I would still like to look at the correlation between gene expression and a numeric phenotype of each sample (say, a particular hormone concentration), which was not included in the model.matrix() call. Is computing CPMs still the best approach for this correlation analysis? Are CPMs the normalised count table?


My answer is the same today as it was yesterday. Just put phenotype in the design matrix, for example by

design <- model.matrix(~conditions+phenotype)
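A sketch of fitting that combined design is below, assuming simulated counts, a hypothetical two-level conditions factor and a hypothetical hormone-concentration vector. In this additive model, testing the phenotype coefficient finds genes associated with the phenotype after adjusting for condition, and testing the condition coefficient gives the treatment effect adjusted for the phenotype. The coefficient names come from model.matrix().

```r
library(edgeR)

set.seed(3)
# Hypothetical design: two conditions plus a numeric phenotype per sample
conditions <- factor(rep(c("Control", "Treatment"), each = 4))
phenotype <- c(3.1, 5.2, 4.4, 6.0, 3.8, 6.5, 4.9, 5.7)  # e.g. hormone level
# Simulated placeholder counts: 1000 genes x 8 samples
counts <- matrix(rnbinom(1000 * 8, mu = 50, size = 10), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:8)))

y <- DGEList(counts = counts)
design <- model.matrix(~conditions + phenotype)
# columns: (Intercept), conditionsTreatment, phenotype

keep <- filterByExpr(y, design)
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y, method = "TMM")
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)

# Association with the phenotype, adjusted for condition
res_pheno <- glmQLFTest(fit, coef = which(colnames(design) == "phenotype"))
topTags(res_pheno)

# Treatment vs Control, adjusted for the phenotype
res_cond <- glmQLFTest(fit, coef = which(colnames(design) == "conditionsTreatment"))
topTags(res_cond)
```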