Question

Further analysis of gene expression data

bharata1803 ▴ 60


Hello,

I use the voom/limma workflow to analyse my RNA-seq data. Now I want to do further analysis, such as measuring gene-gene correlation and trying some machine learning methods (regression, random forest, etc.). My question is: can the output of the voom step be used for this kind of analysis? The voom step I mean is shown below:

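library(edgeR)  # DGEList, calcNormFactors; also attaches limma, which provides voom
# readCountClean: matrix of raw read counts (genes x samples)
dgeList <- DGEList(readCountClean)
dgeList <- calcNormFactors(dgeList)
vRnaSeq <- voom(dgeList, designVoom, plot = TRUE)

# log2-CPM values (genes x samples)
exprs <- vRnaSeq$E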

As I understand the code above, the exprs variable is a genes x samples matrix of log gene expression values. Is this data suitable for further analysis, or do I need to use the raw read counts? What kinds of analysis are appropriate or inappropriate for this data? Perhaps you can give some criteria for deciding which types of analysis suit log gene expression data. Also, if you know of any papers on machine learning analysis of gene expression data, please tell me which to read. Thank you.

limma voom rnaseq • 1.1k views

updated 8.3 years ago by Aaron Lun ★ 28k • written 8.3 years ago by bharata1803 ▴ 60

score 0 · Answer 1 · 2016-01-05

You'll have to be more specific about what downstream analysis you want to apply. If the downstream method expects count data, then you need to supply the raw counts. If it expects continuous observations, then supplying the log-CPMs (computed from voom or via the cpm function) may work well if the counts are reasonably large. This is probably the case for most methods I can think of (e.g., PCA, t-SNE, Euclidean distance-based clustering), but you'll have to check for whatever method you want to apply.
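As a rough illustration of the continuous-data route, the sketch below computes log-CPMs with edgeR's cpm function and passes them to PCA and a correlation calculation. The count matrix is simulated and the object names (counts, logCPM, pca, sampleCor) are made up for this example; they are not from the original post.

```r
library(edgeR)

# Simulated counts, purely for illustration (hypothetical data)
set.seed(1)
counts <- matrix(rnbinom(1000 * 6, mu = 50, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000),
                                 paste0("sample", 1:6)))

y <- DGEList(counts)
y <- calcNormFactors(y)

# log2-CPM using the normalized library sizes; the prior count damps
# the variability of log values for small counts
logCPM <- cpm(y, log = TRUE, prior.count = 2)

# PCA on samples: prcomp() expects observations in rows, so transpose
pca <- prcomp(t(logCPM))

# Sample-sample correlation on the same scale;
# cor(t(logCPM)) would give gene-gene correlations instead
sampleCor <- cor(logCPM)
```

With real data, counts would be replaced by the filtered raw count matrix, and whether the result is sensible still depends on the counts being reasonably large.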

Obviously, the log-CPMs are implicitly normalized for library size and composition biases, which ensures that those factors don't influence your downstream methods (e.g., avoid clustering based on library size). The log-transformation also provides some rough variance stabilisation, which ensures that your inferences aren't dominated by large counts with high variances. However, the raison d'être of voom is the calculation of precision weights - this will only be useful if your downstream methods are able to accept weights.
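To make the point about weights concrete, here is a minimal sketch of a downstream method that does accept them: base R's lm(), fitted to a single gene by weighted least squares. The simulated counts, the two-group design, and the names v, g and fit are assumptions for illustration only.

```r
library(edgeR)  # also attaches limma, which provides voom()

# Simulated counts and a two-group design (hypothetical data)
set.seed(1)
counts <- matrix(rnbinom(500 * 6, mu = 50, size = 5), nrow = 500)
group <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ group)

y <- calcNormFactors(DGEList(counts))
v <- voom(y, design)

# v$E holds the log2-CPMs; v$weights holds one precision weight
# per observation, with the same dimensions as v$E
g <- 1  # index of the gene to model

# Weighted least squares for one gene, passing the voom weights to lm()
fit <- lm(v$E[g, ] ~ group, weights = v$weights[g, ])
coef(fit)
```

A method with no weights argument would only see v$E, in which case voom offers little beyond what cpm with log=TRUE already provides.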