Question: Further analysis of gene expression data
bharata1803 wrote (2.5 years ago):


So, I use the voom/limma workflow to analyse my RNA-seq data. Now I want to do further analysis, such as measuring correlations between genes and trying some machine learning methods (regression, random forest, etc.). My question is: can the output of the voom step be used for this kind of analysis? The voom step I mean is shown below:

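library(limma)  # provides voom()
vRnaSeq <- voom(dgeList, designVoom, plot = TRUE)  # log2-CPM values plus precision weights
exprs <- vRnaSeq$E  # genes x samples matrix of log2-CPM values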


From the code above, my understanding is that exprs is a genes x samples matrix of log gene expression values. Is this data suitable for further analysis, or do I need to use the raw read counts? What kinds of analysis are and are not appropriate for this data? Maybe you could give some criteria for which types of analysis are suitable for log expression data. Also, if you know of any papers on machine learning analysis of gene expression data, could you tell me which ones to read? Thank you.
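As a rough illustration of what `exprs` holds, the base R sketch below recomputes log2 counts-per-million using the same offsets voom applies (0.5 added to each count, 1 added to each library size). The count matrix and the gene/sample names are simulated and hypothetical. With a real DGEList, voom uses library sizes multiplied by any stored normalization factors, which this sketch ignores.

```r
# Hypothetical toy count matrix: 5 genes x 4 samples (simulated, not real data)
set.seed(1)
counts <- matrix(rpois(20, lambda = 50), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("sample", 1:4)))

# Library sizes: total counts per sample (normalization factors ignored here)
lib.size <- colSums(counts)

# log2-CPM with voom-style offsets: (count + 0.5) / (lib.size + 1) * 1e6
logCPM <- t(log2(t(counts + 0.5) / (lib.size + 1) * 1e6))

dim(logCPM)  # genes x samples, the same layout as vRnaSeq$E
```

With real data, edgeR's `cpm(dgeList, log = TRUE)` gives a comparable matrix, although it uses its own prior count, so its values will not match `vRnaSeq$E` exactly.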


Aaron Lun (Cambridge, United Kingdom) wrote (2.5 years ago):

You'll have to be more specific about what downstream analysis you want to apply. If the downstream method expects count data, then you need to supply the raw counts. If it expects continuous observations, then supplying the log-CPMs (computed from voom, or via edgeR's cpm function with log=TRUE) may work well if the counts are reasonably large. This is probably the case for most methods I can think of (e.g., PCA, t-SNE, Euclidean distance-based clustering), but you'll have to check for whatever method you want to apply.
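For the continuous-data methods mentioned above, a minimal base R sketch of PCA and Euclidean distance-based clustering on a log-CPM matrix might look like the following. The counts are simulated and stand in for `vRnaSeq$E`; the choice of average linkage and two clusters is arbitrary and only for illustration.

```r
# Simulated stand-in for vRnaSeq$E: 100 genes x 6 samples (hypothetical data)
set.seed(42)
counts <- matrix(rnbinom(600, mu = 100, size = 10), nrow = 100,
                 dimnames = list(paste0("gene", 1:100), paste0("s", 1:6)))
lib.size <- colSums(counts)
logCPM <- t(log2(t(counts + 0.5) / (lib.size + 1) * 1e6))

# PCA: prcomp expects observations (samples) in rows, so transpose
pca <- prcomp(t(logCPM), center = TRUE, scale. = FALSE)
summary(pca)$importance[2, 1:2]  # proportion of variance for PC1 and PC2

# Hierarchical clustering of samples on Euclidean distances
d <- dist(t(logCPM), method = "euclidean")
hc <- hclust(d, method = "average")
cutree(hc, k = 2)  # cluster label for each sample
```

With real data, replace `logCPM` with `vRnaSeq$E`. Because the log-CPMs are already adjusted for library size, differences in sequencing depth should not by themselves drive the sample clusters.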

Obviously, the log-CPMs are implicitly normalized for library size and composition biases (the latter provided that normalization factors, e.g. from calcNormFactors, are stored in the DGEList), which ensures that those factors don't influence your downstream methods (e.g., to avoid clustering based on library size). The log-transformation also provides some rough variance stabilisation, which ensures that your inferences aren't dominated by large counts with high variances. However, the raison d'être of voom is the calculation of precision weights: these will only be useful if your downstream methods are able to accept weights.
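In simple terms, voom fits a trend of variability against average expression, then gives each individual observation a weight equal to the inverse of its predicted variance, so noisier low-count observations count for less. `vRnaSeq$weights` is a matrix with the same dimensions as `vRnaSeq$E`. The sketch below shows one kind of downstream method that accepts such weights: a linear regression of one gene on another via `lm(..., weights = )`. The expression values and weights are made up for illustration; using the response gene's voom weights this way is an assumption of the sketch, not a prescribed workflow.

```r
# Hypothetical log-CPM values for two genes across 8 samples
set.seed(7)
x <- rnorm(8, mean = 5)                 # predictor gene
y <- 2 + 0.8 * x + rnorm(8, sd = 0.3)   # response gene

# Hypothetical precision weights for the response gene's observations;
# with real data these might come from vRnaSeq$weights[gene, ]
w <- c(1.5, 1.2, 0.3, 1.0, 0.8, 1.4, 0.2, 1.1)

fit_unweighted <- lm(y ~ x)
fit_weighted   <- lm(y ~ x, weights = w)  # low-weight samples contribute less

rbind(unweighted = coef(fit_unweighted), weighted = coef(fit_weighted))
```

Methods that cannot take weights (plain `cor()`, `dist()`, many machine learning implementations) simply ignore this information and use `vRnaSeq$E` as is.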


Thank you for your answer. Basically, I still don't know exactly what I want to do, and your explanation helps me choose the method and the type of data. Currently, I'm trying to analyse correlations between genes: I want to check which genes are strongly correlated with some target genes. Also, I still don't understand precision weights. I think I need to read the paper, but if you could explain them in a simple way, I would really appreciate it.
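For finding genes correlated with a target gene, a minimal base R sketch on a log-expression matrix could rank all genes by their Pearson correlation with the target across samples. The matrix below is simulated and stands in for `vRnaSeq$E`, and the target gene name is hypothetical; `method = "spearman"` is a rank-based alternative if outliers are a concern.

```r
# Simulated stand-in for vRnaSeq$E: 50 genes x 10 samples (hypothetical)
set.seed(123)
exprs <- matrix(rnorm(500, mean = 5), nrow = 50,
                dimnames = list(paste0("gene", 1:50), paste0("s", 1:10)))
# Make one gene track the target so the ranking has something to find
exprs["gene2", ] <- exprs["gene1", ] + rnorm(10, sd = 0.1)

target <- "gene1"  # hypothetical target gene

# Correlation of every gene with the target, computed across samples
r <- cor(t(exprs), exprs[target, ], method = "pearson")[, 1]
r <- r[names(r) != target]  # drop the trivial self-correlation

# Genes ranked by absolute correlation with the target
head(sort(abs(r), decreasing = TRUE), 5)
```

With only a handful of samples, moderately large correlations can arise by chance, so the ranking is best treated as a starting point rather than evidence of co-regulation.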

Reply written 2.5 years ago by bharata1803

