Further analysis of gene expression data
bharata1803

Hello,

I use the voom/limma workflow to analyse my RNA-seq data. Now I want to do further analysis, such as measuring correlation between genes and trying some machine learning methods (regression, random forest, etc.). My question is: can the output of the voom step be used for this kind of analysis? The voom step I mean is the following:

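library(edgeR)  # provides DGEList() and calcNormFactors()
library(limma)  # provides voom()

dgeList <- DGEList(readCountClean)                 # readCountClean: genes x samples matrix of raw counts
dgeList <- calcNormFactors(dgeList)                # TMM normalization factors (the default method)
vRnaSeq <- voom(dgeList, designVoom, plot = TRUE)  # plot = TRUE draws the mean-variance trend

exprs <- vRnaSeq$E  # log2-CPM values, genes x samples
# vRnaSeq$weights holds the matching precision weights (same dimensions as vRnaSeq$E)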

From the code above, my understanding is that the exprs variable is a genes x samples matrix of log gene expression values. Is this data suitable for further analysis, or do I need to use the raw read counts? Which kinds of analysis are and are not appropriate for this data? Maybe you can give some criteria for deciding which analysis types suit log-expression data and which do not. Also, if you know of any papers on machine learning analysis of gene expression data, please let me know which ones to read. Thank you.

Tags: limma voom rnaseq

Aaron Lun

You'll have to be more specific about what downstream analysis you want to apply. If the downstream method expects count data, then you need to supply the raw counts. If it expects continuous observations, then supplying the log-CPMs (computed from voom or via the cpm function) may work well if the counts are reasonably large. This is probably the case for most methods I can think of (e.g., PCA, t-SNE, Euclidean distance-based clustering), but you'll have to check for whatever method you want to apply.
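To make "log-CPM" concrete, here is a minimal base-R sketch of the transformation voom applies to the counts: add 0.5 to each count, divide by the effective library size (total counts times the normalization factor) plus 1, scale to a million and take log2. The toy count matrix and the normalization factors of 1 are made up for illustration; in a real analysis the factors come from calcNormFactors(). edgeR's cpm(..., log = TRUE) uses a different prior count, so its values will differ slightly, mostly for low counts.

```r
# Hypothetical toy data: 4 genes x 3 samples of raw counts (filled column by column).
counts <- matrix(c(10, 200, 0, 55,
                   20, 410, 3, 90,
                    5, 100, 1, 30),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

# Assumed normalization factors; in practice these come from calcNormFactors().
norm_factors <- c(1, 1, 1)

# Effective library size = total counts per sample x normalization factor.
lib_size <- colSums(counts) * norm_factors

# voom-style log2-CPM: the 0.5 and 1 offsets avoid taking log2(0) for zero counts.
# Transposing puts samples in rows, so each row is divided by its own library size.
log_cpm <- t(log2(t(counts + 0.5) / (lib_size + 1) * 1e6))
round(log_cpm, 2)
```

Each row of log_cpm is a gene and each column a sample, matching the layout of vRnaSeq$E.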

Obviously, the log-CPMs are implicitly normalized for library size and composition biases, which ensures that those factors don't influence your downstream methods (e.g., you avoid clustering based on library size). The log-transformation also provides some rough variance stabilisation, which ensures that your inferences aren't dominated by large counts with high variances. However, the raison d'être of voom is the calculation of precision weights; these are only useful if your downstream methods can accept weights.
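In simple terms, voom fits a trend of variance against mean expression and gives each observation (one gene in one sample) a weight equal to the inverse of its predicted variance, so noisier low-count observations count for less. A minimal sketch of a method that accepts weights, using base R's lm() on made-up values: regress a hypothetical target gene on one candidate gene, weighting the target gene's observations. In a real analysis those weights would be the target gene's row of vRnaSeq$weights. This only accounts for noise in the response, not in the predictor gene, so treat it as an illustration rather than a recommended model.

```r
set.seed(1)
n_samples <- 8

# Hypothetical log2-CPM values for a candidate predictor gene and a target gene.
predictor <- rnorm(n_samples, mean = 6, sd = 1)
target <- 2 + 0.8 * predictor + rnorm(n_samples, sd = 0.3)

# Hypothetical precision weights for the target gene's observations
# (assumption: in practice, one row of vRnaSeq$weights).
target_weights <- c(1.5, 2.0, 0.4, 1.8, 2.2, 0.5, 1.9, 2.1)

# Ordinary least squares treats every sample equally ...
fit_unweighted <- lm(target ~ predictor)
# ... while weighted least squares down-weights observations with low weights.
fit_weighted <- lm(target ~ predictor, weights = target_weights)

coef(fit_unweighted)
coef(fit_weighted)
```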

Thank you for your answer. I am still not sure exactly what I want to do, but your explanation helps me choose which method and which type of data to use. Currently, I'm trying to analyse correlations between genes: I want to check which genes are strongly correlated with some target genes. Also, I still don't understand precision weights. I think I need to read the paper, but if you could explain them in a simple way, I would really appreciate it.
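For the correlation question, a minimal base-R sketch, assuming the log2-CPM matrix from voom (vRnaSeq$E) is used as continuous data: correlate every gene with a target gene and rank the others by absolute correlation. The matrix is simulated and the gene names are hypothetical; gene2 is constructed to track gene1 so there is something to find. Spearman correlation is shown as a rank-based alternative that is less sensitive to outlying samples. With thousands of genes, some large correlations will occur by chance, so any p-values (e.g. from cor.test()) should be adjusted for multiple testing, for example with p.adjust().

```r
set.seed(42)

# Hypothetical log2-CPM matrix (genes x samples), standing in for vRnaSeq$E.
n_genes <- 50
n_samples <- 10
E <- matrix(rnorm(n_genes * n_samples, mean = 5),
            nrow = n_genes,
            dimnames = list(paste0("gene", seq_len(n_genes)),
                            paste0("sample", seq_len(n_samples))))

# Make gene2 closely track gene1 so there is one strong correlation to find.
E["gene2", ] <- E["gene1", ] + rnorm(n_samples, sd = 0.1)

target <- "gene1"  # hypothetical target gene name

# cor() correlates columns, so transpose to put genes in columns;
# [, 1] turns the genes x 1 result into a vector named by gene.
r_pearson <- cor(t(E), E[target, ], method = "pearson")[, 1]
r_spearman <- cor(t(E), E[target, ], method = "spearman")[, 1]

# Rank all other genes by absolute Pearson correlation with the target.
others <- setdiff(names(r_pearson), target)
ranked <- sort(abs(r_pearson[others]), decreasing = TRUE)
head(ranked, 5)
```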
