Following the voom-limma workflow and using RNA-seq count data, I created the DGEList, ran the calcNormFactor (TMM) step, ran Voom step, and got the v object. I then extracted the $E object from v.
Here is my code:
x <- DGEList(counts=RNA)
x <- calcNormFactors(x, method = "TMM")
par(mfrow=c(1,2))
v <- voom(x, design, plot=TRUE)
Elist <-v$E
My questions are: 1) expression profile in Elist has performed log(CPM) and been normalized for library size, right?; 2) Has the expression profile in Elist incorporated the precision weights (represented as $weights in v object) yet? Thank you.
You could just read the help page for voom to find this out. From ?voom
Details:
This function is intended to process RNA-Seq or ChIP-Seq data
prior to linear modelling in limma.
voom is an acronym for mean-variance modelling at the
observational level. The key concern is to estimate the
mean-variance relationship in the data, then use this to compute
appropriate weights for each observation. Count data almost show
non-trivial mean-variance relationships. Raw counts show
increasing variance with increasing count size, while log-counts
typically show a decreasing mean-variance trend. This function
estimates the mean-variance trend for log-counts, then assigns a
weight to each observation based on its predicted variance. The
weights are then used in the linear modelling process to adjust
for heteroscedasticity.
voom performs the following specific calculations. First, the
counts are converted to logCPM values, adding 0.5 to all the
counts to avoid taking the logarithm of zero. The matrix of logCPM
values is then optionally normalized. The lmFit function is used
to fit row-wise linear models. The lowess function is then used
to fit a trend to the square-root-standard-deviations as a
function of average logCPM. The trend line is then used to predict
the variance of each logCPM value as a function of its fitted
value, and the inverse variances become the estimated precision
weights.
But I should point out that the object you call "v" is an EList that has the weights, and the thing you call Elist is a regular matrix that just contains the E list item from v.
You should not strip things out of objects like that unless you know what you are doing. The EList object (called v) is something that lmFit will know how to use, and will do something sensible with. If you use your Elist object with lmFit, it won't know about the weights, and will fit a conventional unweighted model.
1) expression profile in Elist has performed log(CPM) and been normalized for library size, right?
Yes
2) Has the expression profile in Elist incorporated the precision weights (represented as $weights in v object) yet?
No
Note that if you want to use the expression data in your voomed "v" object for anything except the limma differential expression pipeline, the recommendation is to rather get logged cpms from your DGEList using a higher prior.count value, ie.
E <- cpm(x, log=TRUE, prior.count=5)
Or you can take your counts matrix and shoot it through DESeq2::vst
Thank you very much for your helpful response, Steve, especially the comment on DESeq2::vst. You helped me solve the problem I was trying to figure out.
But I should point out that the object you call "v" is an EList that has the weights, and the thing you call Elist is a regular matrix that just contains the E list item from v.
You should not strip things out of objects like that unless you know what you are doing. The EList object (called v) is something that
lmFit
will know how to use, and will do something sensible with. If you use your Elist object withlmFit
, it won't know about the weights, and will fit a conventional unweighted model.