#### The support.bioconductor.org editor has been updated to markdown! Please see more info at: Tutorial: Updated Support Site Editor

Question: extracting cpm expression from limma fit
0
3.0 years ago by
map208510
United States
map208510 wrote:

I would like to extract mean log2cpm and log2cpm standard deviation per condition after eBayes:

e.g.

.....

design  <-  model.matrix( ~0 + conditions + other.variables )

......

fit  <-  eBayes(fit)

If length(conditions) > 2, then fit$coefficients contains directly interpretable mean log2cpm per condition. However, if length(conditions) == 2, then how do I get the average log2cpm from fit? In this case, "coefficients" gives the fold change, and Amean is the "average" of the log2cpm expression for the two conditions. But how do I obtain the log2cpm per gene per condition? As for the standard deviation of log2cpm expression per gene, per condition, is the following is correct? fit$stdev.unscaled / fit$sigma Thank you, Matthew limma cpm • 622 views ADD COMMENTlink modified 2.7 years ago by José Luis Lavín10 • written 3.0 years ago by map208510 Answer: extracting cpm expression from limma fit 1 3.0 years ago by Aaron Lun22k Cambridge, United Kingdom Aaron Lun22k wrote: I presume you've processed your count data with voom. However, your example doesn't make much sense. If the length of the conditions factor is two, that means you only have two samples. It is impossible to fit to a linear model to this setup with lmFit if the two samples belong to different conditions, as you won't have any residual degrees of freedom. Thus, the only possibility is if the two samples belong the same condition, in which case the coefficient just represents the average log-expression in that condition - no log-fold changes involved. More generally, if you're using an intercept-free model, then the coefficients related to condition should represent the average expression in each condition (with caveats depending on the other.variables). As for the other question, the standard errors of the coefficients (including the average expression within each condition, depending on how you've set up your model) should be: fit$stdev.unscaled * fit$sigma ... if I recall correctly. To use limma's moderation, replace fit$sigma with sqrt(fit$s2.post). ADD COMMENTlink modified 2.9 years ago • written 3.0 years ago by Aaron Lun22k Answer: extracting cpm expression from limma fit 0 2.9 years ago by map208510 United States map208510 wrote: Dear Aaron, Pardon my poor explanation above. I meant: there are two conditions, i.e. nlevels(conditions) = 2. Each has multiple samples, of course. With regards to the Standard Deviation, I thought that: standard error = fit$stdev.unscaled * fit$sigma but I am looking for standard deviation. In the typical case, SD = SE * sqrt(N) . But in our complicated scenario (different library sizes, normalization methods, etc.) I suspect that deriving the standard deviation is much more involved... Also, fit$s2.post is only available after a call to eBayes.  But the object returned by eBayes has contrast (i.e. log2 fold change) coefficients, and thus the s2.post pertains to the fold-change.  However, I am looking for the standard deviation of expression in each condition (not contrast)..

Thank you,

Yes, I was referring to the standard error (this has been fixed above). This would seem to be the more relevant metric - my rule of thumb is that standard errors refer to the variability of your coefficient estimates, while standard deviations refer to the variability of your observations. I'm not sure whether the concept of the standard deviation of an estimate makes any sense. The variability of the coefficient estimate will naturally drop as you get more information (i.e., larger N, hence smaller SE) to obtain that estimate, while the variability of the observations won't change regardless of how many observations you collect.

For your second point, you need to make sure that the coefficients in your design matrix represent the average expression in each condition. If so, then the computed standard errors will represent those of average expression estimates. Either way, the shrunken variances are agnostic to the parameterisation of the design matrix so it shouldn't matter in that respect.

Answer: extracting cpm expression from limma fit
0
2.7 years ago by
Spain
José Luis Lavín10 wrote:

Dear limma team,

First of all, I want to thank you for this wonderful package. This said, I need to ask you a question on the subject mentioned in this post, Standard error calculation from my fit object using the command in the post:

>std.error =  fit$stdev.unscaled * fit$sigma
> std.error
numeric(0)
> fit$stdev.unscaled NULL > fit$sigma
NULL

I'd ask if some experienced limma user may help me figuring out what the problem is.

Here is the structure of my fit object:

> str(fit)

List of 1
$:Formal class 'MArrayLM' [package "limma"] with 1 slot .. ..@ .Data:List of 12 .. .. ..$ : num [1:19568, 1:2] 7.93 9.38 7.46 7.81 10.88 ...
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$: chr [1:19568] "ILMN_2417611" "ILMN_2896528" "ILMN_2721178" "ILMN_3033922" ... .. .. .. .. ..$ : chr [1:2] "g1" "g2"
.. .. ..$: int 2 .. .. ..$ : NULL
.. .. ..$:List of 5 .. .. .. ..$ qr   : num [1:24, 1:2] -3.464 0.289 0.289 0.289 0.289 ...
.. .. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. .. ..$: NULL .. .. .. .. .. ..$ : chr [1:2] "g1" "g2"
.. .. .. ..$qraux: num [1:2] 1.29 1 .. .. .. ..$ pivot: int [1:2] 1 2
.. .. .. ..$tol : num 1e-07 .. .. .. ..$ rank : int 2
.. .. .. ..- attr(*, "class")= chr "qr"
.. .. ..$: int [1:19568] 22 22 22 22 22 22 22 22 22 22 ... .. .. ..$ : Named num [1:19568] 0.213 0.156 0.16 0.26 0.135 ...
.. .. .. ..- attr(*, "names")= chr [1:19568] "ILMN_2417611" "ILMN_2896528" "ILMN_2721178" "ILMN_3033922" ...
.. .. ..$: num [1:2, 1:2] 0.0833 0 0 0.0833 .. .. .. ..- attr(*, "dimnames")=List of 2 .. .. .. .. ..$ : chr [1:2] "g1" "g2"
.. .. .. .. ..$: chr [1:2] "g1" "g2" .. .. ..$ : num [1:19568, 1:2] 0.289 0.289 0.289 0.289 0.289 ...
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$: chr [1:19568] "ILMN_2417611" "ILMN_2896528" "ILMN_2721178" "ILMN_3033922" ... .. .. .. .. ..$ : chr [1:2] "g1" "g2"
.. .. ..$: int [1:2] 1 2 .. .. ..$ : Named num [1:19568] 7.85 9.39 7.46 7.89 10.86 ...
.. .. .. ..- attr(*, "names")= chr [1:19568] "ILMN_2417611" "ILMN_2896528" "ILMN_2721178" "ILMN_3033922" ...
.. .. ..$: chr "ls" .. .. ..$ : num [1:24, 1:2] 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$: NULL .. .. .. .. ..$ : chr [1:2] "g1" "g2"

Please accept my apologies if this can be considered a very naive question...

Best wishes

JL

Here is my session info (I couldn't include it in the previous post...

And here is my session info:

Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252    LC_MONETARY=Spanish_Spain.1252
[4] LC_NUMERIC=C                   LC_TIME=Spanish_Spain.1252

attached base packages:
[1] grid      stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] lumiMouseIDMapping_1.10.0 lumiMouseAll.db_1.22.0    org.Mm.eg.db_3.2.3
[4] fields_8.3-5              maps_3.0.0-2              spam_1.3-0
[7] igraph_1.0.1              gtools_3.5.0              gplots_2.17.0
[10] KEGG.db_3.2.2             KEGGgraph_1.28.0          GOstats_2.36.0
[13] Category_2.36.0           Matrix_1.2-2              lumiHumanIDMapping_1.10.0
[16] annotate_1.48.0           XML_3.98-1.3              lumiHumanAll.db_1.22.0
[19] limma_3.26.3              lumi_2.22.0               Rgraphviz_2.14.0
[22] graph_1.48.0              SparseM_1.7               clusterProfiler_2.99.1
[25] DOSE_2.9.7                GO.db_3.2.2               org.Hs.eg.db_3.2.3
[28] RSQLite_1.0.0             DBI_0.3.1                 AnnotationDbi_1.32.0
[31] IRanges_2.4.4             S4Vectors_0.8.3           Biobase_2.30.0
[34] AnnotationHub_2.2.2       BiocGenerics_0.16.1