Question: Problem regarding the explanation of log-fold changes of individual coefficients in limma after eBayes
1
4.2 years ago by
svlachavas740
Greece/Athens/National Hellenic Research Foundation
svlachavas740 wrote:

Dear Bioconductor Community,

i would like to ask for a specific question(maybe a bit) naive regarding the interprentation of log-fc values of specific contrasts from my object created after the eBayes function.

In detail, i have performed a paired analysis in limma regarding cancer vs adjucent control samples-60 samples and in total 30 patients

condition <- factor(eset.2$Disease, levels=c("Normal","Cancer")) pairs <- factor(rep(1:30, each = 2)) design <- model.matrix(~condition +pairs) head(design) (Intercept) conditionCancer pairs2 pairs3 pairs4 pairs5 pairs6 pairs7 pairs8 pairs9 1 1 0 0 0 0 0 0 0 0 0 2 1 1 0 0 0 0 0 0 0 0 3 1 0 1 0 0 0 0 0 0 0 4 1 1 1 0 0 0 0 0 0 0 5 1 0 0 1 0 0 0 0 0 0 6 1 1 0 1 0 0 0 0 0 0 pairs10 pairs11 pairs12 pairs13 pairs14 pairs15 pairs16 pairs17 pairs18 pairs19 1 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 pairs20 pairs21 pairs22 pairs23 pairs24 pairs25 pairs26 pairs27 pairs28 pairs29 1 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 pairs30 1 0 2 0 3 0 4 0 5 0 6 0... fit <- lmFit(eset.2,design) keep <- fit$Amean > median(fit$Amean) fit2 <- eBayes(fit[keep,],trend=TRUE) So, based on the fact that the rows of my phenoData information have exactly the same row(as they should have regarding the pairs argument), in order to perform the paired analysis, so: head(fit2$coefficients)
(Intercept)         conditionCancer     pairs2     pairs3        pairs4          pairs5    .....
10_at      10.702103     -2.59651160 0.02887899 -2.2916191 -0.27313819 -0.9222351
10001_at    7.520263      0.45086268 0.21904745 -0.1089900  0.09669243  0.1641267
10005_at    8.507699     -0.36371671 0.64127830  0.2583321 -0.32163256  0.7301137
10006_at   10.568751     -0.02002545 0.31434534 -0.3029731 -0.27448646  0.1310583
10007_at    8.552246      0.70665240 0.59376462  0.3137518  0.28487314  0.6358412
10010_at    7.899482     -0.09973025 0.54297155  0.2390233  0.26514041 -0.2023462

Thus, except the "conditionCancer" coefficient which is the most important, i would also like to inspect the log-fcs between in patient(pair) to have some further exploration. So, as i can see from above, the "pairs1"-that is the first two samples are used as the intercept, while each of the following pairs-represents the log-fold change between each cancer vs its adjucent control correct ?

Disease      Location Meta_        factor       Study
St_1_WL57.CEL  Normal         sigmoid_colon           0 hgu133plus2
St_2_WL57.CEL  Cancer         sigmoid_colon           0 hgu133plus2
St_N_EC59.CEL  Normal         sigmoid_colon           0 hgu133plus2
St_T_EC59.CEL  Cancer         sigmoid_colon           0 hgu133plus2
St_N_EJ58.CEL  Normal          cecum                     0 hgu133plus2
St_T_EJ58.CEL  Cancer          cecum                    0 hgu133plus2

But why when i manually used:

eset.3 <- eset.2[keep,] # to have the same probesets with the same order as the fit2 object

and then test manually one probeset-the first :

mean(exprs(eset.3)[1,4]) - mean(exprs(eset.3)[1,3])-0.2808175 # for the 4rth and 3rd sample consisting the pairs2

returns a different result from the first row of "pairs2" regarding the 10_at probeset, which is the first line (0.02887899 from above). So why they are different ? or i miss something and it is complete other explanation? these should not be the same for the log-fold change value ?

modified 4.2 years ago by Ryan C. Thompson7.4k • written 4.2 years ago by svlachavas740
Answer: Problem regarding the explanation of log-fold changes of individual coefficients
2
4.2 years ago by
Scripps Research, La Jolla, CA
Ryan C. Thompson7.4k wrote:

> Thus, except the "conditionCancer" coefficient which is the most important, i would also like to inspect the log-fcs between in patient(pair) to have some further exploration. So, as i can see from above, the "pairs1"-that is the first two samples are used as the intercept, while each of the following pairs-represents the log-fold change between each cancer vs its adjucent control correct ?

No, you've interpreted the coefficients incorrectly. Your design is ~condition + pairs, which is an additive model. This means that there is only one coefficient for the condition, and it represents the average logFC of cancer vs normal across all patients. The remaining coefficients (the pairs ones) represent the logFC between patient 1 and each other patient, and are likely of little interest to you.

Dear Ryan C. Tompson,

thank you for your recommendation. i wrongfully assumed that because the analysis i perform is actually a paired(moderated) t-test, the resulting pairs will represent the log-fold changes between the values !!

So, if i understand well , the first level of the pairs factor used as the intercept, is also used for the logFCs you mention above correctly ? That is the difference between each of the other groups of patients vs the first group (i.e for the above mentioned log fold change, is produced: mean(exprs(eset.2)[1,1:2]) - mean(exprs(eset.2)[1,3:4])

Thus, the factor pairs is actually used for the addictive model you mention to include patient-pair effects in the linear modeling, while the coefficients returned is something complete different and represents the correct difference you mention !!

Finally, i only have one final question(which i also checked but to be certain): So with or without the use of pairs in the design matrix(paired or non-paired), the actual important coefficient of cancer vs normal log fc remains the same("conditionCancer" )-but the other statistics change correct(i.e. t statistic, p-values, B statistic) ?

Yes, I think you've got it more or less correct. Your formula for the inter-patient logFC is probably slightly different from the actual fitted coefficient, because the linear model fits all the coefficients simultaneously, rather than compute each one individually using formulas like the one you give.