Dear Bioconductor Community,
i would like to ask for a specific question(maybe a bit) naive regarding the interprentation of log-fc values of specific contrasts from my object created after the eBayes function.
In detail, i have performed a paired analysis in limma regarding cancer vs adjucent control samples-60 samples and in total 30 patients
condition <- factor(eset.2$Disease, levels=c("Normal","Cancer"))
pairs <- factor(rep(1:30, each = 2))
design <- model.matrix(~condition +pairs)
head(design)
(Intercept) conditionCancer pairs2 pairs3 pairs4 pairs5 pairs6 pairs7 pairs8 pairs9
1 1 0 0 0 0 0 0 0 0 0
2 1 1 0 0 0 0 0 0 0 0
3 1 0 1 0 0 0 0 0 0 0
4 1 1 1 0 0 0 0 0 0 0
5 1 0 0 1 0 0 0 0 0 0
6 1 1 0 1 0 0 0 0 0 0
pairs10 pairs11 pairs12 pairs13 pairs14 pairs15 pairs16 pairs17 pairs18 pairs19
1 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0
pairs20 pairs21 pairs22 pairs23 pairs24 pairs25 pairs26 pairs27 pairs28 pairs29
1 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0
pairs30
1 0
2 0
3 0
4 0
5 0
6 0...
fit <- lmFit(eset.2,design)
keep <- fit$Amean > median(fit$Amean)
fit2 <- eBayes(fit[keep,],trend=TRUE)
So, based on the fact that the rows of my phenoData information have exactly the same row(as they should have regarding the pairs argument), in order to perform the paired analysis, so:
head(fit2$coefficients)
(Intercept) conditionCancer pairs2 pairs3 pairs4 pairs5 .....
10_at 10.702103 -2.59651160 0.02887899 -2.2916191 -0.27313819 -0.9222351
10001_at 7.520263 0.45086268 0.21904745 -0.1089900 0.09669243 0.1641267
10005_at 8.507699 -0.36371671 0.64127830 0.2583321 -0.32163256 0.7301137
10006_at 10.568751 -0.02002545 0.31434534 -0.3029731 -0.27448646 0.1310583
10007_at 8.552246 0.70665240 0.59376462 0.3137518 0.28487314 0.6358412
10010_at 7.899482 -0.09973025 0.54297155 0.2390233 0.26514041 -0.2023462
Thus, except the "conditionCancer" coefficient which is the most important, i would also like to inspect the log-fcs between in patient(pair) to have some further exploration. So, as i can see from above, the "pairs1"-that is the first two samples are used as the intercept, while each of the following pairs-represents the log-fold change between each cancer vs its adjucent control correct ?
head(pData(eset.2))
Disease Location Meta_ factor Study
St_1_WL57.CEL Normal sigmoid_colon 0 hgu133plus2
St_2_WL57.CEL Cancer sigmoid_colon 0 hgu133plus2
St_N_EC59.CEL Normal sigmoid_colon 0 hgu133plus2
St_T_EC59.CEL Cancer sigmoid_colon 0 hgu133plus2
St_N_EJ58.CEL Normal cecum 0 hgu133plus2
St_T_EJ58.CEL Cancer cecum 0 hgu133plus2
But why when i manually used:
eset.3 <- eset.2[keep,] # to have the same probesets with the same order as the fit2 object
and then test manually one probeset-the first :
mean(exprs(eset.3)[1,4]) - mean(exprs(eset.3)[1,3]) = -0.2808175 # for the 4rth and 3rd sample consisting the pairs2
returns a different result from the first row of "pairs2" regarding the 10_at probeset, which is the first line (0.02887899 from above). So why they are different ? or i miss something and it is complete other explanation? these should not be the same for the log-fold change value ?
Dear Ryan C. Tompson,
thank you for your recommendation. i wrongfully assumed that because the analysis i perform is actually a paired(moderated) t-test, the resulting pairs will represent the log-fold changes between the values !!
So, if i understand well , the first level of the pairs factor used as the intercept, is also used for the logFCs you mention above correctly ? That is the difference between each of the other groups of patients vs the first group (i.e for the above mentioned log fold change, is produced: mean(exprs(eset.2)[1,1:2]) - mean(exprs(eset.2)[1,3:4])
Thus, the factor pairs is actually used for the addictive model you mention to include patient-pair effects in the linear modeling, while the coefficients returned is something complete different and represents the correct difference you mention !!
Finally, i only have one final question(which i also checked but to be certain): So with or without the use of pairs in the design matrix(paired or non-paired), the actual important coefficient of cancer vs normal log fc remains the same("conditionCancer" )-but the other statistics change correct(i.e. t statistic, p-values, B statistic) ?
Yes, I think you've got it more or less correct. Your formula for the inter-patient logFC is probably slightly different from the actual fitted coefficient, because the linear model fits all the coefficients simultaneously, rather than compute each one individually using formulas like the one you give.
I see- i followed the limmas vignette guide on page 42(9.4.1) (and with some valuable help from the specialists on the field on previous posts on paired samples), where also the samples in the phenoData object belong to the same patient with the same order.