Entering edit mode
cookm346
•
0
@cookm346-10665
Last seen 8.5 years ago
I know this question has been asked before, but even reading through older posts I cannot replicate the LogFC output in R.
design <- cbind(Grp1=1,Grp2vs1=c(0,0,0,1,1,1))
data <- 1:6
fit <- lmFit(data, design)
fit <- eBayes(fit)
topTable(fit, coef=2)
I get a LogFC of 3. Can you please show me how this is calculated? Thank you so much.
Hi,
I have a doubt, the fold change is defined by the following formula:
FC = mean(condition_A_replicates)n/ mean(control_replicates)n
N : number of replicates
where the replicate expression values are not log2-transformed and the final FC value is log2-transformed to make more readable the up (positive log2fc) and down (negative log2fc) genes.
According to logarithmic rules the log2[(mean(A_n)/mean(Ctrl_n)] = log2(mean(A_n)) - log2(mean(Ctrl_n)).
As you said, limFit function performs the "log2fc" making the difference between the averages considering directly the log2-transformed values, but I see an inconsistency with the logarithmic rule, in fact, the following expression
log2(mean(A_n)) - log2(mean(Ctrl_n)) is different from mean(log2(A_n)) - mean( log2(Ctrl_n)), in other words the difference between the logarithm of the average of treated condition(A) and control condition is not the same as the difference between the mean of logarithmic values of treated condition(A) and control condition. Just to be more clear, given three expression values x1, x2, x3
According log2fc definition (Result 1)
mean(x1,x2,x3) = (x1+x2+x3 )/3
log2(mean(x1,x2,x3)) = log2( (x1+x2+x3 )/3) = log2( x1+x2+x3 )) -log2(3)
according limFit function (result 2)
mean(log2(x1),log2(x2),log2(x3)) = log2(x1) + log2(x2) + log2(x3))/3 = log2(x1*x2*x3)/3
comparing the result 1 and 2
log2( x1+x2+x3 )) -log2(3) and log2(x1*x2*x3)/3
we can observe that matematically the results are not the same.
I am aware that the results differ very little, anyway, I was wondering if my observation is correct and why is the result of limFit called "log2fc" even if it is not correctly computed from a mathematic point of view?
Thanks in advance for any clarification.
Elisa
This is not a significant issue. The difference in mean log-values is a good approximation for the log-fold change in many cases, especially when the coefficient of variation for the original data is low and/or constant with respect to the mean. You can convince yourself of this by writing out the second-order Taylor approximation for the expectation of the difference of the log-values. You will see that the terms corresponding to the discrepancy cancel out if the CV is constant, or can be ignored altogether if the CV is low.
In short, the field is called
LogFC
because the difference in the log-values is used as an estimate of the log-fold change. There may be some minor "bias" in there caused by computing means on the log-scale; but nonetheless, the resulting value is an estimate of the log-fold change so the name is appropriate.I would also add that your naive definition of the log-fold change is not appropriate in cases where one or both groups contains small or zero counts. Your definition would either be undefined, if one group contained all zeroes; or highly imprecise, if one group contained low counts. To protect against this, all DE analysis frameworks perform some kind of shrinkage to stabilize the log-fold change estimates. Thus, even log-fold changes from count-based models will differ from your definition - and I would argue that the "correctness" of your approach is largely meaningless if the reported log-fold changes are not actually useful.
Thanks for your quick reply and explanation.
No, the formula you give is not the definition of "fold change ". The formula you give is just one possible formula for estimating the true fold change. The limma computation is generally closer to the true log-fold-change (in the sense of minimising the expected squared error) than is the estimator one would get by logging your formula for the FC.
If the expression values were log-normally distributed (which is approximately true for microarray data) then the limma computation is theoretically optimal (best possible) for estimating the log-fold-change.