I'm using edgeR to do differential expression analysis. In the output, there is a column named logFC, and I'm wondering how it is calculated.
Let's take one of my real cases as an example. I have the expression of one gene in 6 treated samples and 6 control samples. The raw counts are as follows:
treated samples: 411 359 497 349 1091 861
control samples: 18 5 17 13 26 27
The normalized counts (normalized using the "TMM" method built into edgeR) were generated as follows:
exp.normalized.counts <- calcNormFactors(exp.raw.counts, method="TMM")
cpm.normalized.counts <- cpm(exp.normalized.counts)
And this is what I got from cpm.normalized.counts for this gene:
treated samples: 29.86926837 26.2474782 36.72150731 19.91655285 66.49821842 56.14122252
control samples: 1.193338995 0.423771513 1.353243931 1.081332845 2.074234472 2.15046386
Suppose the logFC is calculated by dividing the mean of the treated samples by the mean of the control samples and then taking log2. Calculated manually from the numbers above, the logFC from the raw counts is 5.072979445, and the logFC from the normalized counts is 4.82993439.
But the logFC in the output from edgeR is: 4.8144125776515
It matches neither of my manually calculated results (though it is only slightly different from the one I got from the normalized counts). So I'm wondering how exactly edgeR calculates the logFC...
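For anyone who wants to check the arithmetic, the manual calculation described above can be reproduced in a few lines of base R. This is only a sketch of the naive "log2 of the ratio of group means" approach from the question, not of what edgeR does internally; the helper name naive_logfc is made up for illustration, and the numbers are copied from the post.

```r
# Raw counts for the gene, copied from the post
treated_raw <- c(411, 359, 497, 349, 1091, 861)
control_raw <- c(18, 5, 17, 13, 26, 27)

# TMM-normalized CPM values for the same gene, copied from the post
treated_cpm <- c(29.86926837, 26.2474782, 36.72150731,
                 19.91655285, 66.49821842, 56.14122252)
control_cpm <- c(1.193338995, 0.423771513, 1.353243931,
                 1.081332845, 2.074234472, 2.15046386)

# naive_logfc is a hypothetical helper (not an edgeR function):
# log2 of the ratio of the two group means
naive_logfc <- function(treated, control) {
  log2(mean(treated) / mean(control))
}

logfc_raw <- naive_logfc(treated_raw, control_raw)  # about 5.073
logfc_cpm <- naive_logfc(treated_cpm, control_cpm)  # about 4.830

print(c(raw = logfc_raw, cpm = logfc_cpm))
```

Both values agree with the manual results quoted above, and neither equals the 4.8144125776515 reported by edgeR.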
That makes sense. I should have thought of this LOL
Though you can see that in a very simple case, with no other experimental factors being corrected for, and counts that are not near noise level, the manually calculated value is pretty darn close to what fancy software tells you.
Yep, pretty close to what I got with the normalized counts