Question

Differences between limma voom E values and edgeR cpm values?


John Brothers II


Cambridge, MA

Hello,

I have a quick question about the E values from voom versus the cpm values from edgeR.

The E values from voom are calculated as follows:
t(log2(t(counts + 0.5)/(lib.size + 1) * 1e+06))

If I understand this correctly, this is log2 counts per million, with a pseudo-count of 0.5 added to every count and the library size offset by 2 * pseudo-count = 1 (the pseudo-count being fixed at 0.5).
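To make the formula concrete, here is a minimal base-R sketch of the voom log-CPM calculation quoted above. The helper name `voom_logcpm` and the toy counts are hypothetical (this is not limma's own function), and it assumes a plain count matrix whose library sizes are simply the column sums, with no normalization factors.

```r
# Hypothetical helper (not part of limma): the voom log-CPM formula
# quoted in the question, for a genes x samples count matrix.
# Assumes lib.size = column sums (no normalization factors).
voom_logcpm <- function(counts, lib.size = colSums(counts)) {
  # Fixed pseudo-count of 0.5 on every count, and 2 * 0.5 = 1 added to
  # every library size, regardless of how large the library is
  t(log2(t(counts + 0.5) / (lib.size + 1) * 1e6))
}

# Toy data: two samples with library sizes 1000 and 2000
counts <- matrix(c(0, 10, 990,
                   0, 20, 1980),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), c("s1", "s2")))
E <- voom_logcpm(counts)
print(round(E, 3))
```

Note that a zero count gets a different log-CPM in each sample (about one log2 unit lower in s2), because the same 0.5 is divided by a library twice as large.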

However, the cpm function in edgeR is slightly different when you use
cpm(x, log=TRUE, prior.count=0.5).

It calculates the following:

# For a plain count matrix, lib.size defaults to the column sums
lib.size <- colSums(x)
# First scales the prior.count/pseudo-count and adds 2x the scaled prior count to the library size
prior.count.scaled <- lib.size/mean(lib.size)*prior.count
lib.size <- lib.size+2*prior.count.scaled
lib.size <- 1e-6*lib.size
# Then calculates log2 CPM
log2(t((t(x)+prior.count.scaled)/lib.size))

Is there a reason why the pseudo-count/prior count can be set by the user and is then scaled to library size in edgeR's cpm function, whereas voom fixes it at 0.5 regardless of library size?

That is the only difference I see between the E value calculation and the cpm function. When I choose a prior.count that makes prior.count.scaled equal to 0.5, I get the same cpm values in edgeR as the voom E values.
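This observation can be checked with a base-R sketch that re-implements both calculations side by side. `edger_logcpm` and `voom_logcpm` are hypothetical helpers written from the formulas quoted above, not the actual edgeR or limma functions, and both assume library sizes equal to the column sums of toy count matrices.

```r
# Hypothetical re-implementations of the two formulas quoted above
# (not the limma/edgeR functions); lib.size defaults to column sums.
voom_logcpm <- function(counts, lib.size = colSums(counts)) {
  t(log2(t(counts + 0.5) / (lib.size + 1) * 1e6))
}
edger_logcpm <- function(x, prior.count = 0.5, lib.size = colSums(x)) {
  # Prior count scaled by each sample's library size relative to the mean
  prior.count.scaled <- lib.size / mean(lib.size) * prior.count
  lib.size <- 1e-6 * (lib.size + 2 * prior.count.scaled)
  log2(t((t(x) + prior.count.scaled) / lib.size))
}

# Equal library sizes (1000 each): the scaled prior is 0.5 everywhere
equal <- matrix(c(0, 10, 990, 5, 15, 980), nrow = 3)
same_equal <- isTRUE(all.equal(edger_logcpm(equal, 0.5), voom_logcpm(equal)))

# Unequal library sizes (1000 vs 2000): scaled priors are 1/3 and 2/3
unequal <- matrix(c(0, 10, 990, 0, 20, 1980), nrow = 3)
same_unequal <- isTRUE(all.equal(edger_logcpm(unequal, 0.5), voom_logcpm(unequal)))

# Choosing prior.count so that sample 1's scaled prior is 0.5
# reproduces sample 1 only; sample 2 then gets a scaled prior of 1
lib <- colSums(unequal)
pc1 <- 0.5 * mean(lib) / lib[1]
e1 <- edger_logcpm(unequal, pc1)
v <- voom_logcpm(unequal)
match_s1 <- isTRUE(all.equal(e1[, 1], v[, 1]))
match_s2 <- isTRUE(all.equal(e1[, 2], v[, 2]))
print(c(same_equal = same_equal, same_unequal = same_unequal,
        match_s1 = match_s1, match_s2 = match_s2))
```

With unequal library sizes the scaled prior differs between samples, so a single prior.count can reproduce the voom values only for samples that share one library size.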

Thanks,
John

edgeR


Accepted Answer · 2014-06-04

Dear John,

Yes, it is true that you can't exactly reproduce the voom log-cpm values in edgeR. The reasons for this are somewhat subtle.

First, why does edgeR allow choice of prior.count while voom presets it at 0.5? The edgeR logCPM values are only for descriptive purposes, so it is easy to compute them in different ways. Allowing a choice of prior.count values allows users to choose where they want to be on the noise-bias trade-off spectrum. Choosing a large prior.count may be valuable to damp down the variability of small count cpm values. In voom, changes to prior.count cannot easily be made because it would affect the whole downstream analysis process. Other prior.count values may not give the nice predictable mean-variance trend that we see with 0.5. Nor does voom need the different choices, because it is able to deal with decreased precision at low count values by assigning lower precision weights.
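The noise-bias trade-off can be illustrated with a small base-R sketch. `edger_logcpm` is a hypothetical re-implementation of the edgeR steps quoted in the question (not the edgeR function itself), and the toy gene with counts 0 and 3 in two libraries of one million reads each is made up.

```r
# Hypothetical re-implementation of the quoted edgeR log-CPM steps
# (not edgeR's cpm); lib.size can be supplied explicitly.
edger_logcpm <- function(x, prior.count = 0.5, lib.size = colSums(x)) {
  prior.count.scaled <- lib.size / mean(lib.size) * prior.count
  lib.size <- 1e-6 * (lib.size + 2 * prior.count.scaled)
  log2(t((t(x) + prior.count.scaled) / lib.size))
}

# One low-count gene observed as 0 and 3 in two libraries of 1 million reads
x <- matrix(c(0, 3), nrow = 1)
lib <- c(1e6, 1e6)

# Apparent log2 fold change between the two samples for each prior.count
lfc <- sapply(c(0.5, 2, 5), function(pc) {
  y <- edger_logcpm(x, prior.count = pc, lib.size = lib)
  y[1, 2] - y[1, 1]
})
names(lfc) <- c("prior 0.5", "prior 2", "prior 5")
print(round(lfc, 2))
```

With equal library sizes the difference is log2((3 + p) / p), so a larger prior.count pulls this low-count gene's log fold change toward zero: less noise, at the cost of more bias.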

Why is the prior.count scaled to library size in edgeR? Because this ensures that any fold change that was equal to 1 before the prior.count was added stays equal to 1 after adding. In voom, however, the prior.counts are not scaled because the mean-variance modelling in voom requires the size of the counts to have an absolute meaning, not relative to library size. Scaling the prior.count interferes with the variance modelling. Empirical testing shows that voom performs very well for very unequal library sizes, so the cost of not scaling doesn't seem to be great.
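The fold-change argument can be checked directly with a base-R sketch. The helpers `edger_logcpm` and `voom_logcpm` are hypothetical re-implementations of the formulas quoted in the question, not the package functions, and the counts are illustrative: gene1 makes up exactly 0.1% of both libraries, so its true fold change is 1 (log2 fold change 0) even though the second library is twice as large.

```r
# Hypothetical re-implementations of the two quoted formulas
# (not the limma/edgeR functions); lib.size defaults to column sums.
voom_logcpm <- function(counts, lib.size = colSums(counts)) {
  t(log2(t(counts + 0.5) / (lib.size + 1) * 1e6))
}
edger_logcpm <- function(x, prior.count = 0.5, lib.size = colSums(x)) {
  prior.count.scaled <- lib.size / mean(lib.size) * prior.count
  lib.size <- 1e-6 * (lib.size + 2 * prior.count.scaled)
  log2(t((t(x) + prior.count.scaled) / lib.size))
}

# gene1 is 0.1% of each library; library sizes are 1000 and 2000
counts <- matrix(c(1, 9, 990,
                   2, 18, 1980), nrow = 3)

# log2 fold change (sample 2 minus sample 1) for gene1 under each scheme
lfc_edger <- diff(edger_logcpm(counts, prior.count = 0.5)[1, ])
lfc_voom <- diff(voom_logcpm(counts)[1, ])
print(c(edgeR = lfc_edger, voom = lfc_voom))
```

The scaled prior keeps gene1's log fold change at zero, whereas the fixed 0.5 gives a small spurious difference for this low-count gene (about -0.26 here); as explained above, voom accepts this so that counts keep an absolute meaning in its mean-variance modelling.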

Best wishes
Gordon