From the DESeq2 vignette, the following snippet shows how cqn offsets can be used in the analysis:
cqnOffset <- cqnObject$glm.offset
cqnNormFactors <- exp(cqnOffset)
Meanwhile, the EDASeq offsets should be used after flipping their sign (note the -1):
EDASeqNormFactors <- exp(-1 * EDASeqOffset)
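For context, either matrix is then supplied to DESeq2 as gene- and sample-specific normalization factors. A minimal sketch of that downstream step, assuming a DESeqDataSet dds of matching dimensions (the geometric-mean centering follows my reading of the vignette):

library(DESeq2)

normFactors <- cqnNormFactors                                 # or EDASeqNormFactors
# centre the factors around 1 so normalized counts stay on the raw-count scale
normFactors <- normFactors / exp(rowMeans(log(normFactors)))
normalizationFactors(dds) <- normFactors                      # used in place of size factors
dds <- DESeq(dds)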
The same usage is echoed in the documentation of these tools.
However, from my understanding, both cqn and EDASeq return offsets of the following form:
offset = log(normed_counts) - log(actual_counts)
See page 4 of the cqn vignette (the passage starting "the normalized expression values are"; rearranging to move y, which corresponds to log2(actual_counts), to the left-hand side gives the expression above), and see section 7.3 of the EDASeq vignette, which uses this expression as is.
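To make the sign convention concrete, a toy illustration of this form (hypothetical numbers; natural log here, though cqn works on the log2 scale):

raw    <- 100                          # observed count (hypothetical)
normed <- 80                           # bias-corrected count (hypothetical)
offset <- log(normed) - log(raw)       # the form above
all.equal(normed, raw * exp(offset))   # TRUE: exp(offset) maps raw to normalized
# DESeq2 divides counts by normalization factors, so under this convention
# exp(-offset) looks like the natural normalization factor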
I may be missing something very obvious here, but, given this, why multiply the EDASeq offset by -1? Or, equivalently, why not do that for cqn offsets?

But wouldn't the same be true for cqn offsets then? Why don't we flip those? From the cqn vignette:

RPKM.cqn = y + offset

which is equivalent to (as y should be log(y_raw) and RPKM.cqn is equivalent to log(y_norm)):

log(y_norm) = log(y_raw) + offset

which gives back

offset = log(y_norm) - log(y_raw)
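(This can be checked directly on a cqn fit; a sketch, with cqnObject as above:)

# normalized log2 expression values, as computed in the cqn vignette
RPKM.cqn <- cqnObject$y + cqnObject$offset
# rearranged, the offset is the normalized values minus y
all.equal(cqnObject$offset, RPKM.cqn - cqnObject$y)   # TRUE by construction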
I don't use offset in the DESeq2 vignette. I use glm.offset. See the last section of the cqn vignette.
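For reference, a cqn fit exposes both quantities, and they are different objects (sketch):

names(cqnObject)             # contains both "offset" and "glm.offset"
head(cqnObject$offset)       # the log2-scale offset discussed in the vignette body
head(cqnObject$glm.offset)   # the quantity the DESeq2 vignette snippet uses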
Yes - this is the key I now realise.
The last section of the vignette does not really explain how exactly glm.offset is different, nor that it should be used instead of offset, so I had a look at the code. The relevant line is (roughly):

glm.offset <- sweep(-offset, 2, -1 * log2(sizeFactors / 10^6)) * log(2)

Note that the sweep function subtracts by default, so the minus in the last term cancels out, giving

glm.offset = log(2) * (-offset + log2(size_factors / 10^6))

So there are three distinct steps happening in the glm.offset calculation (sketched in code below):

1. flipping the sign of the offset (- offset),
2. adding library-size scaling (+ log2(size_factors / 10^6)) (this statement was edited, see comments below),
3. converting from log2 to the natural log scale (log(2) *).
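To make the three steps concrete, here is a small sketch that rebuilds glm.offset from offset (names are illustrative; sizeFactors stands for the library sizes given to cqn):

step1 <- -cqnObject$offset                                     # 1. flip the sign
step2 <- sweep(step1, 2, log2(sizeFactors / 10^6), FUN = "+")  # 2. add library-size scaling
step3 <- log(2) * step2                                        # 3. convert log2 to natural log
all.equal(step3, cqnObject$glm.offset)                         # TRUE, if this reading is right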
So this answers the question of why EDASeq and cqn are treated differently in DESeq2: cqn does some extra work inside the package to make its offset compatible, while EDASeq doesn't.

Re: (2), I read the code and docs as adding in library size scaling, not removing it.
This is because, after (1), the offset is in the direction of the bias, e.g. higher raw count, higher offset. So adding log2 of the size factor also goes in the direction of the sequencing depth bias.
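A toy illustration of that direction (hypothetical numbers; log-link GLM, where the offset sits in the linear predictor):

depth <- c(1e6, 2e6)          # two libraries, the second twice as deep
off   <- log(depth / 1e6)     # library-size part of the offset: 0 and log(2)
mu    <- exp(log(10) + off)   # expected counts for a gene at 10 per million
mu                            # 10 20: deeper library, larger offset, larger mean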
Also, the man page implies this as well.
Yes - I think you are right once more. Thanks for taking so much time to look into this.
Size factors come in twice in the code of cqn: at the very beginning, and in the snippet I pasted before. At the beginning, they are subtracted away from the counts on the log2 scale (the $y variable is actually log2 counts per million). This makes the effective offset (from raw counts) equal to offset - log2(size_factors / 10^6). Taking the negative of this effective offset gives -offset + log2(size_factors / 10^6), which is what we see in the equation above.

I'll edit my previous answer to have the correct information, in case somebody from the future is also wondering about this.
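A one-sample numeric check of that algebra (hypothetical numbers):

sizeFactor <- 2e7    # library size for one sample (hypothetical)
offset     <- 0.5    # cqn log2-scale offset for one gene (hypothetical)
effective  <- offset - log2(sizeFactor / 10^6)       # effective offset from raw counts
glm.off    <- log(2) * (-offset + log2(sizeFactor / 10^6))
all.equal(glm.off, -log(2) * effective)              # TRUE: glm.offset is minus the effective offset, on the natural log scale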