Question

DESeq2 size factors change with fixed geometric means


Megatron • 0

@megatron-15960


RNA-Seq count size factors are defined in formula 5 of Anders & Huber (2010).

With pre-specified geometric means, is a sample's size factor supposed to be the same regardless of which other samples are in the count matrix?

That is, if I calculate the size factor for a single sample or if I extract that size factor for that sample from a larger context, shouldn't they be identical if the geometric mean was fixed?

For example:

library(DESeq2)

set.seed(353567)

ddsRaw <- makeExampleDESeqDataSet(n=1000, m=40)
gm <- exp(rowMeans(log(counts(ddsRaw))))

dds <- estimateSizeFactors(ddsRaw, geoMeans=gm)

ddsSubset <- estimateSizeFactors(ddsRaw[, 10:20], geoMeans=gm)

all.equal(sizeFactors(dds)[10:20], sizeFactors(ddsSubset))  # Size factors are not equal

I think the code below, from estimateSizeFactorsForMatrix(), is responsible for the dataset-dependent size factors, but I do not understand how it relates to formula 5, because the result no longer depends solely on the reference geometric means.

if (incomingGeoMeans) {
  sf <- sf/exp(mean(log(sf)))
}

Thanks!

size factors deseq2 • 1.2k views

ADD COMMENT • link updated 4.3 years ago by Michael Love 43k • written 4.3 years ago by Megatron • 0

score 1 · Answer 1 · 2020-07-29


Michael Love 43k

@mikelove


United States

The last chunk there just rescales the size factors to have a geometric mean of 1 within any particular dataset, regardless of their relation to geoMeans. I implemented it this way intentionally, so that the normalization of a new dataset is identical up to a single scaling factor, regardless of which samples it contains. What would your desired behavior be? I don't think it makes sense to have the size factors far from 1.

ADD COMMENT • link 4.3 years ago Michael Love 43k
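For readers following along, the rescaling described in the answer can be written out as a short derivation. This is a sketch based on formula 5 of Anders & Huber (2010) and the snippet quoted in the question; the symbols g_i (supplied geoMeans), S (the set of samples in the dataset) and c_S are notation introduced here, not taken from the DESeq2 source.

```latex
% Formula 5 with the per-gene geometric mean replaced by a fixed reference g_i:
\hat{s}_j = \operatorname*{median}_{i} \frac{k_{ij}}{g_i}

% Rescaling step applied when geoMeans is supplied (S = samples in the dataset):
\tilde{s}_j = \frac{\hat{s}_j}{c_S},
\qquad
c_S = \Bigl(\prod_{v \in S} \hat{s}_v\Bigr)^{1/|S|}

% For the same sample j in two datasets S and S', the ratio does not depend on j:
\frac{\tilde{s}_j^{(S)}}{\tilde{s}_j^{(S')}} = \frac{c_{S'}}{c_S}
```

So the unscaled factor depends only on sample j and the reference, while the reported factor also depends on the other samples through the single constant c_S.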


Thanks for the quick reply!

I was under the impression that using an external geometric mean reference meant that size factors become context-independent. So you could normalize a single sample to a reference and get the same size factor.

ADD REPLY • link 4.3 years ago Megatron • 0
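One consequence of the rescaling snippet quoted in the question is worth spelling out for the single-sample case: a dataset with one sample has a geometric mean equal to its own size factor, so the rescaled value is 1 whatever the reference. The base-R sketch below illustrates this with simulated counts. It is a simplified median-of-ratios calculation, not DESeq2's exact code, and the names `raw_size_factors`, `rescale` and `ref_geo_means` are hypothetical helpers introduced for illustration.

```r
# Simplified median-of-ratios sketch in base R (assumption: not DESeq2's exact code).
set.seed(42)

# Simulated counts: 200 genes x 6 samples with different sequencing depths.
# Adding 1 keeps every count positive so the logs are finite.
depth <- c(0.5, 0.8, 1, 1.5, 2, 3)
counts <- sapply(depth, function(d) rpois(200, lambda = 50 * d) + 1L)

# Fixed reference geometric means, computed once from the full matrix.
ref_geo_means <- exp(rowMeans(log(counts)))

# Unscaled size factor per sample: median ratio of counts to the reference.
raw_size_factors <- function(cts, geo_means) {
  apply(cts, 2, function(k) median(k / geo_means))
}

# The rescaling step quoted in the question: divide by the geometric mean.
rescale <- function(sf) sf / exp(mean(log(sf)))

# A "dataset" consisting of sample 4 only.
one_sample <- counts[, 4, drop = FALSE]
sf_raw <- raw_size_factors(one_sample, ref_geo_means)  # depends on the reference
sf_scaled <- rescale(sf_raw)                           # always 1 for one sample

print(c(raw = sf_raw, scaled = sf_scaled))
```

Under these assumptions, the context-independent quantity is the unscaled median ratio, which is what the rescaling step removes.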


You do get the same scaling across samples, up to a single global scaling. So the relative scaling between samples is fixed by fixing the geometric means.

ADD REPLY • link 4.3 years ago Michael Love 43k
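The "same scaling up to a single global scaling" statement can be checked numerically. The sketch below uses base R and simulated counts rather than DESeq2 itself, so it is an illustration of the idea under simplifying assumptions; `raw_size_factors` and `rescale` are hypothetical helper names, and the median-of-ratios step is a simplified stand-in for the package's implementation.

```r
# Base-R sketch (assumption: simplified median-of-ratios, not DESeq2's exact code).
set.seed(1)

# Simulated counts: 200 genes x 8 samples with different sequencing depths.
depth <- c(0.5, 0.8, 1, 1.2, 1.5, 2, 2.5, 3)
counts <- sapply(depth, function(d) rpois(200, lambda = 50 * d) + 1L)

# Fixed reference geometric means shared by both calculations.
ref_geo_means <- exp(rowMeans(log(counts)))

raw_size_factors <- function(cts, geo_means) {
  apply(cts, 2, function(k) median(k / geo_means))
}
rescale <- function(sf) sf / exp(mean(log(sf)))

# Size factors from the full matrix and from a subset of columns.
sf_full <- rescale(raw_size_factors(counts, ref_geo_means))
sf_sub  <- rescale(raw_size_factors(counts[, 3:6], ref_geo_means))

# The two sets differ, but only by one constant shared by all samples.
ratio <- sf_full[3:6] / sf_sub
print(ratio)
```

Applied to the example in the question, the analogous check would be that `sizeFactors(dds)[10:20] / sizeFactors(ddsSubset)` is constant across the eleven samples.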


OK. Would it be possible to make this an option in a future release (i.e., optionally disable the rescaling of size factors) or to mention it in the documentation? It may not be apparent that the size factors for the same samples can differ between datasets even when a reference is used.

ADD REPLY • link 4.3 years ago Megatron • 0


Sure, I've added this to the documentation:

The size factors will be scaled to have a geometric mean of 1 when supplying geoMeans.