Question

DESeq2 size.factor estimation

1

Assa Yeroslaviz ★ 1.5k

@assa-yeroslaviz-1597

Germany

Hi all,

I have trouble understanding the results of my DESeq() call, especially the size factors returned by sizeFactors().

This is the list of my library sizes (I have shortened it for better overview):

> colSums(counts(dds))
A              24593612
B              24477676
C              25137143
D              23676295
E              23581553
...
Q2             18092067
R              19495619
R2              3808119
...
W              23762686
X              25669615

My question concerns sample R2. This is a very small library, so I expected it to have a very large size factor compared to libraries roughly five to seven times larger. But the size factor I get here is very small.

> sizeFactors(dds)
A         1.0167371
B         0.9574096
C         1.0823689
D         0.9329557
E         0.9519349
F         1.0187297
...
Q2        0.8638798
R         0.9388412
R2        0.1831432
...
W         1.2248133
X         1.2921096

Can someone please explain to me why this is happening? I always thought that a size factor of 1 would mean that the library size is equal to the calculated "reference genome", and that if a library is smaller, the size factor would be higher than 1.

thanks a lot in advance

Assa

deseq2 estimatesizefactors • 3.8k views

ADD COMMENT • link updated 7.4 years ago by Steve Lianoglou ★ 13k • written 7.4 years ago by Assa Yeroslaviz ★ 1.5k

score 1 · Answer 1 · 2016-11-16

1

Steve Lianoglou ★ 13k

@steve-lianoglou-2771

United States

I'm not sure what you mean about the size of the "reference genome", but to put it simply: the sizeFactors are meant to account for the different sequencing depths among your libraries in your DESeqDataSet.

Since your R2 experiment is an outlier in the number of reads it generated, it is no surprise that it is an outlier in its size factor value as well.
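
To make the direction of the effect concrete, here is a minimal base-R sketch of the median-of-ratios idea that DESeq2's estimateSizeFactors() uses by default. The toy count matrix and the helper name median_of_ratios are made up for illustration and are not part of DESeq2; the real implementation handles more edge cases. The point it shows is that a sample with fewer reads gets a size factor below 1, because raw counts are divided by the size factor to put all samples on a common scale.

```r
# Toy count matrix (hypothetical data): 4 genes (rows) x 3 samples (columns).
# "small" has exactly one fifth of the counts of "ref1".
counts <- cbind(
  ref1  = c(100, 200, 300, 400),
  ref2  = c(100, 200, 300, 400),
  small = c( 20,  40,  60,  80)
)

# Median-of-ratios size factors written out in base R (illustrative helper).
median_of_ratios <- function(m) {
  # Per-gene log geometric mean across samples: the pseudo-reference sample.
  log_geo_means <- rowMeans(log(m))
  # Genes with a zero count in any sample give -Inf here and are dropped.
  keep <- is.finite(log_geo_means)
  # For each sample: median over genes of log(count / reference), then exp().
  apply(m, 2, function(col) {
    exp(median(log(col[keep]) - log_geo_means[keep]))
  })
}

sf <- median_of_ratios(counts)
print(round(sf, 3))

# Normalized counts are the raw counts divided by the size factor,
# so the small library is scaled up and the large ones are scaled down.
normalized <- sweep(counts, 2, sf, "/")
print(round(normalized, 1))
```

In this toy case the size factor of "small" is one fifth of the size factor of "ref1", and after division all three columns of normalized counts agree.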

ADD COMMENT • link 7.4 years ago Steve Lianoglou ★ 13k

0

Thanks, Steve, for the answer. I meant the reference sample that DESeq2 creates to calculate the size factors; sorry for the misspelling.

I expected R2 to be an outlier in my data set. I just didn't expect its size factor to be so much smaller than those of the other samples; I would have thought it would be higher.
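
A quick check with the numbers posted in the question may help reconcile the two views. The sketch below only uses three of the posted samples (A, R and R2) and assumes that normalization divides raw counts by the size factor. Under that assumption, the "large" number expected for R2 is the multiplier 1 / size factor, not the size factor itself.

```r
# Values copied from the output posted in the question (samples A, R, R2 only).
lib_size    <- c(A = 24593612, R = 19495619, R2 = 3808119)
size_factor <- c(A = 1.0167371, R = 0.9388412, R2 = 0.1831432)

# Relative to sample A, R2's library size and size factor shrink by a
# similar amount, so the size factor tracks sequencing depth.
print(round(lib_size / lib_size["A"], 3))
print(round(size_factor / size_factor["A"], 3))

# Dividing by the size factor is the same as multiplying by its inverse.
# For R2 this multiplier is well above 1, i.e. its counts are scaled up.
multiplier <- 1 / size_factor
print(round(multiplier, 2))
```

For R2 the multiplier comes out at roughly 5.5, while for A and R it stays close to 1.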

ADD REPLY • link 7.4 years ago Assa Yeroslaviz ★ 1.5k