Question

DESeq2 baseMean counts

0

zpingfeng • 0

@zpingfeng-9282

Last seen 11 months ago

Australia

Hi Mike,

I am wondering how you calculate the "baseMean counts" in the DESeq2 output. I looked at the Genome Biology paper and its additional files, and it seems you used mean counts or mean expression there. Reporting "baseMean counts" as an output suggests that the data have been normalised by gene length (as in RPKM), but I did not supply anything related to gene lengths in my DESeq2 analysis. Did you estimate gene lengths, use the same length for all genes, or should this be read as "mean CPM counts"?

Zhiping

deseq2 • 36k views

ADD COMMENT • link updated 10.2 years ago by Michael Love 43k • written 10.2 years ago by zpingfeng • 0

score 8 · Answer 1 · 2015-11-27

8

Michael Love 43k

@mikelove

Last seen 1 day ago

United States

The base mean is the mean of normalized counts of all samples, normalizing for sequencing depth. It does not take into account gene length. The base mean is used in DESeq2 only for estimating the dispersion of a gene (it is used to estimate the fitted dispersion). For this task, the range of counts for a gene is relevant but not the gene's length (or other technical factors influencing the count, like sequence content).

ADD COMMENT • link 10.2 years ago Michael Love 43k

1

But how is it calculated?

ADD REPLY • link 9.9 years ago Marcelo Pereira ▴ 70

2

rowMeans(counts(dds, normalized=TRUE))

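counts(dds, normalized=TRUE) returns a matrix with elements K_ij / s_j, that is, the raw count K_ij for gene i in sample j divided by the size factor s_j of sample j (see the DESeq2 paper for definitions).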
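
For illustration only, here is a minimal base-R sketch of the same calculation on a made-up count matrix. It is not DESeq2's own code: the matrix K and the variable names are hypothetical, the size factors are computed with a plain median-of-ratios estimate, and it assumes there are no zero counts (which DESeq2 handles specially). Gene length is never used.

```r
# Toy raw count matrix K: rows are genes, columns are samples (made-up values).
K <- matrix(
  c( 10,  20,  40,
    100, 200, 400,
      5,  10,  20,
     50, 100, 200),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))
)

# Median-of-ratios size factors s_j: for each sample, the median over genes of
# the ratio of its count to that gene's geometric mean across samples.
# Assumption: no zero counts, so every log() is finite.
log_geo_means <- rowMeans(log(K))
size_factors <- apply(K, 2, function(col) {
  exp(median(log(col) - log_geo_means))
})

# Normalized counts K_ij / s_j: divide each column by its size factor.
normalized <- sweep(K, 2, size_factors, "/")

# Base mean for each gene: the mean of its normalized counts across samples.
base_mean <- rowMeans(normalized)

print(size_factors)
print(base_mean)
```

In this toy example the three samples differ only in sequencing depth, so the size factors come out at about 0.5, 1 and 2, and the base means at about 20, 200, 10 and 100.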

ADD REPLY • link 9.9 years ago Michael Love 43k

0

If outliers were replaced, you also need replaced=TRUE. The following then evaluates to TRUE:

all(rowMeans(counts(dds, normalized=TRUE, replaced=TRUE)) == res$baseMean)

ADD REPLY • link 7.1 years ago mlbendall • 0