DESeq2, tximport normalization, normalization factors, size factors
svenbioinf • 0
@svenbioinf-11239
Münster

Dear bioconductor community,

First of all, DESeq2 and tximport are great tools and offer a lot to the research community. After reading through the user guides and various online posts, one point is still unclear to me:

How are normalization factors and size factors calculated, accessed, and implemented in tximport and DESeq2?

I usually import my samples with tximport and then run DESeq2, as in the code below:

> txi <- tximport(files, type = "salmon", tx2gene = TIDtoGID)
> dds <- DESeqDataSetFromTximport(txi, colData = sampleSheet, design = ~ Batch + AgeGroup)
> dds <- DESeq(dds)
estimating size factors
using 'avgTxLength' from assays(dds), correcting for library size
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing


Here are some statements that are open for discussion or for simple "yes"/"no" answers:

1) "counts(dds,normalized=TRUE)" returns counts that are corrected for library size and transcript length.

2) "counts(dds,normalized=TRUE)" would be the correct data to plot in a simple gene expression plot (like DESeq2::plotCounts()). Why not "counts(dds,normalized=FALSE)", since the DESeq2 input is unnormalized counts?

3) I can get size factors by

nm <- assays(dds)[["avgTxLength"]]
sf3 <- estimateSizeFactorsForMatrix(counts(dds) / nm)  # see https://support.bioconductor.org/p/97676/


4) I can get the normalization factors per gene by normalizationFactors(dds)

5) I can get the library size normalized and per gene normalized counts by counts(dds)/normalizationFactors(dds)

It would be great if someone could confirm or correct some of the above points.

Thank you!

@mikelove
United States

1) "counts(dds,normalized=TRUE)" returns counts that are corrected for library size and transcript length.

Yes, and the indication is the message "using 'avgTxLength' from assays(dds), correcting for library size"

2) counts(dds,normalized=TRUE) would be the correct data to plot in a simple gene expression plot (like DESeq2::plotCounts()). Why not counts(dds,normalized=FALSE)?

Because with unnormalized counts the y values in the plot would show library size differences, when we are interested in biological differences.

3) I can get size factors by

...

Yes, these are the size factors calculated to deal with library size differences.
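
To make point 3 concrete, here is a minimal base R sketch of the median-of-ratios idea applied to length-corrected counts. It is only an illustration under assumptions: the toy matrices and the helper median_of_ratios() are hypothetical, and the helper mimics what I understand estimateSizeFactorsForMatrix() to do in its default settings; it is not the DESeq2 code itself.

```r
# Hypothetical toy data: 4 genes x 3 samples (values invented for illustration).
expr  <- c(0.1, 0.05, 0.2, 0.02)           # per-base expression for each gene
depth <- c(1, 2, 4)                         # relative sequencing depth per sample
len   <- matrix(c(1000, 1500, 1000,
                  2000, 2000, 1000,
                  1500, 1000, 1000,
                  1000, 2000, 1500),
                nrow = 4, byrow = TRUE)     # stand-in for assays(dds)[["avgTxLength"]]

# Counts scale with expression, average transcript length, and depth.
cts <- sweep(expr * len, 2, depth, `*`)

# Hypothetical helper: median of ratios to the per-gene geometric mean.
median_of_ratios <- function(m) {
  log_geo_means <- rowMeans(log(m))
  apply(m, 2, function(col) {
    keep <- is.finite(log_geo_means) & col > 0
    exp(median((log(col) - log_geo_means)[keep]))
  })
}

# Dividing by the length matrix first removes the length effect,
# so the size factors reflect library size only.
sf <- median_of_ratios(cts / len)
print(sf)
```

In this toy setup the size factors are proportional to the simulated depths, even though the average transcript lengths differ between samples.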

4) I can get the normalization factors per gene by normalizationFactors(dds)

Yes

5) I can get the library size normalized and per gene normalized counts by counts(dds)/normalizationFactors(dds)

This is identical to counts(dds, normalized=TRUE).
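
Points 4 and 5 can be illustrated with a small base R sketch. It assumes (this is my reading of DESeq2's behaviour, not something stated in this thread) that the per-gene normalization factors are the length matrix multiplied by the per-sample size factors, with each row then rescaled to a geometric mean of 1. All values and object names below are hypothetical toy data.

```r
# Hypothetical toy data: 4 genes x 3 samples.
cts <- matrix(c(100, 300, 400,
                100, 200, 200,
                300, 400, 800,
                 20,  80, 120),
              nrow = 4, byrow = TRUE)       # stand-in for counts(dds)
len <- matrix(c(1000, 1500, 1000,
                2000, 2000, 1000,
                1500, 1000, 1000,
                1000, 2000, 1500),
              nrow = 4, byrow = TRUE)       # stand-in for avgTxLength
sf  <- c(0.5, 1, 2)                         # assumed per-sample size factors

# Gene-by-sample normalization factors: length times size factor,
# rescaled so that every row has geometric mean 1 (assumption, see above).
nf <- sweep(len, 2, sf, `*`)
nf <- nf / exp(rowMeans(log(nf)))

# Analogue of counts(dds) / normalizationFactors(dds).
norm_cts <- cts / nf
print(round(norm_cts, 2))
```

Because the toy counts differ between samples only through library size and transcript length, each gene ends up with the same normalized value in all three samples.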