Dear Bioconductor community,
First of all, DESeq2 and tximport are great tools and offer a lot to the research community. After reading through the user guides and various online posts, one point is still unclear to me:
how normalization factors and size factors are calculated, accessed and applied in tximport and DESeq2.
I usually import my samples with tximport and then run DESeq2, as in the code example below:
> txi <- tximport(files, type = "salmon", tx2gene = TIDtoGID)
> dds <- DESeqDataSetFromTximport(txi, colData = sampleSheet, design = ~ Batch + AgeGroup)
> dds <- DESeq(dds)
estimating size factors
using 'avgTxLength' from assays(dds), correcting for library size
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
Here are some statements that are open for discussion or for simple "yes"/"no" answers:
1) "counts(dds,normalized=TRUE)" returns counts that are corrected for library size and transcript length.
2) "counts(dds,normalized=TRUE)" would be the correct data to plot in a simple gene expression plot (like DESeq2::plotCounts()). Why not "counts(dds,normalized=FALSE)", given that the DESeq2 input is unnormalized counts?
3) I can get size factors by
nm <- assays(dds)[["avgTxLength"]]
sf3 <- estimateSizeFactorsForMatrix(counts(dds) / nm) #https://support.bioconductor.org/p/97676/
4) I can get the normalization factors per gene by normalizationFactors(dds)
5) I can get the counts normalized for library size and per-gene length by counts(dds)/normalizationFactors(dds)
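To make points 4) and 5) concrete, here is a minimal sketch of the check I have in mind. It does not use my real data: the simulated dataset from makeExampleDESeqDataSet() and the random "avgTxLength" matrix are stand-ins I made up for illustration, and I am assuming that adding an "avgTxLength" assay by hand triggers the same code path as a tximport-derived object.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for a tximport-derived dataset (assumption, not my real data)
dds <- makeExampleDESeqDataSet(n = 200, m = 6)

# Hypothetical per-gene, per-sample average transcript lengths
len <- matrix(runif(200 * 6, min = 500, max = 3000),
              nrow = 200, dimnames = dimnames(dds))
assays(dds)[["avgTxLength"]] <- len

# With an "avgTxLength" assay present, this step should store a
# gene-by-sample matrix of normalization factors
dds <- estimateSizeFactors(dds)

nf     <- normalizationFactors(dds)       # point 4
manual <- counts(dds) / nf                # point 5
norm   <- counts(dds, normalized = TRUE)  # point 1

# Do the manual division and the accessor agree?
print(isTRUE(all.equal(manual, norm)))
```

If the two matrices agree, I would read that as support for point 5), but it would not by itself tell me whether point 3) reproduces what DESeq2 does internally.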
It would be great if someone could confirm or correct the points above.
Thank you!