How to use specific size factors and average transcript lengths from tximport in DESeq2?
grothmn • 0
Germany

Hi,

I would like to use Salmon count data and DESeq2 to identify differentially expressed genes, using predefined size factors for the samples (because total transcript amounts are biased between samples):

sizeFactors(dds) <- size_factors
dds <- DESeq(dds)
#using pre-existing size factors
#estimating dispersions
#gene-wise dispersion estimates
#mean-dispersion relationship
#final dispersion estimates
#fitting model and testing


Do I understand correctly that supplying pre-existing size factors means DESeq2 ignores the normalisation factors it would otherwise derive from 'avgTxLength' in assays(dds), i.e. the "correcting for library size" step that runs when no size factors have been predefined? If so, how can I combine my predefined size factors with the average transcript length normalisation to obtain normalizationFactors?

Thanks!

DESeq2 tximport
@mikelove
United States

You can just use your own normalization factors.

This is the code that does internal size factor estimation with avgTxLength from Salmon or other transcript abundance tools:

https://github.com/thelovelab/DESeq2/blob/devel/R/methods.R#L384-L390

https://github.com/thelovelab/DESeq2/blob/devel/R/core.R#L2185-L2189

Instead of estimating sf there, you want to apply your own predefined vector.

Then you have:

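sf # predefined size factors, one per sample; should have geometric mean of ~1
nm <- assays(dds)[["avgTxLength"]] # genes x samples matrix of average transcript lengths
nm <- nm / exp(rowMeans(log(nm))) # divide out the row-wise geometric mean
nf <- t( t(nm) * sf ) # multiply each sample's column by its size factor
normalizationFactors(dds) <- nf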
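
For anyone who wants to try the arithmetic without a DESeq2 object, here is a minimal base-R sketch of the same steps on toy data. The matrix avg_len and the vector size_factors are made-up stand-ins (assumptions, not from the thread) for assays(dds)[["avgTxLength"]] and the predefined size factors. It also rescales the size factors to geometric mean 1 first, which is one way to satisfy the "geometric mean of ~1" note above.

```r
# Toy data (hypothetical): 3 genes x 4 samples of average transcript lengths,
# standing in for assays(dds)[["avgTxLength"]].
avg_len <- matrix(
  c(1000, 1200,  900, 1100,
    2000, 2000, 2500, 1800,
     500,  450,  600,  550),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("sample", 1:4))
)

# Hypothetical predefined size factors, one per sample.
size_factors <- c(sample1 = 0.5, sample2 = 1.0, sample3 = 2.0, sample4 = 1.5)

geo_mean <- function(x) exp(mean(log(x)))

# Rescale so the size factors have geometric mean 1.
sf <- size_factors / geo_mean(size_factors)

# Divide out each gene's (row-wise) geometric mean length.
nm <- avg_len / exp(rowMeans(log(avg_len)))

# Multiply column j (sample j) by sf[j].
nf <- t(t(nm) * sf)

# Each gene's normalization factors should now have geometric mean 1.
row_geo_means <- exp(rowMeans(log(nf)))
stopifnot(isTRUE(all.equal(unname(row_geo_means), rep(1, 3))))
```

With a real dataset, nf would then be assigned with normalizationFactors(dds) <- nf as in the snippet above, before calling DESeq(dds).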


t( t(nm) * sf )

That's exactly what I was looking for!

Thanks a lot, Michael
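
A short note on why the double transpose in t( t(nm) * sf ) matters: R recycles a vector down the columns of a matrix, so a plain nm * sf would line the size factors up with rows (genes) rather than columns (samples). The toy values below are made up for illustration; sweep() is shown as an equivalent base-R spelling.

```r
nm <- matrix(1, nrow = 3, ncol = 2)  # toy matrix: 3 genes x 2 samples, all ones
sf <- c(10, 20)                      # hypothetical per-sample factors

wrong <- nm * sf                # sf is recycled down each column, i.e. across genes
right <- t(t(nm) * sf)          # column j is multiplied by sf[j]
alt   <- sweep(nm, 2, sf, "*")  # same operation, with the margin stated explicitly

stopifnot(isTRUE(all.equal(right, alt)))
```

Here every entry of the first column of right is scaled by sf[1] and every entry of the second by sf[2], whereas wrong mixes the two factors within each column.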