Question

Robust FPM with normalization factors (from tximport)?


Roger • 0

@roger-12673


Netherlands

Hi,

I am using Salmon to quantify reads in transcripts and aggregating them with tximport. I load the resulting object into a DESeq object with DESeqDataSetFromTximport and proceed as described in the vignette. Normally I like to present normalized expression data for certain genes as CPM/FPM, which can conveniently be achieved with the fpm function. However, when the data are obtained via tximport, average transcript lengths are present in the DESeq object and the fpm function does not apply robust normalization.

Now, I am considering three ways to deal with this, but I am not sure which is most appropriate:

1) Calculate FPM on the normalized counts:

k <- counts(object, normalized=TRUE)
library.sizes <- colSums(k)
1e+06 * sweep(k, 2, library.sizes, "/")

2) Estimate the sizeFactors of the DESeq object and proceed as usual:

k <- counts(object, normalized=FALSE)
sf <- estimateSizeFactorsForMatrix(k)
library.sizes <- sf * exp(mean(log(colSums(k))))
1e+06 * sweep(k, 2, library.sizes, "/")

3) Same as above, but dividing by the average transcript length:

sf <- estimateSizeFactorsForMatrix(counts(object)) / assays(object)[["avgTxLength"]]

Which of these would be most correct in this case? Is there a superior alternative?
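One thing to watch in option 3, separate from whether the approach itself is sensible: the size factors are a per-sample vector, while avgTxLength is a gene-by-sample matrix, and plain division in R recycles the vector down the rows rather than across the columns. A minimal base-R sketch with made-up values (`sf` and `len` here are hypothetical stand-ins, not taken from a real DESeqDataSet):

```r
# Toy inputs (made-up values, not from a real DESeqDataSet).
sf  <- c(1, 2, 4)                       # one size factor per sample
len <- matrix(1, nrow = 4, ncol = 3,    # stand-in for avgTxLength, all lengths 1
              dimnames = list(paste0("g", 1:4), paste0("s", 1:3)))

# Plain division recycles sf in column-major order, so genes within
# sample s1 get paired with the factors of s1, s2, s3, s1, ...
naive <- sf / len
naive[, "s1"]

# sweep() over margin 2 applies sf[j] to every gene of sample j.
by_sample <- sweep(1 / len, 2, sf, "*")
by_sample[, "s1"]
```

With all lengths set to 1, every entry in column s1 of `by_sample` equals the first size factor, whereas `naive` mixes factors from different samples within that column.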

deseq2 tximport normalization • 697 views

updated 5.0 years ago by Michael Love • written 5.0 years ago by Roger

score 0 · Answer 1 · 2019-05-07

Reading over my documentation now, I can't remember why fpm skips robust normalization when average transcript lengths are present. To preserve the information in the average transcript lengths, which summarize both changes in gene length from splicing and sample-specific biases, I would take the normalized count matrix, counts(dds, normalized=TRUE). That is almost what you want, but it's not on the correct scale. You can divide it by the mean column sum, mean(colSums(k)), and multiply by 1e6.

k <- counts(dds, normalized=TRUE)
cpm <- k / mean(colSums(k)) * 1e6
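To see what this rescaling does, here is a small base-R sketch on a made-up matrix standing in for counts(dds, normalized=TRUE). It assumes the divisor is meant to be the average per-sample total, mean(colSums(k)), so that the columns sum to about one million on average. The matrix values and dimnames are hypothetical.

```r
# Toy stand-in for counts(dds, normalized=TRUE): 4 genes x 3 samples (made-up values).
k <- matrix(c(120,   80,  100,
              900, 1100, 1000,
               30,   20,   25,
              450,  500,  475),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3)))

# Rescale by the average per-sample total (assumed divisor).
cpm <- k / mean(colSums(k)) * 1e6

colSums(cpm)        # per-sample totals vary around 1e6
mean(colSums(cpm))  # equals 1e6 by construction
```

Because every entry is divided by the same constant, the differences between samples that were already present in the normalized counts are left untouched. Only the overall scale changes.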