Question: Robust FPM with normalization factors (from tximport)?
5 months ago, Roger wrote:


I am using Salmon to quantify reads in transcripts and aggregating them with tximport. I load the resulting object into a DESeq object with DESeqDataSetFromTximport and proceed as described in the vignette. Normally I like to present normalized expression data for certain genes as CPM/FPM, which can conveniently be achieved with the fpm function. However, when the data come from tximport, average transcript lengths are present in the DESeq object and the fpm function does not apply its robust normalization.

Now, I am considering three ways to deal with this, but I am not sure which is most appropriate:

1) Calculate FPM on the normalized counts:

k <- counts(object, normalized=TRUE)
library.sizes <- colSums(k)
1e+06 * sweep(k, 2, library.sizes, "/")

2) Estimate the sizeFactors of the DESeq object and proceed as usual:

k <- counts(object, normalized=FALSE)
sf <- estimateSizeFactorsForMatrix(counts(object))
library.sizes <- sf * exp(mean(log(colSums(k))))
1e+06 * sweep(k, 2, library.sizes, "/")

3) Same as above, but dividing by average transcript length:

sf <- estimateSizeFactorsForMatrix(counts(object)) / assays(object)[["avgTxLength"]]

Which of these would be most correct in this case? Is there a superior alternative?
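
To make the arithmetic behind option 2 concrete, here is a minimal base-R sketch on a made-up count matrix. The helper `median_ratio_size_factors` is a hypothetical stand-in written for this illustration; it is intended to mirror the default median-of-ratios behaviour of estimateSizeFactorsForMatrix, but that equivalence is an assumption and the real function should be used on actual data. The toy matrix `k` replaces counts(object).

```r
# Toy count matrix: 4 genes x 3 samples (made-up numbers, illustration only).
k <- matrix(c( 10,  20,  30,
              100, 210, 290,
               50,  95, 160,
                5,  12,  14),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

# Hypothetical helper (not part of DESeq2): median-of-ratios size factors.
# Assumed to approximate the default of estimateSizeFactorsForMatrix.
median_ratio_size_factors <- function(counts) {
  log_geo_means <- rowMeans(log(counts))  # log geometric mean of each gene
  apply(counts, 2, function(col) {
    keep <- is.finite(log_geo_means) & col > 0
    exp(median(log(col[keep]) - log_geo_means[keep]))
  })
}

sf <- median_ratio_size_factors(k)

# Robust library sizes: size factors rescaled by the geometric mean
# of the raw column sums, as in option 2.
library.sizes <- sf * exp(mean(log(colSums(k))))
fpm_robust <- 1e6 * sweep(k, 2, library.sizes, "/")

# Plain FPM for comparison: divide each column by its own raw sum.
fpm_plain <- 1e6 * sweep(k, 2, colSums(k), "/")
```

In this sketch the columns of `fpm_plain` each sum to one million by construction, whereas the columns of `fpm_robust` generally do not, because the robust library size is driven by the median gene ratio rather than by the total count.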

Tags: normalization, deseq2, tximport
Answer: Robust FPM with normalization factors (from tximport)?
5 months ago, Michael Love (United States) wrote:

Reading back over my documentation, I can't recall why fpm skips robust normalization when average transcript lengths are present. To preserve the information in the average transcript lengths, which summarize both changes in gene length from splicing and sample-specific biases, I would start from the normalized count matrix, counts(dds, normalized=TRUE). This is almost what you want, but it is not on the correct scale. You can divide it by the mean column sum, mean(colSums(k)), and multiply by 1e6:

k <- counts(dds, normalized=TRUE)
cpm <- k / mean(colSums(k)) * 1e6
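
As a rough check of what this rescaling does, the base-R sketch below uses made-up counts and a made-up gene-by-sample matrix of normalization factors (`nf`, a hypothetical name) in place of a real dds object. It assumes the per-million scale is set by the mean column sum of the normalized counts, mean(colSums(k)), and it imitates counts(dds, normalized=TRUE) by dividing each count by its own factor.

```r
# Toy raw counts: 3 genes x 3 samples (made-up values, illustration only).
counts <- matrix(c( 10,  20,  30,
                   100, 210, 290,
                    50,  95, 160),
                 nrow = 3, byrow = TRUE)

# Made-up gene-by-sample normalization factors, standing in for the
# factors that carry library size and average transcript length.
nf <- matrix(c(0.9, 1.0, 1.20,
               1.1, 1.0, 0.90,
               1.0, 1.1, 0.95),
             nrow = 3, byrow = TRUE)

# Stand-in for counts(dds, normalized=TRUE): element-wise division.
k <- counts / nf

# Rescale so that the average column sum is one million (assumption:
# the scaling constant is the mean column sum of the normalized counts).
cpm <- k / mean(colSums(k)) * 1e6

# Individual columns need not sum to exactly 1e6; only their mean does.
colSums(cpm)
mean(colSums(cpm))
```

Because a single constant is applied to the whole matrix, the relative differences between samples in the normalized counts are left unchanged; only the units move to a per-million scale.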