Question: Robust FPM with normalization factors (from tximport)?
Roger wrote, 7 weeks ago:

Hi,

I am using Salmon to quantify reads at the transcript level and aggregating them to genes with tximport. I load the result into a DESeqDataSet with DESeqDataSetFromTximport and proceed as described in the vignette. Normally I like to present normalized expression of certain genes as CPM/FPM, which can conveniently be achieved with the fpm function. However, when the data come in via tximport, average transcript lengths are stored in the DESeq object and the fpm function does not apply any normalization.

Now, I am considering a few ways to deal with this, but I am not sure which is more appropriate:

1) Calculate FPM on the normalized counts:

k <- counts(object, normalized=TRUE)
library.sizes <- colSums(k)
1e+06 * sweep(k, 2, library.sizes, "/")

2) Estimate the size factors of the DESeq object and proceed as usual:

k <- counts(object, normalized=FALSE)
sf <- estimateSizeFactorsForMatrix(counts(object))
library.sizes <- sf * exp(mean(log(colSums(k))))
1e+06 * sweep(k, 2, library.sizes, "/")

3) Same as above, but dividing the size factors by the average transcript length:

sf <- estimateSizeFactorsForMatrix(counts(object)) / assays(object)[["avgTxLength"]]

Which of these would be more correct in this case? Is there a better alternative?

Answer:
Michael Love (United States) wrote, 7 weeks ago:

Now that I read over my documentation, I'm forgetting why I have the part about not performing robust normalization with fpm when there are average transcript lengths. To preserve the information in the average transcript lengths, which summarize both changes in gene length from splicing and sample-specific biases, I would take the normalized count matrix, counts(dds, normalized=TRUE). This is almost what you want, but it's not on the right scale: divide it by mean(k) and multiply by 1e6.

k <- counts(dds, normalized=TRUE)  # counts corrected by the tximport-derived factors
cpm <- k / mean(k) * 1e6           # rescale so the overall mean is 1e6
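For intuition, here is a self-contained base-R sketch (no DESeq2 required) of what this amounts to. It mimics median-of-ratios size factors (the idea behind estimateSizeFactorsForMatrix) on a made-up toy count matrix, then applies the same rescaling. Note this toy omits the avgTxLength correction that a real tximport-derived DESeqDataSet folds into counts(dds, normalized=TRUE); the matrix and variable names are illustrative only.

```r
# Toy count matrix: 4 genes x 3 samples (made-up numbers)
cts <- matrix(c( 10,  20,  15,
                100, 210, 160,
                  5,  12,   8,
                 50,  90,  70), nrow = 4, byrow = TRUE)

# Median-of-ratios size factors, as in DESeq2's default normalization:
# per-gene geometric means, then the per-sample median ratio to them
log.geo.means <- rowMeans(log(cts))
sf <- apply(cts, 2, function(col) {
  exp(median((log(col) - log.geo.means)[is.finite(log.geo.means)]))
})

k   <- sweep(cts, 2, sf, "/")  # normalized counts
cpm <- k / mean(k) * 1e6       # rescale: overall mean becomes exactly 1e6

mean(cpm)  # 1e6 by construction
```

The final rescaling guarantees mean(cpm) == 1e6 regardless of the size factors, which is what puts the normalized counts back on a CPM-like scale.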