Correct use of tximport in combination with edgeR cpm()
2
1
Entering edit mode
ATpoint ★ 4.0k
@atpoint-13662
Last seen 5 hours ago
Germany

I imported a set of salmon quantifications into R with tximport default settings and exactly used the code on the manual page for tximport to prepare data for use with edgeR. The result is a DGElist with the offsets for the downstream DGE analysis.

Issue: The DGElist (y$samples) does not contain the lib.size factors (they are all 1) for obtaining TMM-normalized counts via cpm(y, log=F).

Therefore, the question is how to feed normalization factors into y$samples$norm.factors while still using the information from tximport. One can of course run calcNormFactors(y) manually but then the length offsets from tximport are lost. Is there a recommended approach?

Edit 01/2020: For those interested, the code suggestions have now been integrated into the tximport manual, see: https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html#edger

Thanks to Aaron and Mike for the support!

tximport edgeR • 2.1k views
ADD COMMENT
0
Entering edit mode

This is beyond my knowledge of edgeR. I checked that chunk of tximport vignette code with Aaron at some point, to make sure we were doing it properly.

ADD REPLY
0
Entering edit mode

Oops. Spotted something. Will open an issue.

Edit: actually, ignore that, it's fine - phew.

ADD REPLY
0
Entering edit mode

Maybe add a short comment to the tximport vignette referencing the suggestions from Aaron below. Using the corrected counts for things like clustering etc. is standard so I was actually surprised no one asked this before (by best knowledge, maybe I missed the respective threads).

ADD REPLY
3
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 17 hours ago
The city by the bay

Have a look at csaw::calculateCPM(), which does exactly as you request (see usage here). You'll need to convert it back into a SummarizedExperiment, though, the function doesn't take DGEList objects... or you can use csaw::normFactors() instead of calcNormFactors() to keep everything in a SummarizedExperiment form. (Note the difference in the weighted default, though, as this was built for ChIP-seq data.)

ADD COMMENT
0
Entering edit mode

Thanks Aaron,

I think that should do it. If you move your comment to answer I can accept it.

For completeness, here is the code I used:

## convert DGElist to SummarizedExperiments given a DGElist "y" from the code in toplevel question
library(csaw)
se <- SummarizedExperiment(assays = y$counts)
names(assays(se))[1] <- "counts"
se$totals <- y$samples$lib.size
assay(se, "offset") <- y$offset
se.cpm <- calculateCPM(se, use.norm.factors = F, use.offsets = T, log = F)
ADD REPLY
0
Entering edit mode

hi Aaron and AT,

I'll add this to the tximport vignette if Aaron gives the ok. I'm just less knowledgeable about internals so want to make sure I don't promulgate something not accurate.

ADD REPLY
0
Entering edit mode

Looks fine to me. You needn't use.norm.factors if you have use.offsets=TRUE, the latter overrides the former. The only other comments are to avoid T and F, but I know that Mike would never put those in a vignette anyway.

Mike, if you open a PR on the vignette, I can put in some comments to explain what and why, especially around the offset calculation part. Otherwise I'll have to re-remember everything the next time this pops up.

ADD REPLY
1
Entering edit mode

I added the following to the vignette in the devel branch of tximport:

https://github.com/mikelove/tximport/commit/225953efef09f2a925c99242034abfa4d933a0f7

Let me know if that looks ok.

Thanks AT

ADD REPLY
0
Entering edit mode

Thank you both for the outstanding responsiveness to questions and issues. Can you move Aaron's comment to answer so it is the toplevel answer?

ADD REPLY
1
Entering edit mode
@james-w-macdonald-5106
Last seen 5 hours ago
United States

If you have an offsets matrix in your DGEList then you won't use the norm.factors anyway, so it wouldn't matter if you did something with them or not. Put a different way, the offsets are supposed to be better than simple normalization factors, and are preferentially used by glmFit.

ADD COMMENT
0
Entering edit mode

This I understand but how about obtaining normalized counts for non-GLM applications like clustering or checking normalization efficiency by MA plots. Can you use the offsets for those, too?

ADD REPLY

Login before adding your answer.

Traffic: 551 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6