Question: Correct use of tximport in combination with edgeR cpm()
ATpoint (Germany) wrote, 9 weeks ago:

I imported a set of salmon quantifications into R with tximport (default settings) and used exactly the code from the tximport vignette to prepare the data for edgeR. The result is a DGEList containing the offsets for the downstream differential expression analysis.
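The offsets mentioned here come from the tximport vignette's edgeR recipe: average transcript lengths become per-gene, per-sample length factors, effective library sizes are computed from the length-corrected counts, and the two are combined into a log-scale offset matrix (which the vignette then attaches to the DGEList with edgeR's scaleOffset()). The base-R sketch below reproduces those steps with one simplification. The helper `make_tximport_offsets()` is hypothetical, and its `norm_factors` argument stands in for the TMM factors that `edgeR::calcNormFactors()` would compute on the length-corrected counts. It defaults to 1 so the example runs without edgeR.

```r
## Hypothetical helper following the steps of the tximport vignette's edgeR
## code. `norm_factors` is an assumed stand-in for calcNormFactors(normCts).
make_tximport_offsets <- function(cts, lengths, norm_factors = rep(1, ncol(cts))) {
  # Per-observation length factors, divided by each gene's geometric mean
  # so that the overall magnitude of the counts is not changed.
  len_fac <- lengths / exp(rowMeans(log(lengths)))
  # Length-corrected counts, used only to derive effective library sizes.
  norm_cts <- cts / len_fac
  eff_lib <- norm_factors * colSums(norm_cts)
  # Multiply each column of length factors by its effective library size
  # and take logs to obtain offsets for a log-link GLM.
  log(sweep(len_fac, 2, eff_lib, "*"))
}

# Toy data; matrices fill column by column, so each column is one sample.
cts <- matrix(c(100, 200, 50,
                120, 180, 60), nrow = 3,
              dimnames = list(paste0("gene", 1:3), c("s1", "s2")))
lengths <- matrix(c(1000, 2000, 500,
                    1200, 2000, 500), nrow = 3, dimnames = dimnames(cts))
offsets <- make_tximport_offsets(cts, lengths)
offsets
```

Genes whose average length is identical in both samples (gene2, gene3) end up with an offset equal to the log effective library size; gene1 gets an extra gene-specific length term.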

Issue: The DGEList has no normalization factors for obtaining TMM-normalized counts via cpm(y, log=FALSE): the values in y$samples$norm.factors are all 1.

Therefore, the question is how to fill y$samples$norm.factors while still using the information from tximport. One can of course run calcNormFactors(y) manually, but then the length offsets from tximport are not used. Is there a recommended approach?
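Why calcNormFactors() alone cannot carry the tximport information: norm.factors are one number per sample, whereas the tximport offsets vary per gene and per sample. The sketch below uses hypothetical counts and length factors in which gene1's length factor doubles in the second sample, and compares column-only CPMs with offset-based CPMs. The offset-based CPM is taken here as count / exp(offset) * 1e6, a simplified definition chosen for illustration.

```r
## Hypothetical 2-gene, 2-sample example (columns are samples). gene1's
## length factor doubles in s2 (e.g. a switch to a longer isoform), so its
## count doubles without any change in expression; gene2 is unaffected.
cts <- matrix(c(100, 300,
                200, 300), nrow = 2,
              dimnames = list(c("gene1", "gene2"), c("s1", "s2")))
len_fac <- matrix(c(1, 1,
                    2, 1), nrow = 2, dimnames = dimnames(cts))

# Column-only normalization: one divisor per sample (norm.factors = 1 here).
lib <- colSums(cts)
cpm_col <- sweep(cts, 2, lib, "/") * 1e6

# Offset-based normalization: one divisor per gene and sample.
eff_lib <- colSums(cts / len_fac)
offset <- log(sweep(len_fac, 2, eff_lib, "*"))
cpm_offset <- cts / exp(offset) * 1e6

cpm_col     # gene1 appears to increase in s2
cpm_offset  # gene1 is equal in both samples once length is accounted for
```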

Tags: edger, tximport
(Question posted 9 weeks ago by ATpoint; modified 9 weeks ago by James W. MacDonald)

This is beyond my knowledge of edgeR. I checked that chunk of tximport vignette code with Aaron at some point, to make sure we were doing it properly.

(Posted 9 weeks ago by Michael Love)

Oops. Spotted something. Will open an issue.

Edit: actually, ignore that, it's fine - phew.

(Posted 9 weeks ago by Aaron Lun; modified 9 weeks ago)

Maybe add a short comment to the tximport vignette referencing Aaron's suggestions below. Using the corrected counts for things like clustering is standard, so I was actually surprised that no one had asked this before (to the best of my knowledge; maybe I missed the relevant threads).

(Posted 9 weeks ago by ATpoint; modified 9 weeks ago)
Answer: Correct use of tximport in combination with edgeR cpm()
Aaron Lun (Cambridge, United Kingdom) wrote, 9 weeks ago:

Have a look at csaw::calculateCPM(), which does exactly what you request (see its documentation for usage). You'll need to convert the data back into a SummarizedExperiment, though, as the function doesn't take DGEList objects. Alternatively, you can use csaw::normFactors() instead of calcNormFactors() to keep everything in SummarizedExperiment form. (Note the different default for weighted, though, as this function was built for ChIP-seq data.)
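As a rough illustration of what an offset-aware CPM computes, here is a hypothetical base-R helper, `offset_cpm()`. It is not csaw's source code, and `csaw::calculateCPM()` may rescale or center the offsets differently, so values need not match. The idea is that each count is divided by its own effective library size, exp(offset), rather than by a per-sample lib.size * norm.factors. The log branch adds a prior count scaled by each library size relative to the mean, loosely modeled on edgeR's approach.

```r
## Hypothetical helper: offset-aware counts per million.
offset_cpm <- function(counts, offset, log = FALSE, prior.count = 1) {
  stopifnot(identical(dim(counts), dim(offset)))
  lib <- exp(offset)  # per-observation effective library sizes
  if (!log) {
    return(counts / lib * 1e6)
  }
  # Prior count scaled by each library size relative to the mean of all
  # library sizes (a simplification, not edgeR's exact code path).
  prior <- prior.count * lib / mean(lib)
  log2((counts + prior) / (lib + 2 * prior) * 1e6)
}

counts <- matrix(c(10, 20, 30, 40), nrow = 2)
offset <- log(matrix(c(1e6, 1e6, 2e6, 2e6), nrow = 2))
offset_cpm(counts, offset)              # linear scale
offset_cpm(counts, offset, log = TRUE)  # log2 scale with a prior count
```

The second sample has twice the effective library size, so its counts are halved on the CPM scale.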

(Posted 9 weeks ago by Aaron Lun; modified 9 weeks ago)

Thanks Aaron,

I think that should do it. If you move your comment to an answer, I can accept it.

For completeness, here is the code I used:

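```r
## Convert the DGEList "y" (from the code in the top-level question)
## into a SummarizedExperiment
library(csaw)
library(SummarizedExperiment)
# Store the raw counts and the tximport-derived offsets as named assays
se <- SummarizedExperiment(assays = list(counts = y$counts, offset = y$offset))
# csaw reads library sizes from the "totals" column of colData
se$totals <- y$samples$lib.size
# Offset-aware CPMs on the linear (non-log) scale
se.cpm <- calculateCPM(se, use.norm.factors = F, use.offsets = T, log = F)
```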
(Posted 9 weeks ago by ATpoint)

hi Aaron and AT,

I'll add this to the tximport vignette if Aaron gives the OK. I'm just less knowledgeable about the internals, so I want to make sure I don't promulgate something inaccurate.

(Posted 9 weeks ago by Michael Love)

Looks fine to me. You needn't set use.norm.factors if you have use.offsets=TRUE, as the latter overrides the former. The only other comment is to avoid T and F, but I know that Mike would never put those in a vignette anyway.

Mike, if you open a PR on the vignette, I can put in some comments to explain what and why, especially around the offset calculation part. Otherwise I'll have to re-remember everything the next time this pops up.
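Why use.offsets=TRUE can override use.norm.factors: an offset matrix can encode everything the normalization factors do. The sketch below (hypothetical numbers) shows that a column-constant offset of log(lib.size * norm.factors) gives the same CPMs as dividing by the effective library sizes directly. tximport-style offsets add gene-specific length terms on top of that.

```r
## Hypothetical counts (columns are samples) and normalization factors.
counts <- matrix(c(10, 50, 40,
                   30, 90, 80), nrow = 3)
lib.size <- colSums(counts)
norm.factors <- c(0.9, 1.1)
eff.lib <- lib.size * norm.factors

# Classic normalized CPM: divide each column by its effective library size.
cpm_nf <- sweep(counts, 2, eff.lib, "/") * 1e6

# The same information encoded as an offset matrix (one value per gene and
# sample; here every row repeats the per-sample log effective library size).
offset <- matrix(log(eff.lib), nrow = nrow(counts), ncol = ncol(counts), byrow = TRUE)
cpm_off <- counts / exp(offset) * 1e6

all.equal(cpm_nf, cpm_off)
```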

(Posted 9 weeks ago by Aaron Lun; modified 9 weeks ago)

I added the following to the vignette in the devel branch of tximport:

https://github.com/mikelove/tximport/commit/225953efef09f2a925c99242034abfa4d933a0f7

Let me know if that looks ok.

Thanks AT

(Posted 8 weeks ago by Michael Love)

Thank you both for the outstanding responsiveness to questions and issues. Can you move Aaron's comment to an answer so that it is the top-level answer?

(Posted 8 weeks ago by ATpoint; modified 8 weeks ago)
Answer: Correct use of tximport in combination with edgeR cpm()
James W. MacDonald (United States) wrote, 9 weeks ago:

If you have an offsets matrix in your DGEList then you won't use the norm.factors anyway, so it wouldn't matter if you did something with them or not. Put a different way, the offsets are supposed to be better than simple normalization factors, and are preferentially used by glmFit.
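The point about glmFit can be illustrated with base R's `glm()` as a stand-in for `edgeR::glmFit()`. This is a Poisson rather than negative binomial model, chosen only so the sketch needs no packages. In a log-link GLM the normalization enters entirely through the offset term, and norm.factors play no separate role. The counts and effective library sizes are hypothetical.

```r
## One gene, four samples (two per group), unequal effective library sizes.
counts <- c(10, 20, 30, 60)
eff.lib <- c(1e6, 2e6, 1e6, 2e6)  # e.g. exp() of tximport-derived offsets
group <- factor(c("A", "A", "B", "B"))

# log(mu) = log(eff.lib) + intercept + group effect
fit <- glm(counts ~ group, family = poisson(), offset = log(eff.lib))

# The group B coefficient is the log fold change after normalization.
coef(fit)[["groupB"]]
```

These counts are exactly proportional to the library sizes within each group, so the group B coefficient is log(3): group B has three times the normalized rate of group A.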

(Posted 9 weeks ago by James W. MacDonald)

I understand this, but what about obtaining normalized counts for non-GLM applications such as clustering, or checking normalization efficiency with MA plots? Can you use the offsets for those, too?
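For the MA-plot use case, offset-normalized CPMs can be log-transformed and compared between two samples in the usual way. Below is a minimal sketch with hypothetical counts and offsets (not data from the thread), where s2 has twice the effective library size of s1.

```r
## Hypothetical counts (columns are samples) and offsets.
counts <- matrix(c(100, 50, 400,
                   220, 90, 800), nrow = 3,
                 dimnames = list(paste0("gene", 1:3), c("s1", "s2")))
offset <- log(matrix(rep(c(1e6, 2e6), each = 3), nrow = 3,
                     dimnames = dimnames(counts)))

# Offset-normalized log2 CPM. All counts are positive here; with zeros a
# prior count would be needed before taking logs.
logcpm <- log2(counts / exp(offset) * 1e6)

# MA values comparing s2 against s1.
M <- logcpm[, "s2"] - logcpm[, "s1"]
A <- (logcpm[, "s1"] + logcpm[, "s2"]) / 2
cbind(A, M)
```

`plot(A, M)` would then draw the MA plot; gene3 sits at M = 0 because its count doubles exactly in line with the effective library size.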

(Posted 9 weeks ago by ATpoint)