Offsets and normalization in edgeR
ognjen011

I am trying to understand the options for obtaining per-gene estimates with quantification pseudoaligners such as Kallisto. The edgeR documentation mentions that we can use tximport, "which produces gene-level estimated counts and an associated edgeR offset matrix". Elsewhere I read that edgeR ignores estimated normalization factors if it detects that offsets have been provided. Finally, I've read that GC content can also be used to generate offsets.

1) How big a difference do these make if we are also doing alternative splicing and per-transcript analysis?
2) Is it true that calculated offsets are used instead of internal normalization? Is there an explanation somewhere of how the y$offset variable is handled in each function?
3) If we want to normalize on multiple criteria, can we add all those offsets together, and is that recommended?

Thanks!

@gordon-smyth
WEHI, Melbourne, Australia

If you are using tximport to input data to edgeR, just follow the advice in the tximport vignette about how to do that. Once you create the DGEList object for edgeR, you can proceed with a standard edgeR analysis.
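To see what such an offset matrix contains, here is a rough Python sketch of the steps the tximport vignette describes for edgeR. edgeR itself is R, so this is only an illustration. It assumes genes x samples lists of estimated counts and average transcript lengths (all lengths positive) and, as a simplifying assumption, replaces edgeR's TMM normalization (calcNormFactors) with plain column sums, so it is not a substitute for the vignette code.

```python
import math


def tximport_style_offsets(counts, lengths):
    """Sketch of a tximport-style log-offset matrix (genes x samples).

    ASSUMPTION: composition normalization (TMM via calcNormFactors in edgeR)
    is replaced here by plain column sums, i.e. all normalization factors are 1.
    """
    n_genes = len(counts)
    n_samples = len(counts[0])

    # 1. Divide each gene's lengths by their geometric mean across samples,
    #    so the length factors do not change the overall magnitude of counts.
    length_factors = []
    for row in lengths:
        geo_mean = math.exp(sum(math.log(v) for v in row) / n_samples)
        length_factors.append([v / geo_mean for v in row])

    # 2. Length-corrected counts.
    norm_counts = [
        [counts[g][j] / length_factors[g][j] for j in range(n_samples)]
        for g in range(n_genes)
    ]

    # 3. Effective library sizes (plain column sums; see ASSUMPTION above).
    eff_lib = [sum(norm_counts[g][j] for g in range(n_genes)) for j in range(n_samples)]

    # 4. Combine length factors with effective library sizes and take logs,
    #    giving offsets for a log-link GLM.
    return [
        [math.log(length_factors[g][j] * eff_lib[j]) for j in range(n_samples)]
        for g in range(n_genes)
    ]
```

When a gene's average transcript length is the same in every sample, its length factors are all 1 and the sketch reduces to ordinary log library sizes. When lengths differ between samples, the offsets differ gene by gene, which is what makes them observation-specific. In an actual edgeR session, the vignette attaches the resulting log matrix to the DGEList with edgeR's scaleOffset().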

1) How big a difference do these make if we are also doing alternative splicing and per-transcript analysis?

Gene-level differential expression, transcript-level differential expression and testing for alternative splicing are quite different things and need three different approaches to quantification and normalization. The tximport import protocol and offset matrix are only for gene-level differential expression.

2) Is it true that calculated offsets are used instead of internal normalization?

Yes. Offsets are normalization and encode observation-specific effective library sizes. There would be no point in supplying an offset matrix to edgeR if edgeR then overwrote it.
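To make "observation-specific effective library sizes" concrete: without an offset matrix, each sample has one offset, log(lib.size * norm.factors), shared by every gene, whereas an offset matrix can differ from gene to gene. The Python sketch here is loosely modelled on the behavior this answer describes (a supplied matrix takes precedence and the normalization factors are then not used); the function names are hypothetical and this is not edgeR's source code.

```python
import math


def offsets_for_fit(lib_sizes, norm_factors, n_genes, offset_matrix=None):
    """Return the genes x samples log-offsets a log-link GLM would use."""
    if offset_matrix is not None:
        # A supplied offset matrix already encodes normalization,
        # so the normalization factors are ignored.
        return offset_matrix
    # Otherwise one offset per sample: log(effective library size).
    col = [math.log(n * f) for n, f in zip(lib_sizes, norm_factors)]
    # Every gene in a given sample shares the same offset.
    return [list(col) for _ in range(n_genes)]


def expected_counts(offsets, log_abundance):
    # Log-link mean model: mu[g][j] = exp(offset[g][j] + log_abundance[g][j]).
    return [
        [math.exp(o + a) for o, a in zip(orow, arow)]
        for orow, arow in zip(offsets, log_abundance)
    ]


lib_sizes = [1_000_000, 2_000_000]
norm_factors = [1.0, 0.5]
off = offsets_for_fit(lib_sizes, norm_factors, n_genes=2)
# Both samples have effective library size 1e6, so their offsets are equal.
custom = [[0.0, 1.0], [2.0, 3.0]]
used = offsets_for_fit(lib_sizes, norm_factors, n_genes=2, offset_matrix=custom)
```

The design choice this illustrates is the one in the answer: once a matrix is supplied, recomputing offsets from normalization factors would overwrite the very information the matrix was meant to carry.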

Is there an explanation somewhere of how the y$offset variable is handled in each function?

Every function has a help page. Basically, offsets are used throughout: whenever edgeR fits a GLM, the offsets enter the fit, so they become part of any downstream analysis such as dispersion estimation or testing.
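As a one-gene illustration of how an offset enters a GLM fit, here is a minimal Python sketch of Fisher scoring for an intercept-only negative binomial model with log link, mu_j = exp(beta + offset_j), and a fixed, assumed dispersion (dispersion 0 gives the Poisson case). This is not edgeR's code; edgeR's fitting functions such as glmFit and glmQLFit handle full design matrices and are not reproduced here. It assumes the gene has at least one nonzero count.

```python
import math


def fit_intercept_nb(y, offsets, dispersion, iters=50):
    """Fisher scoring for a one-gene, intercept-only NB GLM with log link."""
    # Start from the Poisson solution, which has a closed form:
    # beta = log(sum(y) / sum(exp(offset))).
    beta = math.log(sum(y) / sum(math.exp(o) for o in offsets))
    for _ in range(iters):
        # The offsets shift every fitted mean; they are never estimated.
        mu = [math.exp(beta + o) for o in offsets]
        score = sum((yj - m) / (1 + dispersion * m) for yj, m in zip(y, mu))
        info = sum(m / (1 + dispersion * m) for m in mu)
        step = score / info
        beta += step
        if abs(step) < 1e-12:
            break
    return beta
```

With log library sizes as offsets, beta is a log proportion of the library rather than a raw count scale, which is why any change to the offsets propagates into dispersion estimates and test statistics.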

3) If we want to normalize on multiple criteria, can we add all those offsets and is that recommended?

edgeR accepts offset matrices from external normalization packages such as EDASeq, cqn or tximport, but does not create observation-specific offset matrices itself. If you want to create your own offset matrix according to your own criteria, then making sure the offset matrix is sensible is your responsibility. I would not recommend just adding up separate offset matrices. If you are worried about GC content, you could use Salmon, which already adjusts for GC content as part of the transcript quantification.
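One concrete way naive summing can go wrong, sketched in Python with made-up numbers: if each offset matrix already includes log library size (as a tximport-style matrix does), adding two of them counts the library size twice. The factor matrices and helper names here are hypothetical. Combining the multiplicative corrections first and applying the library size once avoids that particular problem, but it does not by itself make the matrix sensible; checking that is still up to you.

```python
import math


def log_offsets(lib_sizes, factors):
    # Hypothetical offset matrix: log(library size * per-gene factor).
    return [[math.log(n * f) for n, f in zip(lib_sizes, row)] for row in factors]


def add_matrices(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def combine_factors_once(lib_sizes, *factor_mats):
    # Multiply the per-gene factors first, then apply library size only once.
    combined = [[math.prod(vals) for vals in zip(*rows)] for rows in zip(*factor_mats)]
    return log_offsets(lib_sizes, combined)


lib_sizes = [1e6, 4e6]
length_factors = [[1.0, 1.0], [1.2, 0.8]]  # hypothetical length corrections
gc_factors = [[0.9, 1.1], [1.0, 1.0]]      # hypothetical GC corrections

naive = add_matrices(log_offsets(lib_sizes, length_factors),
                     log_offsets(lib_sizes, gc_factors))
once = combine_factors_once(lib_sizes, length_factors, gc_factors)

# Implied sample-2 : sample-1 scaling for gene 0 under each approach.
ratio_naive = math.exp(naive[0][1] - naive[0][0])
ratio_once = math.exp(once[0][1] - once[0][0])
```

Here the naive sum inflates the between-sample scaling by an extra factor equal to the library-size ratio (4), which would show up directly as a spurious fold change.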
